239,428 Hits in 4.2 sec

Short Text Similarity with Word Embeddings

Tom Kenter, Maarten de Rijke
2015 Proceedings of the 24th ACM International Conference on Information and Knowledge Management - CIKM '15  
doi:10.1145/2806416.2806475 fatcat:etilsmje7feo5pxftnitwhz73a

A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings

Karlo Babić, Francesco Guerra, Sanda Martinčić-Ipšić, Ana Meštrović
2020 Journal of Information and Organizational Sciences  
Since these models provide word vectors, we experiment with various methods that calculate the semantic similarity of short texts based on word vectors.  ...  More precisely, for each of these models, we test five methods for aggregating word embeddings into text embeddings.  ...  similarity measure to calculate the semantic similarity of short texts based on word embeddings.  ... 
doi:10.31341/jios.44.2.2 fatcat:7hmceah4q5bqjl75mumzgyls4e
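The aggregate-then-compare pipeline these entries describe can be sketched in a few lines: average the word vectors of each text, then compare the averages with cosine similarity. This is only a minimal sketch; the tiny 3-dimensional vectors below are invented for illustration, whereas real models such as word2vec or GloVe produce vectors of 100-300 dimensions.

```python
import math

# Toy 3-d word vectors, invented for illustration only; real embeddings
# (word2vec, GloVe, fastText) have hundreds of dimensions.
word_vecs = {
    "cat":   [0.9, 0.1, 0.0],
    "dog":   [0.8, 0.2, 0.0],
    "sits":  [0.1, 0.9, 0.1],
    "rests": [0.3, 0.7, 0.2],
}

def text_embedding(tokens):
    """Aggregate word vectors into one text vector by simple averaging,
    the most common of the aggregation methods such studies compare."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    dim = len(next(iter(word_vecs.values())))
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

sim = cosine(text_embedding(["cat", "sits"]),
             text_embedding(["dog", "rests"]))
print(round(sim, 3))  # high similarity despite zero word overlap
```

Averaging discards word order, which is why several of the papers listed here also evaluate weighted or learned aggregation schemes.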

Short Text Classification Based on Distributional Representations of Words

Chenglong MA, Qingwei ZHAO, Jielin PAN, Yonghong YAN
2016 IEICE transactions on information and systems  
In this paper, we show how to mitigate the problem in short text classification through word embeddings.  ...  Experimental results validate the effectiveness of the proposed method. Key words: short text classification, word embedding, Gaussian model  ...  In embedding spaces, words with similar meanings tend to have similar word embeddings. However, the vocabulary size of word embeddings is usually large.  ... 
doi:10.1587/transinf.2016sll0006 fatcat:6bjcxyetsnaf5eiizrkhgyfbfq

Distributional Representations of Words for Short Text Classification

Chenglong Ma, Weiqun Xu, Peijia Li, Yonghong Yan
2015 Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing  
The word embeddings are trained from the entire English Wikipedia text. We assume that a short text document is a specific sample of one distribution in a Bayesian framework.  ...  In this paper we show how to mitigate the problems in short text classification (STC) through word embeddings -distributional representations of words learned from large unlabeled data.  ...  Similar to language modeling, we assume that a word embedding w_ij for the i-th word in short text d_j depends only on the preceding words.  ... 
doi:10.3115/v1/w15-1505 dblp:conf/naacl/MaXL015 fatcat:d5cdzr4iqngj5lraxwklh4gxni

Short Text Embedding Autoencoders with Attention-based Neighborhood Preservation

Chao Wei, Lijun Zhu, Jiaoxiang Shi
2020 IEEE Access  
For short text embedding, we selected only the samples with less than 21 words, denoted as 20Nshort, as done in [43].  ...  Word embedding is a general term for vectorized word representation techniques derived from the distributional hypothesis, which allow words with similar meanings to have similar representations.  ... 
doi:10.1109/access.2020.3042778 fatcat:k4rn6rwurzcpheet74zudnfwfu

Improving Medical Short Text Classification with Semantic Expansion Using Word-Cluster Embedding [article]

Ying Shen, Qiang Zhang, Jin Zhang, Jiyue Huang, Yuming Lu, Kai Lei
2018 arXiv   pre-print
To tackle this challenge, we propose to add word-cluster embedding to a deep neural network for improving short text classification.  ...  Finally, we expand the word vector with the cluster center vector, and implement classifiers using CNN and LSTM respectively.  ...  The cluster embedding represents the implicit topic of all words in a cluster. Short Text Classifiers We use CNN and LSTM models to classify the short texts.  ... 
arXiv:1812.01885v1 fatcat:u3zszhd7wnfavk23e4zvk2lqna

Automatic Short Answer Scoring based on Paragraph Embeddings

Sarah Hassan, Aly A., Mohammad El-Ramly
2018 International Journal of Advanced Computer Science and Applications  
This paper presents a supervised learning approach for short answer automatic scoring based on paragraph embeddings.  ...  We review significant deep learning based models for generating paragraph embeddings and present a detailed empirical study of how the choice of paragraph embedding model influences accuracy in the task  ...  The semantic similarities between words can then be measured with simple methods such as cosine similarity. Word embeddings are often used as the input layer for deep learning models [11].  ... 
doi:10.14569/ijacsa.2018.091048 fatcat:3ntir3ii7fcaneb6kzqqjgrspi

A Nested Chinese Restaurant Topic Model for Short Texts with Document Embeddings

Yue Niu, Hongjie Zhang, Jing Li
2021 Applied Sciences  
However, document embeddings of short texts contain a lot of noisy information resulting from the sparsity of word co-occurrence information.  ...  Aggregating short texts into long documents according to document embeddings can provide sufficient word co-occurrence information and avoid incorporating non-semantic word co-occurrence information.  ...  Data Availability Statement: Datasets can be obtained from short-texts.git accessed on 5 September 2021.  ... 
doi:10.3390/app11188708 fatcat:pty7n2oftvgdvkgbvzzl523ttm

Learning Semantic Similarity for Very Short Texts

Cedric De Boom, Steven Van Canneyt, Steven Bohez, Thomas Demeester, Bart Dhoedt
2015 2015 IEEE International Conference on Data Mining Workshop (ICDMW)  
Traditional text similarity methods such as tf-idf cosine similarity, based on word overlap, mostly fail to produce good results in this case, since word overlap is minimal or non-existent.  ...  for semantic content within very short text fragments.  ...  It relies on word overlap to find similarities, but in very short texts, in which word overlap is rare, tf-idf often fails.  ... 
doi:10.1109/icdmw.2015.86 dblp:conf/icdm/BoomCBDD15 fatcat:zh3ylvcpcjbo3bd3wi37p2xaxm

A Hybrid Classification Method via Character Embedding in Chinese Short Text with Few Words

Yi Zhu, Yun Li, Yongzheng Yue, Jipeng Qiang, Yunhao Yuan
2020 IEEE Access  
INDEX TERMS Short text with few words, character embedding, attention mechanism, feature selection.  ...  More specifically, firstly, the character embedding is computed to represent Chinese short texts with few words, which takes full advantage of short text information without external corpus.  ...  MOTIVATION Chinese short text with few words refers to Chinese text that is much shorter than traditional normal text or even typical short text.  ... 
doi:10.1109/access.2020.2994450 fatcat:vfcj2ocm55fidp46vsxs4p6ucu

Topic Modeling for Short Texts via Word Embedding and Document Correlation

Feng Yi, Bo Jiang, Jianjun Wu
2020 IEEE Access  
INDEX TERMS Topic model, short texts, word embedding, document correlation, non-negative matrix factorization, regularization.  ...  TRNMF integrates successfully both word co-occurrence regularization and sentence similarity regularization into topic modeling for short texts.  ...  [38] integrate both word embeddings as supplementary information and an attention mechanism that segments short text documents into fragments of adjacent words receiving similar attention for short  ... 
doi:10.1109/access.2020.2973207 fatcat:qrmkhfoxqjb4bcutcyicuwvpiy

Collaboratively Modeling and Embedding of Latent Topics for Short Texts

Zheng Liu, Tingting Qin, Ke-jia Chen, Yun Li
2020 IEEE Access  
., topic models for word embeddings or vice versa, we propose CME-DMM, a collaboratively modeling and embedding framework for capturing coherent latent topics from short texts.  ...  It is challenging to handle the sparsity and the noise problems confronting short texts.  ...  One way to apply traditional topic models on short texts is to merge similar short texts into long virtual documents [4] , [5] , where similar short texts could be the short texts of the same author  ... 
doi:10.1109/access.2020.2997973 fatcat:xu6n7dgnrbcsjkxo5v2lm7vs6e

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

Dongling Xiao, Han Zhang, Yukun Li, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
by word.  ...  To address this issue, we propose an enhanced multi-flow sequence to sequence pre-training and fine-tuning framework named ERNIE-GEN, which bridges the discrepancy between training and inference with an  ...  ., 2018b] proposed relational BTM (R-BTM) to link short texts via a similarity matrix of words computed by word embeddings.  ... 
doi:10.24963/ijcai.2020/549 dblp:conf/ijcai/YangWZWJGS20 fatcat:lvyjovkjkngizdy2oiwtpkrjh4

A Short Texts Matching Method Using Shallow Features and Deep Features [chapter]

Longbiao Kang, Baotian Hu, Xiangping Wu, Qingcai Chen, Yan He
2014 Communications in Computer and Information Science  
Furthermore, we design a method to combine shallow features of short texts (i.e., LSI, VSM and some other handcrafted features) with deep features of short texts (i.e., word embedding matching of short text  ...  In this paper, we focus on the semantic matching between short texts and design a model to generate deep features, which describe the semantic relevance between short "text objects".  ...  However, this architecture mostly uses traditional features instead of deep features of short text (i.e., word embedding).  ... 
doi:10.1007/978-3-662-45924-9_14 fatcat:c3innbeganauzhaswiw4jjesu4

Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification [article]

Yaqing Wang and Song Wang and Quanming Yao and Dejing Dou
2021 arXiv   pre-print
Thus, compared with existing GNN-based methods, SHINE can better exploit interactions between nodes of the same types and capture similarities between short texts.  ...  Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.  ...  Upon this G_s, we propagate label information among similar short documents via a 2-layer GCN. Let X_s collectively record short text embeddings with x_s^i on the i-th row.  ... 
arXiv:2111.00180v1 fatcat:r424geuhvrebfibnq2xtudoh54