A General Bio-inspired Method to Improve the Short-Text Clustering Task [chapter]

Diego Ingaramo, Marcelo Errecalde, Paolo Rosso
2010 Lecture Notes in Computer Science  
"Short-text clustering" is a very important research field due to the current tendency for people to use very short documents, e.g. blogs, text-messaging and others.  ...  The proposal shows an interesting improvement in the results obtained with different algorithms on several short-text collections.  ...  We thank the TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 research project for funding the work of the second and third authors.  ... 
doi:10.1007/978-3-642-12116-6_56 fatcat:2ijpmh3ay5dmzija6bznl6tbhm

Clustering Abstracts Instead of Full Texts [chapter]

Pavel Makagonov, Mikhail Alexandrov, Alexander Gelbukh
2004 Lecture Notes in Computer Science  
Traditional keyword-based approach for clustering this type of documents gives unstable and imprecise results.  ...  Accessibility of digital libraries and other web-based repositories has caused the illusion of accessibility of the full texts of scientific papers.  ...  The former circumstance is due to extremely small size of documents, which leads to very small absolute frequencies of keywords.  ... 
doi:10.1007/978-3-540-30120-2_17 fatcat:dhpdcbkzqfgvfa36r3ejwo7m5e

A Self-enriching Methodology for Clustering Narrow Domain Short Texts

D. Pinto, P. Rosso, H. Jimenez-Salazar
2010 Computer journal  
Clustering narrow domain short texts is considered to be a complex task because of the intrinsic features of the corpus to be clustered: (i) the low frequencies of vocabulary terms in short texts, and  ...  We also propose a set of supervised and unsupervised text assessment measures for evaluating different corpus features, such as shortness, stylometry and domain broadness.  ...  among the short text collections.  ... 
doi:10.1093/comjnl/bxq069 fatcat:46hcjyggxbdqtjc5wyxo3ari2u

Method of Feature Reduction in Short Text Classification Based on Feature Clustering

Li, Yin, Shi, Mao, Shi
2019 Applied Sciences  
We classify short texts with corresponding similar feature clusters instead of original feature words.  ...  One decisive problem of short text classification is the serious dimensional disaster when utilizing a statistics-based approach to construct vector spaces.  ...  The optimization of feature representation is another effective method for semantic expansion in short texts, which optimizes within a text collection. Jiang et al.  ... 
doi:10.3390/app9081578 fatcat:o7qnx7svozb6vdh3zeq326ikya

Defining and Evaluating Blog Characteristics

Fernando Perez Tellez, David Pinto, John Cardiff, Paolo Rosso
2009 2009 Eighth Mexican International Conference on Artificial Intelligence  
Due to their specific characteristics, such as shortness, vocabulary size and nature, etc. it can be difficult to achieve good results using automated clustering techniques.  ...  We furthermore present the results of some experiments in which we analyzed the features of two sample blog corpora, and we compared the results with other kinds of short texts.  ...  The work of the second and the fourth author is supported by the TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 research project.  ... 
doi:10.1109/micai.2009.21 fatcat:6yp6aeerj5bk7pabulqfc6arcm

Improving Document Clustering for Short Texts by Long Documents via a Dirichlet Multinomial Allocation Model [chapter]

Yingying Yan, Ruizhang Huang, Can Ma, Liyang Xu, Zhiyuan Ding, Rui Wang, Ting Huang, Bowei Liu
2017 Lecture Notes in Computer Science  
In this paper, we propose a novel model, namely DDMAfs, which 1) improves the clustering performance of short texts by sharing structural knowledge of long documents to short texts; 2) automatically identifies  ...  Comparisons between the DDMAfs model and state-of-the-art short text clustering approaches show that the DDMAfs model is effective.  ...  The NMI result for short text data points when ω gets different values. Table 1. Summary Description of Datasets (L: Long Text Sets, S: Short Text Sets, V: Vocabulary size, K: Number of clusters).  ... 
doi:10.1007/978-3-319-63579-8_47 fatcat:eqkjrwpez5cfxdchzo4skwdltq

Ranking Based Clustering for Social Event Detection

Taufik Sutanto, Richi Nayak
2014 MediaEval Benchmarking Initiative for Multimedia Evaluation  
The problem of clustering a large document collection is not only challenged by the number of documents and the number of dimensions, but it is also affected by the number and sizes of the clusters.  ...  Text, temporal, spatial and visual content information collected from the social event images is utilized in calculating similarity.  ...  This type of approaches works fine when the collection size or the number of clusters required is small.  ... 
dblp:conf/mediaeval/SutantoN14 fatcat:gf4p27xqr5f4bjnjdp5fv7mcta

An Approach to Clustering Abstracts [chapter]

Mikhail Alexandrov, Alexander Gelbukh, Paolo Rosso
2005 Lecture Notes in Computer Science  
Current keyword-based techniques allow for clustering such type of short texts only when the data set is multi-category, e.g., some documents are devoted to sport, others to medicine, others to politics  ...  libraries should provide document images of full texts of the papers (and not only abstracts) for open access via Internet, in order to help in search, classification, clustering, selection, and proper  ...  We slightly modified it in order to avoid circling related with weak connections between the documents due to their small size.  ... 
doi:10.1007/11428817_25 fatcat:45xu3tbl7bazdnsvs5tbxza5mm

A Hotspot Discovery Method Based on Improved FIHC Clustering Algorithm

2021 Tehnički Vjesnik  
Then the initial cluster of the text repletion of microblog was reduced, and the idea of Single-Pass clustering was applied to the reduced topic cluster in order to get the Hotspot.  ...  It was difficult to find the microblog hotspot because microblogs are short, rapid, changeable and so on.  ...  ), and the program of Fujian Provincial Department of Education (JAT201035).  ... 
doi:10.17559/tv-20210610120531 fatcat:aztw4tpuejbpde5ypxv35jrfoy

Proximity Estimation and Hardness of Short-Text Corpora

Marcelo Luis Errecalde, Diego Ingaramo, Paolo Rosso
2008 2008 19th International Conference on Database and Expert Systems Applications  
validity measures on the "ideal" clustering of each corpus.  ...  In this work, we investigate the relative hardness of short-text corpora in clustering problems and how this hardness relates to traditional similarity measures.  ...  It could be argued that our preliminary analysis is limited to small size collections.  ... 
doi:10.1109/dexa.2008.87 dblp:conf/dexaw/ErrecaldeIR08 fatcat:lzjq7ht7jbewnopgdexnlvs27i

Learning Topics in Short Texts by Non-negative Matrix Factorization on Term Correlation Matrix [chapter]

Xiaohui Yan, Jiafeng Guo, Shenghua Liu, Xueqi Cheng, Yanfeng Wang
2013 Proceedings of the 2013 SIAM International Conference on Data Mining  
Such term correlation data is less sparse and more stable with the increase of the collection size, and can well capture the necessary information for topic learning.  ...  Nowadays, short texts are very prevalent in various web applications, such as microblogs, instant messages. The severe sparsity of short texts hinders existing topic models to learn reliable topics.  ...  Meanwhile, we observe that when the size of the short text corpus becomes larger and larger, the size of distinct terms usually keeps relative small and stable.  ... 
doi:10.1137/1.9781611972832.83 dblp:conf/sdm/ChengGLWY13 fatcat:p72w36mwkfaovoji75gtv4icpe

Does Size Matter? When Small is Good Enough

Anna Lisa Gentile, Amparo Elizabeth Cano Basave, Aba-Sah Dadzie, Vitaveska Lanfranchi, Neil Ireson
2011 Workshop on Making Sense of Microposts  
Our hypothesis is that based on a specific task (in this case, topic classification), results obtained using longer texts may be approximated by short texts, of micropost size, i.e., maximum length 140  ...  This paper reports the observation of the influence of the size of documents on the accuracy of a defined text processing task.  ...  Our hypothesis is that based on a specific task, results obtained using short texts of micropost size approximate results obtainable with longer texts.  ... 
dblp:conf/msm/GentileBDLI11 fatcat:rx3fnqhj6vdp7i7c3ay2t52ujy

Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance [chapter]

David Pinto, José-Miguel Benedí, Paolo Rosso
2007 Lecture Notes in Computer Science  
Clustering short length texts is a difficult task itself, but adding the narrow domain characteristic poses an additional challenge for current clustering methods.  ...  We addressed this problem with the use of a new measure of distance between documents which is based on the symmetric Kullback-Leibler distance.  ...  This collection was used by Makagonov et al. [15] in their experiments on clustering short texts of narrow domains.  ... 
doi:10.1007/978-3-540-70939-8_54 fatcat:bhx6lm2r2ffh5ganpskotl4b44

Evaluating Term Concept Association Measures for Short Text Expansion: Two Case Studies of Classification and Clustering

Alessandro Marco Boutari, Claudio Carpineto, Raffaele Nicolussi
2010 International Conference on Concept Lattices and their Applications  
By means of two case studies, we evaluate the effectiveness of these measures for expansion-enhanced K-NN classification and K-Means clustering of short texts.  ...  The proliferation of Web applications based on short texts represents both an opportunity and a challenge to text mining algorithms, because of sparse representations and lack of shared context.  ...  ODP-239 thus consists of many small collections, each with a comparatively large set of classes, as opposed to having one large collection of documents with a small number of classes.  ... 
dblp:conf/cla/BoutariCN10 fatcat:d42mo4y63natnpwr4ray3spdcm

An efficient Particle Swarm Optimization approach to cluster short texts

Leticia Cagnina, Marcelo Errecalde, Diego Ingaramo, Paolo Rosso
2014 Information Sciences  
small and medium size.  ...  An efficient Particle Swarm Optimization approach to cluster short texts. Information Sciences. 265:36-49.  ...  the framework of the Microcluster VLC/Campus (International Campus of Excellence) on Multimodal Intelligent Systems.  ... 
doi:10.1016/j.ins.2013.12.010 fatcat:ghpkmvuujjafbdhfu25mvmi4a4