A General Bio-inspired Method to Improve the Short-Text Clustering Task
[chapter]
2010
Lecture Notes in Computer Science
"Short-text clustering" is a very important research field due to the current tendency for people to use very short documents, e.g. blogs, text-messaging and others. ...
The proposal shows an interesting improvement in the results obtained with different algorithms on several short-text collections. ...
We thank the TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 research project for funding the work of the second and third authors. ...
doi:10.1007/978-3-642-12116-6_56
fatcat:2ijpmh3ay5dmzija6bznl6tbhm
Clustering Abstracts Instead of Full Texts
[chapter]
2004
Lecture Notes in Computer Science
The traditional keyword-based approach to clustering this type of document gives unstable and imprecise results. ...
Accessibility of digital libraries and other web-based repositories has caused the illusion of accessibility of the full texts of scientific papers. ...
The former circumstance is due to the extremely small size of the documents, which leads to very small absolute frequencies of keywords. ...
doi:10.1007/978-3-540-30120-2_17
fatcat:dhpdcbkzqfgvfa36r3ejwo7m5e
A Self-enriching Methodology for Clustering Narrow Domain Short Texts
2010
Computer journal
Clustering narrow domain short texts is considered to be a complex task because of the intrinsic features of the corpus to be clustered: (i) the low frequencies of vocabulary terms in short texts, and ...
We also propose a set of supervised and unsupervised text assessment measures for evaluating different corpus features, such as shortness, stylometry and domain broadness. ...
among the short text collections. ...
doi:10.1093/comjnl/bxq069
fatcat:46hcjyggxbdqtjc5wyxo3ari2u
Method of Feature Reduction in Short Text Classification Based on Feature Clustering
2019
Applied Sciences
We classify short texts with corresponding similar feature clusters instead of original feature words. ...
One decisive problem of short text classification is the serious dimensional disaster when utilizing a statistics-based approach to construct vector spaces. ...
The optimization of feature representation is another effective method for semantic expansion in short texts, which optimizes within a text collection. Jiang et al. ...
doi:10.3390/app9081578
fatcat:o7qnx7svozb6vdh3zeq326ikya
Defining and Evaluating Blog Characteristics
2009
2009 Eighth Mexican International Conference on Artificial Intelligence
Due to their specific characteristics, such as shortness, vocabulary size and nature, etc., it can be difficult to achieve good results using automated clustering techniques. ...
We furthermore present the results of some experiments in which we analyzed the features of two sample blog corpora, and we compared the results with other kinds of short texts. ...
The work of the second and the fourth author is supported by the TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 research project. ...
doi:10.1109/micai.2009.21
fatcat:6yp6aeerj5bk7pabulqfc6arcm
Improving Document Clustering for Short Texts by Long Documents via a Dirichlet Multinomial Allocation Model
[chapter]
2017
Lecture Notes in Computer Science
In this paper, we propose a novel model, namely DDMAfs, which 1) improves the clustering performance of short texts by sharing structural knowledge of long documents to short texts; 2) automatically identifies ...
Comparisons between the DDMAfs model and state-of-the-art short text clustering approaches show that the DDMAfs model is effective. ...
The NMI result for short text data points when ω gets different values.
Table 1. Summary Description of Datasets. (Long Text Sets, S: Short Text Sets, V: Vocabulary size, K: Number of clusters.) ...
doi:10.1007/978-3-319-63579-8_47
fatcat:eqkjrwpez5cfxdchzo4skwdltq
Ranking Based Clustering for Social Event Detection
2014
MediaEval Benchmarking Initiative for Multimedia Evaluation
The problem of clustering a large document collection is not only challenged by the number of documents and the number of dimensions, but it is also affected by the number and sizes of the clusters. ...
Text, temporal, spatial and visual content information collected from the social event images is utilized in calculating similarity. ...
This type of approach works fine when the collection size or the number of clusters required is small. ...
dblp:conf/mediaeval/SutantoN14
fatcat:gf4p27xqr5f4bjnjdp5fv7mcta
An Approach to Clustering Abstracts
[chapter]
2005
Lecture Notes in Computer Science
Current keyword-based techniques allow for clustering such type of short texts only when the data set is multi-category, e.g., some documents are devoted to sport, others to medicine, others to politics ...
libraries should provide document images of full texts of the papers (and not only abstracts) for open access via Internet, in order to help in search, classification, clustering, selection, and proper ...
We slightly modified it in order to avoid circling related with weak connections between the documents due to their small size. ...
doi:10.1007/11428817_25
fatcat:45xu3tbl7bazdnsvs5tbxza5mm
A Hotspot Discovery Method Based on Improved FIHC Clustering Algorithm
2021
Tehnički Vjesnik
Then the initial cluster of the text repletion of microblog was reduced, and the idea of Single-Pass clustering was applied to the reduced topic cluster in order to get the Hotspot. ...
It was difficult to find the microblog hotspot because the characteristics of microblog were short, rapid, change and so on. ...
), and the program of Fujian Provincial Department of Education (JAT201035). ...
doi:10.17559/tv-20210610120531
fatcat:aztw4tpuejbpde5ypxv35jrfoy
Proximity Estimation and Hardness of Short-Text Corpora
2008
2008 19th International Conference on Database and Expert Systems Applications
validity measures on the "ideal" clustering of each corpus. ...
In this work, we investigate the relative hardness of short-text corpora in clustering problems and how this hardness relates to traditional similarity measures. ...
It could be argued that our preliminary analysis is limited to small size collections. ...
doi:10.1109/dexa.2008.87
dblp:conf/dexaw/ErrecaldeIR08
fatcat:lzjq7ht7jbewnopgdexnlvs27i
Learning Topics in Short Texts by Non-negative Matrix Factorization on Term Correlation Matrix
[chapter]
2013
Proceedings of the 2013 SIAM International Conference on Data Mining
Such term correlation data is less sparse and more stable with the increase of the collection size, and can well capture the necessary information for topic learning. ...
Nowadays, short texts are very prevalent in various web applications, such as microblogs and instant messages. The severe sparsity of short texts hinders existing topic models from learning reliable topics. ...
Meanwhile, we observe that when the size of the short text corpus becomes larger and larger, the number of distinct terms usually stays relatively small and stable. ...
doi:10.1137/1.9781611972832.83
dblp:conf/sdm/ChengGLWY13
fatcat:p72w36mwkfaovoji75gtv4icpe
Does Size Matter? When Small is Good Enough
2011
Workshop on Making Sense of Microposts
Our hypothesis is that based on a specific task (in this case, topic classification), results obtained using longer texts may be approximated by short texts, of micropost size, i.e., maximum length 140 ...
This paper reports the observation of the influence of the size of documents on the accuracy of a defined text processing task. ...
Our hypothesis is that based on a specific task, results obtained using short texts of micropost size approximate results obtainable with longer texts. ...
dblp:conf/msm/GentileBDLI11
fatcat:rx3fnqhj6vdp7i7c3ay2t52ujy
Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance
[chapter]
2007
Lecture Notes in Computer Science
Clustering short length texts is a difficult task itself, but adding the narrow domain characteristic poses an additional challenge for current clustering methods. ...
We addressed this problem with the use of a new measure of distance between documents which is based on the symmetric Kullback-Leibler distance. ...
This collection was used by Makagonov et al. [15] in their experiments on clustering short texts of narrow domains. ...
doi:10.1007/978-3-540-70939-8_54
fatcat:bhx6lm2r2ffh5ganpskotl4b44
Evaluating Term Concept Association Measures for Short Text Expansion: Two Case Studies of Classification and Clustering
2010
International Conference on Concept Lattices and their Applications
By means of two case studies, we evaluate the effectiveness of these measures for expansion-enhanced K-NN classification and K-Means clustering of short texts. ...
The proliferation of Web applications based on short texts represents both an opportunity and a challenge to text mining algorithms, because of sparse representations and lack of shared context. ...
ODP-239 thus consists of many small collections, each with a comparatively large set of classes, as opposed to having one large collection of documents with a small number of classes. ...
dblp:conf/cla/BoutariCN10
fatcat:d42mo4y63natnpwr4ray3spdcm
An efficient Particle Swarm Optimization approach to cluster short texts
2014
Information Sciences
small and medium size. ...
An efficient Particle Swarm Optimization approach to cluster short texts. Information Sciences. 265:36-49. ...
the framework of the Microcluster VLC/Campus (International Campus of Excellence) on Multimodal Intelligent Systems. ...
doi:10.1016/j.ins.2013.12.010
fatcat:ghpkmvuujjafbdhfu25mvmi4a4