Experiments in high-dimensional text categorization
2002
Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '02
We present results for automated text categorization of the Reuters-810000 collection of news stories. ...
We divide the data into monthly groups and provide an initial benchmark of text categorization performance on the complete collection. ...
In this paper, we make use of this data set to establish a new benchmark for evaluating text categorization performance in a high dimensional space. ...
doi:10.1145/564376.564442
dblp:conf/sigir/DamerauZWI02
fatcat:voguniipqzggdpmbtgsks6k3fe
Arabic Text Classification Based on Features Reduction Using Artificial Neural Networks
2013
2013 UKSim 15th International Conference on Computer Modelling and Simulation
Text categorization is one solution to tackle this problem. ...
The system's primary source of knowledge is an Arabic text categorization (TC) corpus built locally at the University of Jordan and available at http://nlp.ju.edu.jo. ...
MSE VS epochs for Arabic documents categorization
Experiment I: In the first experiment, we experimented with the commonly used feature selection method TF_IDF, in order to reduce the high dimensionality ...
doi:10.1109/uksim.2013.135
dblp:conf/uksim/ZaghoulA13
fatcat:3h67ywpzhbdtrgey3stbuywhuu
A Centroid Based Text Categorization Method Using Mean Shift
2013
Journal of Information and Computational Science
In this paper, we propose a method for text categorization based on Mean Shift. Mean Shift algorithm is a well developed technique in computer vision researches. ...
Text categorization is an important research topic in Information Retrieval area and it is one of the key techniques for handling and organizing the huge amount of text data available on the Internet and ...
Related Work
Dimension Reduction in Text Categorization The most critical challenge for text categorization is the high dimensionality of the natural language text, often referred to as the "curse of ...
doi:10.12733/jics20102921
fatcat:b4ohjpvvbbcxlibx5nijmq2gdu
A comparison and semi-quantitative analysis of words and character-bigrams as features in Chinese text categorization
2006
Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL - ACL '06
in text categorization systems. ...
Words and character-bigrams are both used as features in Chinese text processing tasks, but no systematic comparison or analysis of their values as features for Chinese text categorization has been reported ...
Few similar comparative studies have been reported for Text Categorization (Li et al., 2003) so far in literature. ...
doi:10.3115/1220175.1220244
dblp:conf/acl/LiSZ06
fatcat:cewpik3qq5bktn6oii5vizafzm
The Lao Text Classification Method Based on KNN
2020
Procedia Computer Science
Text categorization is a common application scenario in the NLP field, and has many applications in public opinion monitoring and news classification. ...
Because the text is stored as a vector space, the dimension is high. ...
doi:10.1016/j.procs.2020.02.053
fatcat:zkixkrxc75cxnnj3zfbk4y6s3q
Enhancement of DTP Feature Selection Method for Text Categorization
[chapter]
2005
Lecture Notes in Computer Science
This paper studies the structure of vectors obtained by using term selection methods in high-dimensional text collection. ...
Typically even a moderately sized collection of text has tens or hundreds of thousands of terms. Hence, the document vectors are high-dimensional. ...
However, the vectors produced by DTP have a "sparse" behavior that is not commonly found in low-dimensional text collections. ...
doi:10.1007/978-3-540-30586-6_80
fatcat:h33dbbaj5zboperm2yrmffnkt4
Evaluating text categorization in the presence of OCR errors
2000
Document Recognition and Retrieval VIII
In this paper we describe experiments that investigate the effects of OCR errors on text categorization. ...
We also observe that dimensionality reduction techniques eliminate a large number of OCR errors and improve categorization results. ...
Our experiments show that OCR errors have little effect on text categorization once some form of dimensionality reduction has been applied. ...
doi:10.1117/12.410861
dblp:conf/drr/TaghvaNBLCY01
fatcat:cliqa7xqd5hrzkzdvnwnkc53o4
Random Subspace Method in Text Categorization
2010
2010 20th International Conference on Pattern Recognition
Due to the huge number of terms in even a moderate-size text corpus, high dimensional feature space is an intrinsic problem in TC. ...
In text categorization (TC), which is a supervised technique, a feature vector of terms or phrases is usually used to represent the documents. ...
However, it is not extensively investigated on strong classifiers such as SVM (that perform rather well in high dimensional feature space) nor in the area of text categorization. ...
doi:10.1109/icpr.2010.505
dblp:conf/icpr/GangehKD10
fatcat:u5nyw3mws5faxgv7sx4pz2yyhe
A Cluster Tree Method For Text Categorization
2011
Procedia Engineering
Experiments show that the cluster tree solves the high-dimensionality problem and outperforms C4.5 and CART on text data. ...
Since more features are ignored, the classification accuracy is not high. To solve the problem, this paper uses a cluster tree for text categorization. Unlike familiar decision trees (e.g. ...
However, previous works have found that the classification accuracy of decision tree is not high on text categorization. The difficulty of dealing with the text data is the high dimensionality [6] . ...
doi:10.1016/j.proeng.2011.08.709
fatcat:fq4hkohe3rglvjtsr2djbczaxy
Text categorization with Support Vector Machines: Learning with many relevant features
[chapter]
1998
Lecture Notes in Computer Science
This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. ...
It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. ...
With their ability to generalize well in high dimensional feature spaces, SVMs eliminate the need for feature selection, making the application of text categorization considerably easier. ...
doi:10.1007/bfb0026683
fatcat:e6wov4nsd5fbjkdl4oyllkgssi
Enhancing Text Categorization with Semantic-enriched Representation and Training Data Augmentation
2006
JAMIA Journal of the American Medical Informatics Association
Design: We studied two approaches that enhance the text categorization performance on sparse and high data dimensionality: (1) semantic-preserving dimension reduction by representing text with semantic-enriched ...
In the real world, many information retrieval tasks are difficult because of high data dimensionality and the lack of annotated examples to train a retrieval algorithm. ...
Conclusion In summary, we have studied two approaches for enhancing text categorization under the scenario of high dimensionality and scarce training data: (1) semantic-preserving dimension reduction with ...
doi:10.1197/jamia.m2051
pmid:16799127
pmcid:PMC1561790
fatcat:iix6pfcutnhvdg3q4sd53xihyy
Some Investigations on Machine Learning Techniques for Automated Text Categorization
2013
International Journal of Computer Applications
The automated categorization (classification) of texts into predefined categories is one of the widely explored fields of research in text mining. ...
Now-a-days, availability of digital data is very high, and to manage them in predefined categories has become a challenging task. ...
Step3: For TC high dimensionality of term space is not proper for many sophisticated algorithms (e.g. LLSF [8] ). Hence, before classification, dimensionality reduction (DR) is applied. ...
doi:10.5120/12340-8617
fatcat:jom2wztpfrdghc6vphb6dmiqm4
Improving Arabic text categorization using decision trees
2009
2009 First International Conference on Networked Digital Technologies
To test the effectiveness of the proposed model, experiments were conducted using an in-house collected Arabic corpus for text categorization. ...
The results showed that the proposed model was able to achieve high categorization effectiveness as measured by precision, recall and F-measure. ...
Related work in Arabic text categorization Many researchers have been working on text categorization in English and other European languages, however few researchers work on text categorization for Arabic ...
doi:10.1109/ndt.2009.5272214
fatcat:yzbqjpqnhbehnizct33a7rfnuy
An empirical evaluation of text classification and feature selection methods
2016
Artificial intelligence research
Support Vector Machine with linear kernel reigned supreme for text categorization tasks producing highest F measures and low training times even in the presence of high class skew. ...
An extensive empirical evaluation of classifiers and feature selection methods for text categorization is presented. ...
Text categorization can also be seen as the problem of establishing decision boundaries in the high dimensional feature space. ...
doi:10.5430/air.v5n2p70
fatcat:utpb25jxhreiflpge5dsvjetmm