An empirical study of sentiment analysis for Chinese documents

2008 Expert systems with applications  
To remedy this deficiency, this paper presents an empirical study of sentiment categorization on Chinese documents.  ...  Up to now, very little research has been conducted on sentiment classification for Chinese documents.  ...  However, he found that machine learning methods could not perform as well on sentiment classification as on traditional topic-based categorization.  ... 
doi:10.1016/j.eswa.2007.05.028 fatcat:qob3syyjerdcjouqkk6wy3zdd4

Text Similarity Computing Based on Standard Deviation [chapter]

Tao Liu, Jun Guo
2005 Lecture Notes in Computer Science  
Experiments on Chinese text documents show the validity and the feasibility of the standard deviation-based algorithm.  ...  Automatic text categorization is defined as the task to assign free text documents to one or more predefined categories based on their content.  ...  Once the categorization scheme is learned, it can be used for classifying future documents. It involves issues commonly found in machine learning problems.  ... 
doi:10.1007/11538059_48 fatcat:iqr5i33aarcara5bacamsf5qxq

Construction of supervised and unsupervised learning systems for multilingual text categorization

Chung-Hong Lee, Hsin-Chang Yang
2009 Expert systems with applications  
The preliminary results show that our platform models, including both supervised and unsupervised learning methods, have potential for multilingual text categorization.  ...  methods for system implementation.  ...  In the literature, document categorization using machine learning techniques normally considers supervised methods to carry out the tasks.  ... 
doi:10.1016/j.eswa.2007.12.052 fatcat:rwqnbwz2lra33o4nwvpge6peqi

Cross Language Text Categorization Using a Bilingual Lexicon

Ke Wu, Xiaolin Wang, Bao-Liang Lu
2008 International Joint Conference on Natural Language Processing  
The preliminary experiments on collected document collections show the effectiveness of the proposed method and verify the feasibility of achieving performance close to monolingual text categorization,  ...  Cross language text categorization has attracted more and more attention for the organization of these heterogeneous document collections.  ...  The authors would like to thank three anonymous reviewers for their valuable suggestions.  ... 
dblp:conf/ijcnlp/WuWL08 fatcat:zb34hrh2cbflznxdaoce7jfsca

Machine learning for Asian language text classification

Fuchun Peng, Xiangji Huang
2007 Journal of Documentation  
Purpose - The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, such as Chinese and Japanese where no word boundary information  ...  The paper advocates a simple language modeling based approach for this task.  ...  In principle, any language model can be used to perform text categorization based on equation (19).  ... 
doi:10.1108/00220410710743306 fatcat:fuwt4d3ppjbrhkcxyeexy3tcv4

Categorizing Patient Disease into ICD-10 with Deep Learning for Semantic Text Classification [chapter]

Junmei Zhong, Xiu Yi
2020 Recent Trends in Computational Intelligence [Working Title]  
(SVM), one of the most popular conventional machine learning algorithms, demonstrating the great impact of deep learning on medical big data analysis.  ...  In this paper, we develop natural language processing (NLP), deep learning, and machine learning algorithms to automatically categorize each patient's individual diseases into the ICD-10 standard.  ...  The tokenization of Chinese documents When we use machine learning for document categorization, documents first need to be tokenized into individual words or tokens.  ... 
doi:10.5772/intechopen.91292 fatcat:dr5fvs7oo5hz5nnzfjs4bcv4pe

Classification of Chinese-to-English translated social network timelines using naive Bayes

Xiang-Ru Yu, Zhong-Liang Xiang, Dae-Ki Kang
2015 2015 17th International Conference on Advanced Communication Technology (ICACT)  
In previous research, Chinese sentences are processed using Chinese word segmentation algorithms before the application of a machine learning algorithm.  ...  Therefore, the quality of the word segmentation algorithm obviously influences the accuracy of Chinese text categorization problems.  ...  Facing this huge amount of comments on Weibo, it is getting more difficult to categorize comments manually. Thus, we use a machine learning method to deal with this text categorization problem.  ... 
doi:10.1109/icact.2015.7224807 fatcat:amv5jfxap5a5tims22n22pwvmu


Ji He
2012 Applied intelligence (Boston)  
This paper reports our comparative evaluation of three machine learning methods, namely k Nearest Neighbor (kNN), Support Vector Machines (SVM), and Adaptive Resonance Associative Map (ARAM) for Chinese  ...  document categorization.  ...  Zhou, for providing the Chinese segmentation software and F.-L. Lai for valuable suggestions in designing the experiments. We thank T. Joachims for making SVM light available.  ... 
doi:10.1023/a:1023202221875 fatcat:5u36mfu3bzg6pohqnstifolfiq

A Survey of Multilingual Document Clustering

Kavita Moholkar
2017 International Journal Of Engineering And Computer Science  
Two major approaches used to date are machine translation of documents for classification and the use of bilingual dictionaries for effective translation of trained classification models.  ...  The major focus is on the problem of translating documents and classifying them semantically.  ...  They used the Rocchio algorithm, a popular learning method based on relevance feedback, and the Winnow algorithm, a method for learning a linear classifier from labeled examples, to categorize documents  ... 
doi:10.18535/ijecs/v6i4.21 fatcat:tcls565sqnfxxj3bx722smrviq

Multiple-instance learning for text categorization based on semantic representation

Jian-Bing Zhang, Yi-Xin Sun, De-Chuan Zhan
2017 Big Data & Information Analytics  
We represent the document as multiple instances based on word2vec.  ...  In this paper, we propose a new method to handle the semantic correlations between different words and text features from the representations and the learning schemes.  ...  In this paper, we propose a new method to represent a document as several vectors based on word2vec, and use a Multiple-Instance learning model for the final categorization operation [1].  ... 
doi:10.3934/bdia.2017009 fatcat:4pygm75uhzawbeontowgs36qwq

Cross-lingual text classification with model translation and document translation

Teng-Sheng Moh, Zhang Zhang
2012 Proceedings of the 50th Annual Southeast Regional Conference on - ACM-SE '12  
The most direct solution would be to translate those documents in other languages into one language by the machine translator.  ...  This method can take advantage of the very best functionality between both the document translation and model translation methods.  ...  For the experiment, the author used two sets of labeled data, one is in Chinese and the other one is in English.  ... 
doi:10.1145/2184512.2184530 dblp:conf/ACMse/MohZ12 fatcat:fx7qigosrzg4rdadtzre4p4azq

Text Categorization System for Stock Prediction

Bozhao Li, Na Chen, Jing Wen, Xuebo Jin, Yan Shi
2015 International Journal of u- and e- Service, Science and Technology  
Since the 1990s, many statistical and machine learning methods have been used in text categorization, which has aroused great interest among researchers.  ...  At present, researchers have also begun research on, and preliminary applications of, Chinese text categorization, such as information retrieval, digital library, automatic summarization and categorization  ...  K Nearest Neighbors (KNN): among several categorization methods, KNN is one of the most classical, a non-parametric method used for classification.  ... 
doi:10.14257/ijunesst.2015.8.2.04 fatcat:wkljulblh5arbbxufcwjf5tuo4

Support Vector Machines based Arabic Language Text Classification System: Feature Selection Comparative Study [chapter]

Abdelwadood. Moh'd. Mesleh
2008 Advances in Computer and Information Sciences and Engineering  
Vector Machine Classifier.  ...  This paper investigates the effectiveness of six commonly used feature selection methods. Evaluation used an in-house collected Arabic text classification corpus, and classification is based on Support  ...  TAN, On Machine Learning Methods for Chinese document Categorization, Applied Intelligence, 2003, pp. 311-322. [15] A.M. Samir, W. Ata, and N.  ... 
doi:10.1007/978-1-4020-8741-7_3 dblp:conf/cisse/Mesleh07 fatcat:o7zi45wsnvc6ddxxj3qlr3s5b4

Arabic Text Categorization Algorithm Using Vector Evaluation Method

Ashraf Odeh, Aymen Abu-Errub, Qusai Shambour, Nidal Turab
2014 International Journal of Computer Science & Information Technology (IJCSIT)  
This paper proposes a new method for Arabic text categorization using vector evaluation.  ...  Text categorization is the process of grouping documents into categories based on their contents.  ...  approaches in text categorization, and focused on using machine learning in TC research.  ... 
doi:10.5121/ijcsit.2014.6606 fatcat:2fcnfbx4irdojmer5qegkhccn4

A Novel Kernel for Text Classification Based on Semantic and Statistical Information

Haipeng Yao, Bo Zhang, Peiying Zhang, Maozhen Li
2018 Computing and informatics  
In text categorization, a document is usually represented by a vector space model which can accomplish the classification task, but the model cannot deal with Chinese synonyms and polysemy phenomenon.  ...  The proposed approach computes semantic information based on HowNet and statistical information based on a kernel function with class-based weighting.  ...  RELATED WORK Support Vector Machines for Classification Support vector machine (SVM) is a very effective machine learning algorithm developed from statistical learning theory.  ... 
doi:10.4149/cai_2018_4_992 fatcat:hhjfu54cxjcvzbwlyk3yr7glzi