Little words can make a big difference for text classification

Ellen Riloff
1995 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '95  
However, we have found that recognizing singular and plural nouns, verb forms, negation, and prepositions can produce dramatically different text classification results.  ...  We present results from text classification experiments that compare relevancy signatures, which use local linguistic context, with corresponding indexing terms that do not.  ...  However, for some classification tasks, classifying texts on the basis of a single linguistic expression can be effective.  ... 
doi:10.1145/215206.215349 dblp:conf/sigir/Riloff95 fatcat:hbtb5avhn5ej5g4qysxuu4wn7m

Privacy Preserving Unstructured Data Publishing (PPUDP) Approach for Big Data

Ramya Shree
2019 International Journal of Computer Applications  
As it comprises of different types of data like structured data, unstructured data and semi structured data, a variety of techniques available for preserving privacy for structured and semi structured  ...  The Big Data analytics normally carried out by third party, the data provider can be able to classify data as secured or not secured prior to data publishing for Data Analytics and it involves classification  ...  Using set of predetermined words as features, word occurrence counts as feature values, a secure classifier is constructed for a set of documents.  ... 
doi:10.5120/ijca2019919091 fatcat:5zknzlofl5hafk2qdny36damyq

Word embedding and text classification based on deep learning methods

Saihan Li, Bing Gong, I. Barukčić
2021 MATEC Web of Conferences  
Automatic text classification can help people summary the text accurately and quickly from the mass of text information.  ...  Based on this background, we presented different word embedding methods such as word2vec, doc2vec, tfidf and embedding layer.  ...  In Chinese, one character can be a word, two or three characters can also make up a word, even 4 characters is also a word.  ... 
doi:10.1051/matecconf/202133606022 fatcat:ijie7dazajgunpsi7n72gliujm

The Past, Present and Future of Text Classification

Niklas Zechner
2013 2013 European Intelligence and Security Informatics Conference  
Despite over a century of research, the study of text classification is still chaotic.  ...  In this article, we give an overview of some of the techniques that have been used, for author identification and for other aspects of classification.  ...  The length of each text can also make a big difference, and the homogeneity.  ... 
doi:10.1109/eisic.2013.61 dblp:conf/eisic/Zechner13 fatcat:u3lnlbnjpfhglphghovyxv46eq

A Review of Text Corpus-Based Tourism Big Data Mining

Qin Li, Shaobo Li, Sen Zhang, Jie Hu, Jianjun Hu
2019 Applied Sciences  
We summarize and discuss different text representation strategies, text-based NLP techniques for topic extraction, text classification, sentiment analysis, and text clustering in the context of tourism  ...  profiles, and make policies for supervising tourism markets.  ...  Currently, more and more works have started to notice the different semantics of a word in different contexts.  ... 
doi:10.3390/app9163300 fatcat:chb3pbtj5jgq7fauniomsb22yu

Big Data Recommendation Research Based on Travel Consumer Sentiment Analysis

Zhu Yuan
2022 Frontiers in Psychology  
The test results show that online travel reviews can be an important data source for travel big data recommendation, and the proposed method can quickly and accurately achieve travel sentiment classification  ...  Firstly, Internet travel reviews are pre-processed for sentiment analysis of the review text.  ...  A big data classification approach using LDA with an enhanced SVM method for ECG signals in cloud computing. Multimed.  ... 
doi:10.3389/fpsyg.2022.857292 pmid:35295387 pmcid:PMC8918497 fatcat:ywfnuiu6gngabggnypvanu7x3e

Digital Library Information Integration System Based on Big Data and Deep Learning

Xiao Lin, Ying Zhang, Jiangong Wang, C. Venkatesan
2022 Journal of Sensors  
In order to solve the defects of traditional text classification in digital library, the author proposes a method based on deep learning in the field of big data and artificial intelligence, which is applied  ...  On the basis of systematically sorting out the traditional text classification of digital library of this method, this paper proposes a digital library text classification model based on deep learning  ...  Acknowledgments The second batch of science and technology projects of Fuzhou Science and Technology Bureau: digital library knowledge service platform based on WeChat official account's open source big  ... 
doi:10.1155/2022/9953787 fatcat:k2rx2jggafbhrddrnhe3jvq5py

Semantic indexing with deep learning: a case study

Yan Yan, Xu-Cheng Yin, Bo-Wen Zhang, Chun Yang, Hong-Wei Hao
2016 Big Data Analytics  
Next, we construct a high-dimensional space representation with Wikipedia category extension, which contains more semantic information than bag-of-words.  ...  for MEDLINE citations and a massive collection with only the title and abstract information.  ...  can be described with different words or different language modes.  ... 
doi:10.1186/s41044-016-0007-z fatcat:zs6zrrrdm5abnfdg2rabohc5dm

Sentiment Analysis on Social Media and Online Review

Rajni Singh, Rajdeep Kaur
2015 International Journal of Computer Applications  
This paper develops a combined dictionary based on social media keywords and online review and also find hidden relationship pattern from these keyword.  ...  It is one of the most common mistakes a text analytics engine makes when trying to analyze text for sentiment. Even humans have trouble, as they can analyze with 80% accuracy.  ...  Here, we made a list of around 320 words and created a text file for it.  ... 
doi:10.5120/21660-5072 fatcat:qdsqgnijnbau3fs465g2apidqe

An Improved Method of Feature Selection Based on Concept Attributes in Text Classification [chapter]

Shasha Liao, Minghu Jiang
2005 Lecture Notes in Computer Science  
According to the experiment results, we conclude that we can get enough information from the combined feature set for classification and efficiently reduce the useless features and the noises.  ...  The feature selection and weighting are two important parts of automatic text classification. In this paper we give a new method based on concept attributes.  ...  Therefore, the researchers have developed a lot of techniques which are fit for Chinese text classification and the text classification begin to boom.  ... 
doi:10.1007/11539087_152 fatcat:gmykztjahjhifdwkgknnrvxeem

New approch of opinion analysis from big social data environment using a supervised machine learning algirithm

Wiam Saidi, Abdellatif El Abderahmani, Khalid Satori, S. Bourekkadi, H. Hami, A. Mokhtari, K. Slimani, A. Soulaymani
2021 E3S Web of Conferences  
This process starts with the collection of reviews and their annotation followed by a text pre-processing phase in order to extract words that are reduced to their root.  ...  These words will be used for the construction of input variables using several combinations of extraction and weighting schemes.  ...  In its simplest form, it then assigns a polarity (positive, negative, neutral) to a text, i.e., it determines whether a text is positive, negative, or neutral by extracting particular words or phrases.  ... 
doi:10.1051/e3sconf/202131901037 fatcat:rmusjwdnsncrlea6obqmsfvli4

A Text Classification Application: Poet Detection from Poetry [article]

Durmus Ozkan Sahin, Oguz Emre Kural, Erdal Kilic, Armagan Karabina
2018 arXiv pre-print
Chi-Square technique are used for feature selection. In addition, five different classification algorithms are tried.  ...  With the widespread use of the internet, the size of the text data increases day by day. Poems can be given as an example of the growing text.  ...  Instead of using all the terms in the bag of words, little word vector is preferred. Little vector is the best subset of bag of words.  ... 
arXiv:1810.11414v1 fatcat:r7lfjclg75flxdn2cemixpmybu

Topic Modeling for Interpretable Text Classification From EHRs

Emil Rijcken, Uzay Kaymak, Floortje Scheepers, Pablo Mosteiro, Kalliopi Zervanou, Marco Spruit
2022 Frontiers in Big Data  
Using topic models for text classification of electronic health records for a predictive task allows for the use of topics as features, thus making the text classification more interpretable.  ...  In this work, we propose considerations for selecting a suitable topic model based on the predictive performance and interpretability measure for text classification.  ...  Using topic modeling algorithms as topic embeddings for text classification might make a model more explainable.  ... 
doi:10.3389/fdata.2022.846930 pmid:35600326 pmcid:PMC9114871 fatcat:nzglxb5mtvalfc4fcklzosmggy

Network Public Opinion Monitoring System for Agriculture Products Based on Big Data

He Liu, Zekun Yu, Xiangzhi Zhong, Helong Yu, Imran Sarwar Bajwa
2021 Scientific Programming  
This research is based on big data technology to develop an agricultural products' network public opinion monitoring system that can collect, process, and analyze data in real time, discover and track  ...  basis of the decision-making of relevant departments.  ...  Sentiment classification is different from domain classification. The general feature extraction algorithm in domain text classification can play a very good classification effect, but it has its own independent  ... 
doi:10.1155/2021/9976001 fatcat:2tgfjccudzaa3cwuiv5ojcfane

Low Cost Page Quality Factors To Detect Web Spam

Ashish Chandra
2020 Zenodo  
Web spam is a big challenge for quality of search engine results. It is very important for search engines to detect web spam accurately.  ...  This classifier can be applied to search engine results on real time because calculation of these features require very little CPU resources.  ...  Differentiating between desirable and undesirable content is a big challenge for users as well as for search engines.  ... 
doi:10.5281/zenodo.3876356 fatcat:5xfzzmiv35hmlc7gsmecgjpj2m