

Petr ŠALOUN, Barbora CIGÁNKOVÁ, David ANDREŠIČ, Lenka KRHUTOVÁ
Affiliations: Palacky University Olomouc, Krizkovskeho 511/8, CZ-771 47 Olomouc, Czech Republic; Faculty of Electrical Engineering and Computer Science, VSB - Technical University of Ostrava, Ostrava, Czech Republic; Faculty of Social Studies, University of Ostrava, Ostrava, Czech Republic
2021 Acta Electrotechnica et Informatica  
Its goal is to test a text document classifier based on document similarity measured by the N-grams method and to design an evaluation and crowdsourcing-based classification improvement mechanism.  ...  This work describes our approach to the classification of text documents and its improvement through crowdsourcing.  ...  After an analysis of classification algorithms, the N-grams algorithm was chosen, mainly for its language independence but also for its easy implementation.  ... 
doi:10.15546/aeei-2021-0013 fatcat:7ou7ipbynbbetep6l5wnydlzni

An automatic text comprehension classifier based on mental models and latent semantic features

Felipe Bravo-Marquez, Gaston L'Huillier, Patricio Moya, Sebastián A. Ríos, Juan D. Velásquez
2011 Proceedings of the 11th International Conference on Knowledge Management and Knowledge Technologies - i-KNOW '11  
A numerical characterization of students' documents using structural information, such as the usage of text connectors, and latent semantic features are used as input for traditional classification algorithms  ...  For the evaluation of the proposed methodology, using a set of stimulus documents, a set of questions must be answered by an experimental group of students.  ...  Moreover, each cell in the matrix has as value the number of occurrences of the n-gram in the document.  ... 
doi:10.1145/2024288.2024317 dblp:conf/iknow/Bravo-MarquezLMRV11 fatcat:6gchuos44rhzfl27uhdqjvyhny

Towards the Automated Evaluation of Crowd Work: Machine-Learning Based Classification of Complex Texts Simplified by Laymen

Holger Hoffmann, Angelika Bullinger, Christiane Fellbaum
2013 2013 46th Hawaii International Conference on System Sciences  
To achieve this, we identify and select text attributes from different disciplines as input for machine-learning classification algorithms and evaluate the suitability of three well-regarded algorithms,  ...  However, an increase of crowd work entails increasing effort to evaluate the quality of the submissions.  ...  The main algorithms described by Lin [27] are based on n-grams (ROUGE-N, where n is the length of the n-gram), (weighted) longest common substrings (ROUGE-L, ROUGE-W) or skip bigrams and unigrams (ROUGE-S  ... 
doi:10.1109/hicss.2013.568 dblp:conf/hicss/HoffmannBF13 fatcat:sitb2bil2fbenjvovvq3nwqb6q
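
The ROUGE-N idea mentioned in the snippet above (n-gram overlap with a reference text) can be sketched roughly as follows. This is a minimal illustration of the recall-style form with a single reference and pre-tokenized input; the function name `rouge_n_recall` and the example sentences are assumptions for illustration, not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def rouge_n_recall(candidate: Sequence[str], reference: Sequence[str], n: int = 2) -> float:
    """Share of the reference's n-grams that also occur in the candidate.

    Matches are clipped: an n-gram that appears k times in the reference
    can be matched at most as often as it appears in the candidate.
    """
    cand_counts = Counter(
        tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)
    )
    ref_counts = Counter(
        tuple(reference[i:i + n]) for i in range(len(reference) - n + 1)
    )
    total = sum(ref_counts.values())
    if total == 0:
        # The reference is shorter than n tokens, so there is nothing to match.
        return 0.0
    overlap = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total


# Hypothetical example sentences, tokenized on whitespace.
candidate = "the cat sat on the mat".split()
reference = "the cat was on the mat".split()
score = rouge_n_recall(candidate, reference, n=2)
```

Here three of the five reference bigrams also appear in the candidate, so `score` is 0.6.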

Sentiment Analysis Using Text Mining: A Review

Swati Redhu
2018 International Journal on Data Science and Technology  
This paper provides an overview of different methods used in text mining and sentiment analysis elaborating on all subtasks.  ...  similar sources.  ...  They concluded that in n-gram approach as the value of n increases, the classification accuracy decreases.  ... 
doi:10.11648/j.ijdst.20180402.12 fatcat:eeweilxmenev7oltuzqf3dxwoi

Autism spectrum disorder detection from semi-structured and unstructured medical data

Jianbo Yuan, Chester Holtz, Tristram Smith, Jiebo Luo
2016 EURASIP Journal on Bioinformatics and Systems Biology  
Our detecting framework involves converting semi-structured and unstructured medical forms into digital format, preprocessing, learning document representation, and finally, classification.  ...  Therefore, to benefit autism patients by enhancing their access to treatments such as early intervention, we aim to develop a robust machine learning-based system for autism detection by using Natural  ...  good representations of documents to capture the semantics behind text contents is central to a wide range of NLP tasks such as sentiment analysis, and document classification as in our case.  ... 
doi:10.1186/s13637-017-0057-1 pmid:28203249 pmcid:PMC5288414 fatcat:qymjh2lm4nfyrgwdbkuul5e27m

Opinion Mining From Social Media Short Texts: Does Collective Intelligence Beat Deep Learning?

Nicolas Tsapatsoulis, Constantinos Djouvas
2019 Frontiers in Robotics and AI  
The basic purpose of this paper is to compare various kinds of low-level features, including those extracted through deep learning, as in fastText and Doc2Vec, and keywords suggested by the crowd, called  ...  On the other hand, opinion mining in social media is nowadays an important parameter of social media marketing.  ...  learning, as expressed through the modeling of those short texts (i.e., tweets and Facebook comments) with character n-grams as in Doc2Vec and the fastText (2018) classifier.  ... 
doi:10.3389/frobt.2018.00138 pmid:33501016 pmcid:PMC7805642 fatcat:2i2r6hicanevvhwxcw7iu2rvv4

Fake News Detection with Semantic Features and Text Mining

Pranav Bharadwaj, Zongru Shao
2019 International Journal on Natural Language Computing  
Evaluated with real or fake dataset from, the best performing model achieved an accuracy of 95.66% using bigram features with the random forest classifier.  ...  The fact that bigrams outperform unigrams, trigrams, and quadgrams shows that word pairs, as opposed to single words or phrases, best indicate the authenticity of news.  ...  N-grams are continuous chunks of n items from a tokenized sequence for a document.  ... 
doi:10.5121/ijnlc.2019.8302 fatcat:isvu5oe7tfe3dmdfijam3q35ea
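
The definition of n-grams quoted in the snippet above can be illustrated with a short sketch. It assumes the document is already tokenized into a list of words; the helper name `ngrams` and the sample sentence are illustrative only.

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n adjacent tokens, in document order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty list, because the range is empty.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical tokenized sentence.
tokens = "the quick brown fox".split()
unigrams = ngrams(tokens, 1)  # single words
bigrams = ngrams(tokens, 2)   # word pairs
trigrams = ngrams(tokens, 3)  # three-word phrases
```

A sequence of four tokens gives four unigrams, three bigrams and two trigrams.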

A Novel System for Document Classification Using Genetic Programming

Saad M. Darwish, Adel A. EL-Zoghabi, Doaa B. Ebaid
2015 Journal of Advances in Information Technology  
The proposed work mitigates this difficulty by providing an algorithm to classify documents into more than two categories (multi-class classification) at the same time by combining multi-objective technique  ...  Document retrieval, categorization, routing and filtering can all be formulated as classification problems.  ...  INTRODUCTION The growth in the amount of text documents over the internet and news sources makes document classification an important task in document processing.  ... 
doi:10.12720/jait.6.4.194-200 fatcat:d2cn2mmxhrarpankkff5gufhym

Non-linear Mapping for Improved Identification of 1300+ Languages

Ralf Brown
2014 Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)  
Non-linear mappings of the form $P(\text{ngram})^{\gamma}$ and $\frac{\log(1+\tau P(\text{ngram}))}{\log(1+\tau)}$ are applied to the n-gram probabilities in five trainable open-source language identifiers.  ...  The second mapping improves four of the five identifiers by 10.6% to 83.8% on the larger corpus and 14.4% to 76.7% on the smaller corpus.  ...  We then apply a simple modification to their scoring algorithms which improves the classification accuracy of all five of them, three quite dramatically.  ... 
doi:10.3115/v1/d14-1069 dblp:conf/emnlp/Brown14 fatcat:d7naeus2q5hh3fuum5r4euoyba
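
The two non-linear mappings named in the snippet above, a power mapping with exponent γ and a normalized logarithmic mapping with parameter τ, can be written as plain functions. This is only a sketch of the formulas: the function names and the parameter values in the example are assumptions, and how the mapped values feed into each identifier's scoring is not shown in the snippet.

```python
import math


def gamma_mapping(p: float, gamma: float) -> float:
    """Power mapping: P(ngram) raised to the exponent gamma."""
    return p ** gamma


def log_mapping(p: float, tau: float) -> float:
    """Logarithmic mapping: log(1 + tau * P(ngram)) / log(1 + tau), for tau > 0."""
    return math.log(1.0 + tau * p) / math.log(1.0 + tau)


# Hypothetical probability and parameter values, chosen only for illustration.
p = 0.01
mapped_by_power = gamma_mapping(p, gamma=0.5)
mapped_by_log = log_mapping(p, tau=10.0)
```

Both mappings send 0 to 0 and 1 to 1. With γ < 1 or τ > 0 they raise small probabilities relative to large ones, so rare n-grams carry more weight than under the raw probabilities.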

A Similarity-based Machine Learning Approach for Detection of Software Clones

Abdullah M. Sheneamer
2021 Expert systems with applications  
To manage such a massive number of documents effectively, an intelligent text document classification system is proposed in this paper.  ...  As a result, an enormous amount of unstructured data is created that demands much time and effort to organize, search or manipulate.  ...  Acknowledgements This work was supported by the Establishment of CUET IT Business Incubator Project, BHTPA, ICT Division, Bangladesh for the research on "Automatic Bengali Document Categorization based  ... 
doi:10.1016/j.eswa.2021.115394 fatcat:44sqcdpj7nfvjoa33u4dbjcpmi

Class-based Prediction Errors to Detect Hate Speech with Out-of-vocabulary Words

Joan Serrà, Ilias Leontiadis, Dimitris Spathis, Gianluca Stringhini, Jeremy Blackburn, Athena Vakali
2017 Proceedings of the First Workshop on Abusive Language Online  
To better deal with these issues in those fast-paced environments, we propose using the error signal of class-based language models as input to text classification algorithms.  ...  Common approaches to text categorization essentially rely either on n-gram counts or on word embeddings.  ...  Acknowledgments This work has been fully funded by the European Commission as part of the ENCASE project (H2020-MSCA-RISE of the European Union under GA number 691025).  ... 
doi:10.18653/v1/w17-3005 dblp:conf/acl-alw/SerraLSSBV17 fatcat:xzc3cmedo5hqzjycxt2bahcjlm

SVM significant role selection method for improving semantic text plagiarism detection

Ahmed Hamza Osman, Omar M. Barukab
2017 International Journal of Advanced and Applied Sciences  
This research introduces an approach for the prediction and detection of plagiarized text based on Semantic Role Labelling (SRL) and Support Vector Machine (SVM).  ...  The results showed that the introduced strategy improved performance on the evaluation measures compared with other plagiarism detection methods.  ...  The authors would like to thank the Deanship of Scientific Research Management (DSR) King Abdulaziz University for the support and incentive extended in making this study a success.  ... 
doi:10.21833/ijaas.2017.08.016 fatcat:eollj72fljbmpkqjerjsca3rre

Predicting real estate market trends and value using pre-processing and sentiment text mining analysis

Nikolay Sinyak, Singh Tajinder, Jaglan Madhu Kumari, Vitaliy Kozlovskiy
2021 Real Estate Economics Management  
Gigantic growth of text mining is becoming a potential source of crowd wisdom extraction and analysis especially in terms of text pre-processing and sentiment analysis.  ...  Emphasis is placed on the resources and learning mechanism available to real estate researchers and practitioners, as well as the major text mining tasks of interest to the community.  ...  Therefore, the probability of the consequent word in a word sequence in n-gram to the conditional probability is described as: $P(t_n \mid t_1^{n-1}) \approx P(t_n \mid t_{n-N+1}^{n-1})$ (4) In case of 2-gram, from probability  ... 
doi:10.22337/2073-8412-2021-1-35-43 fatcat:vt2fqf3itbaivmtnrcysxw4szi
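
For the 2-gram case that the snippet above breaks off on, the conditional probability of a word given its predecessor is often estimated from counts. The sketch below uses the plain maximum-likelihood ratio count(prev, word) / count(prev). That choice is an assumption, since the snippet is cut off before it states the paper's own estimate, and the function name and toy corpus are illustrative.

```python
from collections import Counter
from typing import Sequence


def bigram_probability(tokens: Sequence[str], prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(word | prev) from one token sequence."""
    # Count each token only where it has a successor, so that the
    # probabilities of all words following `prev` sum to 1.
    context_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    if context_counts[prev] == 0:
        # Unseen context: no estimate without smoothing, so return 0.0.
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


# Hypothetical toy corpus.
corpus = "a b a c a b".split()
p_b_given_a = bigram_probability(corpus, "a", "b")
p_c_given_a = bigram_probability(corpus, "a", "c")
```

In the toy corpus, "a" is followed by "b" twice and by "c" once, so the estimates are 2/3 and 1/3. A practical model would add smoothing for unseen pairs.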

Evolutionary Data Measures: Understanding the Difficulty of Text Classification Tasks [article]

Edward Collins, Nikolai Rozanov, Bingbing Zhang
2018 arXiv   pre-print
Classification tasks are usually analysed and improved through new model architectures or hyperparameter optimisation but the underlying properties of datasets are discovered on an ad-hoc basis as errors  ...  We then propose an intuitive measure of difficulty for text classification datasets which is simple and fast to calculate.  ...  Text Emotion Classification (TE) The Text Emotion Classification dataset is crowd-sourced by (FigureEight, 2018) , it contains 13 classes for emotional content like happiness or sadness.  ... 
arXiv:1811.01910v2 fatcat:eyiim4laqfdkjdvn6o3kx2arm4

Multi-Task Deep Learning for Legal Document Translation, Summarization and Multi-Label Classification [article]

Ahmed Elnaggar, Christoph Gebendorfer, Ingo Glaser, Florian Matthes
2018 arXiv   pre-print
Tasks such as the classification of legal documents or contract clauses as well as the translation of those are highly relevant.  ...  The experiments were conducted on legal document corpora utilizing several task combinations as well as various model parameters.  ...  Acknowledgements We gratefully acknowledge the support of Leibniz-Rechenzentrum, Microsoft Corporation and NVIDIA Corporation with hardware which were used for this research.  ... 
arXiv:1810.07513v1 fatcat:qdjmv6mwtvhdfn4thpjgwoxrhq