
Analysis and representation of Igbo text document for a text-based system [article]

Ifeanyi-Reuben Nkechi J., Ugwu Chidiebere, Adegbola Tunde
2020 arXiv pre-print
This paper presents the analysis of Igbo language text documents, considering their compounding nature, and describes their representation with the word-based n-gram model to properly prepare them for any text-based  ...  The results show that bigram and trigram text representation models provide more semantic information and also address the issues of compounding, word ordering and collocations, which are the major  ...  The document is represented by a word-based n-gram model. Given document $j$ characterised by $d_j$, $f_{ij}$ is the frequency of the n-gram word $nw_i$ in document $j$.  ... 
arXiv:2009.06376v1 fatcat:yta6t4cxdnaxfeqt34f6gbvzlq
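The snippet defines $f_{ij}$ as the frequency of n-gram word $nw_i$ in document $j$. A minimal sketch of computing those frequencies for bigrams and trigrams follows; the function names, lowercasing and whitespace tokenization are assumptions for illustration, not the paper's Igbo preprocessing, and the sample text is a placeholder.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(document, n):
    """Map each word n-gram nw_i to its frequency f_ij in one document j.

    Lowercasing and whitespace tokenization are assumptions of this sketch,
    not the paper's own preprocessing.
    """
    tokens = document.lower().split()
    return Counter(word_ngrams(tokens, n))


doc = "the cat sat on the mat the cat sat"  # placeholder text
bigram_freqs = ngram_frequencies(doc, 2)
trigram_freqs = ngram_frequencies(doc, 3)
```

Because each key is an ordered tuple, the same words in a different order yield a different feature, which is how bigram and trigram representations retain word-order and collocation information that a bag of single words discards.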


Petr ŠALOUN, Palacky University Olomouc, Krizkovskeho 511/8, CZ-771 47 Olomouc, Czech Republic; Barbora CIGÁNKOVÁ, David ANDREŠIČ, Lenka KRHUTOVÁ; Faculty of Electrical Engineering and Computer Science, VSB - Technical University of Ostrava, Ostrava, Czech Republic; Faculty of Social Studies, University of Ostrava, Ostrava, Czech Republic
2021 Acta Electrotechnica et Informatica  
Its goal is to test a text document classifier based on document similarity measured by the N-gram method and to design an evaluation and crowdsourcing-based classification improvement mechanism.  ...  effective visualization and navigation over the content, made mostly by the community itself and personalized to the informal carer's phase of the care-taking timeline.  ...  A great advantage of classification using N-grams is its independence of document language, because there is no need for text pre-processing such as stemming or lemmatization.  ... 
doi:10.15546/aeei-2021-0013 fatcat:7ou7ipbynbbetep6l5wnydlzni
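The snippet does not say which n-gram similarity measure the classifier uses. As a hedged sketch only, the code below assumes overlapping character trigrams, cosine similarity and a 1-nearest-neighbour decision; it illustrates why no stemming or lemmatization is needed, since raw character sequences are compared directly. All names and the labelled examples are hypothetical.

```python
import math
from collections import Counter


def char_ngram_profile(text, n=3):
    """Count overlapping character n-grams; no stemming or lemmatization."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[g] for g, count in a.items() if g in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, labelled_docs, n=3):
    """Return the label of the most similar labelled document (1-nearest neighbour)."""
    profile = char_ngram_profile(text, n)
    best_label, _ = max(
        ((label, cosine(profile, char_ngram_profile(doc, n))) for doc, label in labelled_docs),
        key=lambda pair: pair[1],
    )
    return best_label


labelled = [  # hypothetical training documents
    ("the caregiver gave the medication", "medication"),
    ("the patient went for a walk in the park", "activity"),
]
predicted = classify("medication given by caregiver", labelled)
```

Inflected forms such as "caregiver" and "caregivers" share most of their trigrams, so they score as similar without any language-specific normalization.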

How Weak Categorizers Based Upon Different Principles Strengthen Performance

V. S. Uren
2002 Computer journal  
Further experiments using two less orthodox categorizers are also presented which suggest that combining text categorizers can be successful, provided the essential element of 'difference' is considered  ...  However, published work on combining text categorizers suggests that, for this particular application, improvements in performance are hard to attain.  ...  A number of promising n-gram-based categorizers have been reported in the literature including: an error-tolerant categorizer for OCR documents [27] ; Acquaintance, a language-independent system that  ... 
doi:10.1093/comjnl/45.5.511 fatcat:soox3dwabbd65cbhjdn3utlmhm

Mantle Convection with Plates and Mobile, Faulted Plate Margins

S. Zhong, M. Gurnis
1995 Science  
Unfortunately, the surface expression of these recent convection models is unlike the pattern of flow displayed by plate tectonics.  ...  In addition to these instantaneous features of plate tectonics, plate kinematics are time-dependent and are characterized by  ...  The shallow-dipping slabs under the continents are caused by oceanward continental motion or trench migration (6, 10) (Figs. 3A and 4B).  ...  Gauging Similarity with n-Grams: Language-Independent Categorization  ... 
doi:10.1126/science.267.5199.838 pmid:17813909 fatcat:yvnm3u4cxrhx3eldj5mdp4xf5e

Strategies for Neutralising Sexually Explicit Language

George R. S. Weir, Ana-Maria Duta
2012 2012 Third Cybercrime and Trustworthy Computing Workshop  
In this paper we describe our approach to characterising dimensions of sexually explicit language and outline their use in strategies for neutralising such language.  ...  By controlling the quantity and degree of such content, we aim to minimise any detrimental effects (observer impact) that such content may have on ill-prepared individuals.  ...  Their approach combines 'bag of words' with term weighting and Bayesian probability calculation of n-gram sequences (multi-word units).  ... 
doi:10.1109/ctc.2012.17 fatcat:m2pxoyo4yjet3apueuil45bbri
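The snippet summarizes a cited approach that combines a bag of words with Bayesian probability calculation over n-gram sequences (multi-word units). As an assumption-laden sketch, the code below uses multinomial naive Bayes with add-one smoothing over unigram and bigram features; the term-weighting step is omitted (raw counts), and the function names, labels and training phrases are hypothetical toy data, not the authors' method or corpus.

```python
import math
from collections import Counter, defaultdict


def features(text, max_n=2):
    """Bag of word n-grams of length 1..max_n (single words plus multi-word units)."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def train(examples, max_n=2):
    """Count documents per class and n-gram occurrences per class."""
    class_docs = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        feats = features(text, max_n)
        class_docs[label] += 1
        feature_counts[label].update(feats)
        vocab.update(feats)
    return class_docs, feature_counts, vocab


def predict(text, model, max_n=2):
    """Return the class with the highest multinomial naive Bayes log-probability."""
    class_docs, feature_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior P(class)
        denom = sum(feature_counts[label].values()) + len(vocab)
        for f in features(text, max_n):
            # Add-one (Laplace) smoothing keeps unseen n-grams from zeroing the score.
            score += math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train([  # hypothetical toy corpus
    ("buy cheap pills now", "spam"),
    ("cheap pills online", "spam"),
    ("meeting agenda for monday", "ham"),
    ("notes from the monday meeting", "ham"),
])
```

Including bigrams lets a multi-word unit such as "cheap pills" count as evidence separately from its individual words.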

The Public Sentiment and Emotional Variations in Social Media using Twitter Dataset

The performance to be obtained by tuning the internal parameters.  ...  The collection of Internet-based applications that provide a way to create and exchange user-generated content is social media (Twitter, Facebook, WhatsApp, etc.).  ...  E. n-Grams: Within the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sample of text or speech.  ... 
doi:10.35940/ijitee.l3358.1081219 fatcat:p7cu7cbxozbkpj565l6z3m33gm

Leveraging Social Networks for Toxicovigilance

Michael Chary, Nicholas Genes, Andrew McKenzie, Alex F. Manini
2013 Journal of Medical Toxicology  
Traditional means of characterizing these changes, such as national surveys or voluntary reporting by frontline clinicians, can miss changes in usage or the emergence of novel drugs.  ...  We outline a structured approach to analyze social media in order to capture emerging trends in drug abuse by applying powerful methods from artificial intelligence, computational linguistics, graph theory  ...  n-Grams: An n-gram is a contiguous sequence of n words. The previous sentence is a nine-gram. Both collocations and the frequency of single words are special cases of n-grams.  ... 
doi:10.1007/s13181-013-0299-6 pmid:23619711 pmcid:PMC3657021 fatcat:vwd5fldwsnh2pgfdragu4gxsp4
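The snippet's claim that its defining sentence is a nine-gram, and that single words and collocations are special cases, can be checked with a few lines. This sketch assumes whitespace tokenization that keeps "n-gram" as one token and strips the final period; the helper name is hypothetical.

```python
def word_ngrams(tokens, n):
    """All contiguous runs of n tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "An n-gram is a contiguous sequence of n words."
# Strip the trailing period, then split on whitespace; "n-gram" stays one token.
tokens = sentence.rstrip(".").split()

unigrams = word_ngrams(tokens, 1)     # single-word frequencies: n = 1
bigrams = word_ngrams(tokens, 2)      # two-word collocation candidates: n = 2
whole_sentence = word_ngrams(tokens, 9)
```

With nine tokens there is exactly one 9-gram (the whole sentence) and no 10-gram, while the unigram and bigram lists recover the two special cases the snippet mentions.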

Emotion Classification By Incremental Association Language Features

Jheng-Long Wu, Pei-Chann Chang, Shih-Ling Chang, Liang-Chih Yu, Jui-Feng Yeh, Chin-Sheng Yang
2010 Zenodo  
Major Depressive Disorder can be divided into different categories based on previous human activities. Using machine learning, we can correctly classify emotion in textual language in advance.  ...  We present an approach to discover words in a sentence that occur with high frequency at the same time and do not overlap across categories, called Association Language Features by its Category (ALFC  ...  , yet cannot easily be captured by n-grams.  ... 
doi:10.5281/zenodo.1329237 fatcat:jrvosw2ijnfavdcdi2d7jfpy3e

Panoptical View of the Sentiment Analysis Techniques

2019 International Journal of Engineering and Advanced Technology  
It has changed the way the information is perceived and utilized by big business groups, brands and marketing agencies by demonstrating that the computational recognition of a sentimental expression is  ...  Sentiment analysis (SA) is a rapidly evolving field that aims at computationally categorizing the opinions of people about a particular product, movie, brand or anything that can be opined.  ...  The simplest algorithm to solve clustering problems by partitioning the n observations into k clusters.  ... 
doi:10.35940/ijeat.a9537.109119 fatcat:c7kgxbjne5enflnizof2k4iet4
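The last snippet fragment describes partitioning n observations into k clusters, which is the k-means formulation. A minimal sketch of Lloyd's algorithm (alternating assignment and mean-update steps) is shown below; the function name, the random initialization from k distinct input points, and the sample points are assumptions for illustration, not the survey's own implementation.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: partition n points (tuples) into k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct starting points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no observation changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


points = [(0, 0), (0, 1), (10, 10), (10, 11)]  # hypothetical observations
labels, centres = kmeans(points, k=2)
```

The result depends on the initial centroids, which is why practical implementations usually run several restarts or use a seeding scheme such as k-means++.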

"The Library of Congress at a Glance": Text Visualization and Reference Rooms Without Walls

Lee A. Gladwin
1994 IASSIST Quarterly  
A statistical approach was also adopted by the National Security Agency's ACQUAINTANCE program which employs a "language-independent n-gram method of sorting and retrieving documents by language and topics  ...  N-grams refers to "sequences of n consecutive characters" [Damashek, 39] . PARENTAGE, a visualization program, is used to explore the retrieved documents (See below).  ... 
doi:10.29173/iq640 fatcat:ygo4xtgi3bepjeohe2z5yksrd4

Discovering interesting usage patterns in text collections

Anthony Don, Elena Zheleva, Machon Gregory, Sureyya Tarkan, Loretta Auvil, Tanya Clement, Ben Shneiderman, Catherine Plaisant
2007 Proceedings of the sixteenth ACM conference on Conference on information and knowledge management - CIKM '07  
Users can find meaningful co-occurrences of text patterns by visualizing them within and across documents in the collection.  ...  The current implementation focuses on frequent itemsets of n-grams, as they capture the repetition of exact or similar expressions in the collection.  ...  patterns of n-grams in the rest of this paper.  ... 
doi:10.1145/1321440.1321473 dblp:conf/cikm/DonZGTACSP07 fatcat:524qzhxztnh37mklli774wpewy

Speak-Correct: A Computerized Interface for the Analysis of Mispronounced Errors

Kamal Jambi, Hassanin Al-Barhamtoshy, Wajdi Al-Jedaibi, Mohsen Rashwan, Sherif Abdou
2022 Computer systems science and engineering  
Any natural language may have dozens of accents.  ...  The n-gram model produces more word-like elements than the (n-1)-gram approach.  ...  Furthermore, the N-gram language concept as well as the HMM are described in greater depth.  ... 
doi:10.32604/csse.2022.024967 fatcat:cjxeeidsznfh5objdiqzs3yheu
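The snippet contrasts an n-gram model with an (n-1)-gram one: a higher-order model conditions each word on a longer history. As a hedged illustration of the basic mechanism only, not the Speak-Correct system's model, here is a maximum-likelihood bigram language model; the function name and toy sentences are hypothetical.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model: P(w | prev) = C(prev, w) / C(prev)."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]  # boundary markers
        history_counts.update(tokens[:-1])  # every token that precedes another
        bigram_counts.update(zip(tokens, tokens[1:]))

    def prob(prev, word):
        if history_counts[prev] == 0:
            return 0.0  # unseen history; a real system would smooth or back off
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob


prob = train_bigram_lm(["i am sam", "sam i am", "i do not like green eggs"])
```

Here `prob("<s>", "i")` is 2/3 because two of the three sentences start with "i". Extending the history to two previous words gives a trigram model, which can rule out more implausible word sequences at the cost of sparser counts.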

Developing classification-based named entity recognizers (NER) for Sambalpuri and Odia applying support vector machines (SVM)

Pitambar Behera, Sharmin Muzaffar
2018 Nepalese Linguistics  
NER is the process of detecting Named Entities (NEs) in a document and categorizing them into certain named entity classes such as the names of organization, person, location, sport, river, city, country  ...  The tri-gram feature file has been applied.  ... 
doi:10.3126/nl.v33i1.41066 fatcat:domjsbw7wbdkjc5ht44kqchpdi

The application of text mining methods in innovation research: current state, evolution patterns, and development priorities

David Antons, Eduard Grünwald, Patrick Cichy, Torsten Oliver Salge
2020 R & D Management  
If the documents have been made available under an Open Content Licence (especially Creative Commons Licences), you may exercise further usage rights as specified in the indicated licence.  ...  Specific algorithms are now available (e.g., Wang et al., 2007) to detect such word combinations, also known as n-grams, in text and mark them. 2 Usually, n-grams have to be replaced before constructing  ...  , Statistics Computer Science, Data Science, Statistics Selected technique(s) White space separation n-grams Stemming, Lemmatization, Deletion of stop words and infrequent terms, tf-idf weighting Named-entity  ... 
doi:10.1111/radm.12408 fatcat:6wxk3nziibagdbterp7bczherm
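The snippet lists stop-word deletion and tf-idf weighting among the preprocessing techniques. A minimal sketch of both follows, assuming the common variant weight = tf × ln(N / df); the tiny stop-word list, function name and sample documents are hypothetical, and libraries differ in their smoothing and normalization choices.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "in"}  # tiny illustrative list


def tfidf(documents):
    """Return one {term: tf * ln(N / df)} dict per document, after stop-word removal."""
    tokenized = [
        [t for t in doc.lower().split() if t not in STOP_WORDS] for doc in documents
    ]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


docs = [  # hypothetical documents
    "the patent of the firm",
    "the firm and the market",
    "innovation in the market",
]
weights = tfidf(docs)
```

A term that appears in every document receives weight ln(1) = 0, so tf-idf down-weights vocabulary that does not distinguish documents, while a term unique to one document, such as "patent" here, gets the largest weight.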

Automatic Recognition of Spontaneous Speech for Access to Multilingual Oral History Archives

W. Byrne, D. Doermann, M. Franz, S. Gustman, J. Hajic, D. Oard, M. Picheny, J. Psutka, B. Ramabhadran, D. Soergel, T. Ward, W.-J. Zhu
2004 IEEE Transactions on Speech and Audio Processing  
This is less than half the value obtained by the same technique on a standard newswire categorization benchmark, but replication on human-transcribed interviews showed that ASR errors explain little of  ...  can be adapted to approximate decisions made by human annotators.  ...  segment categorization, the individual word n-gram features are used to predict the likelihood of a segment being assigned to a category.  ... 
doi:10.1109/tsa.2004.828702 fatcat:ixe6jfw3drbnndx5xg2yxjkdvq
Showing results 1–15 of 1,479