
A Literature Review: Stemming Algorithms for Indian Languages [article]

M.Thangarasu, R.Manavalan
2013 arXiv   pre-print
Stemming is the process of extracting the root word from a given inflected word. It also plays a significant role in numerous applications of Natural Language Processing (NLP).  ...  This expository paper presents a survey of some of the latest developments in stemming algorithms in data mining and also presents some of the solutions for various Indian language stemming algorithms  ...  From plain character strings to meaningful words: Producing Transaction between the state (graph, table) To find the root word of a word better full text databases for inflectional and compounding languages  ... 
arXiv:1308.5423v1 fatcat:7lqgly746jgprf5hnnqjvughzi

Multi-word Entity Classification in a Highly Multilingual Environment

Sophie Chesney, Guillaume Jacquet, Ralf Steinberger, Jakub Piskorski
2017 Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)  
We also want to thank the IC1207 COST Action PARSEME and SIGLEX for their endorsement and support, as well as the EACL 2017 organizers.  ...  We would like to thank the members of the program committee for the timely reviews, authors for their valuable contributions, shared task organizers, annotators, and system developers for their hard work  ...  We thank the annotators for their work and the anonymous reviewers for their insightful comments. We thank Nikola Ljubešić for his help with the hrMWELex lexicon.  ... 
doi:10.18653/v1/w17-1702 dblp:conf/mwe/ChesneyJSP17 fatcat:bv7aavgth5eurmzuphuowtuuhq

Restoring Arabic vowels through omission-tolerant dictionary lookup

Alexis Amid Neme, Sébastien Paumier
2019 Language Resources and Evaluation  
Our program performs the analysis of 5000 words/second for running text (20 pages/second).  ...  technologies. In this research, we present Arabic-Unitex, an Arabic Language Resource, with emphasis on vowel representation and encoding.  ...  Chennoufi & Mazroui (2016) demonstrate that "combining morphological analysis, syntactic and diacritic rules used in a pipeline with statistical processing produces better performance than other systems  ... 
doi:10.1007/s10579-019-09464-6 fatcat:chdbye2d55fhxdvvbp4so2wloi

Abstract Syntax as Interlingua: Scaling Up the Grammatical Framework from Controlled Languages to Robust Pipelines

Aarne Ranta, Krasimir Angelov, Normunds Gruzitis, Prasanth Kolachina
2020 Computational Linguistics  
This makes it possible for GF to utilize data from the other approaches and to build robust pipelines.  ...  GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and web applications.  ...  Acknowledgments The authors want to thank the anonymous referees for their substantial and insightful comments.  ... 
doi:10.1162/coli_a_00378 fatcat:pgydeto4sncmngs2oonpm7s2gm


V.N. Golodnaya
2017 Russian Linguistic Bulletin  
The article is devoted to the pragmatic functions of obscene words and their contexts of usage in the Russian and English languages within the framework of a corpus-based approach.  ...  Some reasons for the spread of these words in all types of communication have been defined as both semantic and social phenomena.  ...  Thus, having conducted the analysis of the semantic and morphological nature of records of the words under consideration in the English language we can see (a) the dependence of the record from the approach  ... 
doi:10.18454/rulb.9.11 fatcat:eudotg3xsvhqjjsjdeddyjqrfa

Information Extraction and Automatic Markup for XML Documents [chapter]

Mohammad Abolhassani, Norbert Fuhr, Norbert Gövert
2003 Lecture Notes in Computer Science  
Morphological and lexical processing deals with the recognition of inflected word forms. 3.  ...  Morphological and Lexical Processing Following tokenisation, the system first has to detect inflectional variants of word forms. For some languages, e.g.  ...  and events, from (the relevant text) documents, based on predefined templates. automatic markup Automatic markup is the process of marking or tagging a document in order to specify and indicate its global  ... 
doi:10.1007/978-3-540-45194-5_11 fatcat:zsfi5kr2sjh37ldqigc3zquvu4

Main results of MONDILEX project

Ludmila Dimitrova, Violetta Koseska-Toszewa, Radovan Garabík, Tomaž Erjavec, Leonid Iomdin, Volodymyr Shyrokov
2015 Cognitive Studies | Études cognitives  
The paper summarizes the research undertaken on standardisation and integration of Slavic language resources and on the establishment of a virtual organisation supporting research infrastructure for Slavic  ...  The paper presents the results and recommendations of MONDILEX, a 7FP project that covered six Slavic languages: Bulgarian, Polish, Russian, Slovak, Slovene, and Ukrainian  ...  We would like to thank all colleagues from the six MONDILEX participants' teams from IMI-BAS (Sofia, Bulgaria), ISS-PAS (Warsaw, Poland), LSIL-SAS (Bratislava, Slovakia), J.  ... 
doi:10.11649/cs.2011.017 fatcat:6yil3iclyfbjpo25h6p7ko6ria

Improving the Tokenisation of Identifier Names [chapter]

Simon Butler, Michel Wermelinger, Yijun Yu, Helen Sharp
2011 Lecture Notes in Computer Science  
Accuracy was evaluated by comparing the output of our algorithm to manual tokenisations of 28,000 identifier names drawn from 60 open source Java projects totalling 16.5 MSLOC.  ...  First, it improves tokenisation accuracy for identifier names of a single case and those containing digits. Second, performance gains over existing techniques are achieved using smaller oracles.  ...  Acknowledgements We would like to thank the anonymous reviewers on the ECOOP 2011 Program Committee, and Tiago Alves and Eric Bouwers for their thoughtful comments that have helped improve this paper.  ... 
doi:10.1007/978-3-642-22655-7_7 fatcat:n7cyw7ohhrctzio5gc4djhnvie

Data-driven materials research enabled by natural language processing and information extraction

Elsa A. Olivetti, Jacqueline M. Cole, Edward Kim, Olga Kononova, Gerbrand Ceder, Thomas Yong-Jin Han, Anna M. Hiszpanski
2020 Applied Physics Reviews  
DATA AVAILABILITY Data sharing is not applicable to this article as no new data were created or analyzed in this study.  ...  and prefixes) and morphological inflections (number and tense).  ...  model, which is trained on full text. Other word embedding models that have been used in the materials science domain include FastText, Embeddings from Language Models (ELMo), and BERT.  ... 
doi:10.1063/5.0021106 fatcat:75aap3lkjvhprleptl3bbp6w64

Glossary extraction and utilization in the information search and delivery system for IBM Technical Support

L. Kozakov, Y. Park, T. Fin, Y. Drissi, Y. Doganata, T. Cofino
2004 IBM Systems Journal  
TALENT (Text Analysis and Language Engineering Technology) is a general suite of text analysis tools developed at the IBM Thomas J.  ...  This utility is based on the TALENT 7 text analysis engine (TAE) configured to work with the Technical Support glossary.  ...  Fin is a senior software engineer at the Watson Research Center. He received a Ph.D. degree in computer science from Tokyo Institute of Technology, Tokyo, Japan, in 1982.  ... 
doi:10.1147/sj.433.0546 fatcat:xwca3uq765f4jplxsk4jzemgsi

Multilinear Grammar: Ranks and Interpretations

Dafydd Gibbon, Sascha Griffiths
2017 Open Linguistics  
The architecture defines a Sui Generis Condition on ranks, from discourse through utterance and phrasal structures to the word, with its sub-ranks of morphology and phonology.  ...  The framework provides a realistic background for the gradual development of complexity in the phylogeny and ontogeny of language, and clarifies a range of challenges for the evaluation of realistic linguistic  ...  modelled by left branching or right branching structures; 4. at word rank and the morphology and phonology sub-ranks, for the linear combinatorics of inflectional and derivational affixation, compounding  ... 
doi:10.1515/opli-2017-0014 fatcat:wp7mjia5ezhp5bufvcrj3hjpni

Automatic extraction of function–behaviour–state information from patents

G. Fantoni, R. Apreda, F. Dell'Orletta, M. Monge
2013 Advanced Engineering Informatics  
A second goal is to develop a protocol based on free software and database resources, so that it could be replicable with limited effort by everyone without having to rely on commercial databases.  ...  The purpose of the research is to try to detect and extract information about the functions, the physical behaviours and the states of the system directly from the text of a patent in an automatic way.  ...  Acknowledgements The financial supports of RobLog Project (FP7 ICT-270350) and LILIT Project (PAR FAS REGIONE TOSCANA Linea di Azione 1.  ... 
doi:10.1016/j.aei.2013.04.004 fatcat:3rtyn7w5lrfk5jvreikzy54tja

Variations on language modeling for information retrieval

Wessel Kraaij
2005 SIGIR Forum  
Variations on Language Modeling for Information Retrieval W. Kraaij - Enschede: Neslia Paniculata. Thesis Enschede - With ref. With summary ISBN 90-75296-09-6  ...  Since we did not have access to full morphological analysis for Italian, we used a simple, freely-distributed stemmer from the Open Muscat project. For French and English, we lemmatized each word-form  ...  (c) Optional (2): Exclude words that start with a different character. A second fine-tuning step constrains expansion terms to words that start with the same character.  ... 
doi:10.1145/1067268.1067291 fatcat:h23lp5aqfvfu5iecwnihfme244

Has Computational Linguistics Become More Applied? [chapter]

Kenneth Church
2009 Lecture Notes in Computer Science  
Language software applications encounter new words, e.g., acronyms, technical terminology, names or compounds of such words.  ...  Future work aims to assign grammar rules and lexical entries in order to produce coherent texts that follow on from the generated text structures in several languages. Abstract.  ...  natural language generators that aim to produce written text either for textual presentation or for eventual use by text-to-speech system.  ... 
doi:10.1007/978-3-642-00382-0_1 fatcat:oddvfzds4nfwjam2ccqeaxe2y4

Natural Language Processing - A Survey [article]

Kevin Mote
2012 arXiv   pre-print
The utility and power of Natural Language Processing (NLP) seems destined to change our technological society in profound and fundamental ways.  ...  However there are, to date, few accessible descriptions of the science of NLP that have been written for a popular audience, or even for an audience of intelligent, but uninitiated scientists.  ...  In other words, "NL is essentially the language of human thought." Clearly, the conclusion to this debate will have significant ramifications for the future of NLP.  ... 
arXiv:1209.6238v1 fatcat:7ju5x3bguvhzlavukgx2sli4zy