A Literature Review: Stemming Algorithms for Indian Languages
[article]
2013
arXiv
pre-print
Stemming is the process of extracting the root word from a given inflected word. It also plays a significant role in numerous applications of Natural Language Processing (NLP). ...
This expository paper presents a survey of some of the latest developments in stemming algorithms in data mining and also presents some of the solutions for stemming in various Indian languages ...
From plain character strings to meaningful words: Producing Transaction between the state(graph,table)To find the root word of a word better full text databases for inflectional and compounding languages ...
arXiv:1308.5423v1
fatcat:7lqgly746jgprf5hnnqjvughzi
Multi-word Entity Classification in a Highly Multilingual Environment
2017
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
We also want to thank the IC1207 COST Action PARSEME and SIGLEX for their endorsement and support, as well as the EACL 2017 organizers. ...
We would like to thank the members of the program committee for the timely reviews, authors for their valuable contributions, shared task organizers, annotators, and system developers for their hard work ...
We thank the annotators for their work and the anonymous reviewers for their insightful comments. We thank Nikola Ljubešić for his help with the hrMWELex lexicon. ...
doi:10.18653/v1/w17-1702
dblp:conf/mwe/ChesneyJSP17
fatcat:bv7aavgth5eurmzuphuowtuuhq
Restoring Arabic vowels through omission-tolerant dictionary lookup
2019
Language Resources and Evaluation
Our program performs the analysis of 5000 words/second for running text (20 pages/second). ...
technologies. In this research, we present Arabic-Unitex, an Arabic Language Resource, with emphasis on vowel representation and encoding. ...
Chennoufi & Mazroui (2016) demonstrate that "combining morphological analysis, syntactic and diacritic rules used in a pipeline with statistical processing produces better performance than other systems ...
doi:10.1007/s10579-019-09464-6
fatcat:chdbye2d55fhxdvvbp4so2wloi
Abstract Syntax as Interlingua: Scaling Up the Grammatical Framework from Controlled Languages to Robust Pipelines
2020
Computational Linguistics
This makes it possible for GF to utilize data from the other approaches and to build robust pipelines. ...
GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and web applications. ...
Acknowledgments The authors want to thank the anonymous referees for their substantial and insightful comments. ...
doi:10.1162/coli_a_00378
fatcat:pgydeto4sncmngs2oonpm7s2gm
OBSCENE WORDS IN POSITIVE MEANINGS (ON THE EXAMPLES OF ENGLISH AND RUSSIAN INVECTIVES)
2017
Russian Linguistic Bulletin
The article is devoted to the issues of obscene words pragmatic functions and contexts of usage in the Russian and English languages in the framework of corpora approach. ...
Some reasons for spreading these words in all types of communication have been defined as both semantic and social phenomena. ...
Thus, having conducted the analysis of the semantic and morphological nature of records of the words under consideration in the English language we can see (a) the dependence of the record from the approach ...
doi:10.18454/rulb.9.11
fatcat:eudotg3xsvhqjjsjdeddyjqrfa
Information Extraction and Automatic Markup for XML Documents
[chapter]
2003
Lecture Notes in Computer Science
Morphological and lexical processing deals with the recognition of inflected word forms. 3. ...
Morphological and Lexical Processing Following tokenisation, the system first has to detect inflectional variants of word forms. For some languages, e.g. ...
and events, from (the relevant text) documents, based on predefined templates. automatic markup Automatic markup is the process of marking or tagging a document in order to specify and indicate its global ...
doi:10.1007/978-3-540-45194-5_11
fatcat:zsfi5kr2sjh37ldqigc3zquvu4
Main results of MONDILEX project
2015
Cognitive Studies | Études cognitives
The paper summarizes the research undertaken on standardisation and integration of Slavic language resources and on the establishment of a virtual organisation supporting research infrastructure for Slavic ...
The paper presents the results and recommendations of MONDILEX, a 7FP project that covered six Slavic languages: Bulgarian, Polish, Russian, Slovak, Slovene, and Ukrainian ...
We would like to thank all colleagues from the six MONDILEX participants' teams from IMI-BAS (Sofia, Bulgaria), ISS-PAS (Warsaw, Poland), LSIL-SAS (Bratislava, Slovakia), J. ...
doi:10.11649/cs.2011.017
fatcat:6yil3iclyfbjpo25h6p7ko6ria
Improving the Tokenisation of Identifier Names
[chapter]
2011
Lecture Notes in Computer Science
Accuracy was evaluated by comparing the output of our algorithm to manual tokenisations of 28,000 identifier names drawn from 60 open source Java projects totalling 16.5 MSLOC. ...
First, it improves tokenisation accuracy for identifier names of a single case and those containing digits. Second, performance gains over existing techniques are achieved using smaller oracles. ...
Acknowledgements We would like to thank the anonymous reviewers on the ECOOP 2011 Program Committee, and Tiago Alves and Eric Bouwers for their thoughtful comments that have helped improve this paper. ...
doi:10.1007/978-3-642-22655-7_7
fatcat:n7cyw7ohhrctzio5gc4djhnvie
Data-driven materials research enabled by natural language processing and information extraction
2020
Applied Physics Reviews
DATA AVAILABILITY Data sharing is not applicable to this article as no new data were created or analyzed in this study. ...
and prefixes) and morphological inflections (number and tense). ...
model, which is trained on full text. 64 Other word embedding models that have been used in the materials science domain include FastText, 65 Embeddings from Language Models (ELMo), 66 and BERT. ...
doi:10.1063/5.0021106
fatcat:75aap3lkjvhprleptl3bbp6w64
Glossary extraction and utilization in the information search and delivery system for IBM Technical Support
2004
IBM Systems Journal
TALENT (Text Analysis and Language Engineering Technology) is a general suite of text analysis tools developed at the IBM Thomas J. ...
This utility is based on the TALENT 7 text analysis engine (TAE) configured to work with the Technical Support glossary. ...
Fin is a senior software engineer at the Watson Research Center. He received a Ph.D. degree in computer science from Tokyo Institute of Technology, Tokyo, Japan, in 1982. ...
doi:10.1147/sj.433.0546
fatcat:xwca3uq765f4jplxsk4jzemgsi
Multilinear Grammar: Ranks and Interpretations
2017
Open Linguistics
The architecture defines a Sui Generis Condition on ranks, from discourse through utterance and phrasal structures to the word, with its sub-ranks of morphology and phonology. ...
The framework provides a realistic background for the gradual development of complexity in the phylogeny and ontogeny of language, and clarifies a range of challenges for the evaluation of realistic linguistic ...
modelled by left branching or right branching structures; 4. at word rank and the morphology and phonology sub-ranks, for the linear combinatorics of inflectional and derivational affixation, compounding ...
doi:10.1515/opli-2017-0014
fatcat:wp7mjia5ezhp5bufvcrj3hjpni
Automatic extraction of function–behaviour–state information from patents
2013
Advanced Engineering Informatics
A second goal is to develop a protocol based on free software and database resources, so that it could be replicable with limited effort by everyone without having to rely on commercial databases. ...
The purpose of the research is to try to detect and extract information about the functions, the physical behaviours and the states of the system directly from the text of a patent in an automatic way. ...
Acknowledgements The financial supports of RobLog Project (FP7 ICT-270350) and LILIT Project (PAR FAS REGIONE TOSCANA Linea di Azione 1. ...
doi:10.1016/j.aei.2013.04.004
fatcat:3rtyn7w5lrfk5jvreikzy54tja
Variations on language modeling for information retrieval
2005
SIGIR Forum
Variations on Language Modeling for Information Retrieval W. Kraaij -Enschede: Neslia Paniculata. Thesis Enschede -With ref. With summary ISBN 90-75296-09-6 ...
Since we did not have access to full morphological analysis for Italian, we used a simple, freely-distributed stemmer from the Open Muscat project. 2 For French and English, we lemmatized each word-form ...
(c) Optional (2): Exclude words that start with a different character. A second fine-tuning step constrains expansion terms to words that start with the same character. ...
doi:10.1145/1067268.1067291
fatcat:h23lp5aqfvfu5iecwnihfme244
Has Computational Linguistics Become More Applied?
[chapter]
2009
Lecture Notes in Computer Science
Language software applications encounter new words, e.g., acronyms, technical terminology, names or compounds of such words. ...
Future work aims to assign grammar rules and lexical entries in order to produce coherent texts that follow on from the generated text structures in several languages. Abstract. ...
natural language generators that aim to produce written text either for textual presentation or for eventual use by text-to-speech system. ...
doi:10.1007/978-3-642-00382-0_1
fatcat:oddvfzds4nfwjam2ccqeaxe2y4
Natural Language Processing - A Survey
[article]
2012
arXiv
pre-print
The utility and power of Natural Language Processing (NLP) seems destined to change our technological society in profound and fundamental ways. ...
However there are, to date, few accessible descriptions of the science of NLP that have been written for a popular audience, or even for an audience of intelligent, but uninitiated scientists. ...
In other words, "NL is essentially the language of human thought." Clearly, the conclusion to this debate will have significant ramifications for the future of NLP. ...
arXiv:1209.6238v1
fatcat:7ju5x3bguvhzlavukgx2sli4zy