378 Hits in 10.6 sec

Design and Structure of The Juman++ Morphological Analyzer Toolkit

Arseny Tolmachev, Daisuke Kawahara, Sadao Kurohashi
2020 Journal of Natural Language Processing  
During the analysis, it looks up word candidates using a dictionary and unknown word handlers, and then builds a lattice by connecting adjacent word candidates.  ...  Juman++ follows the dictionary-based morphological analyzer design.  ...  The section on the morphological analysis is based on the paper presented at the meeting of North American Chapter of the Association for Computational Linguistics (Tolmachev et al. 2019).  ... 
doi:10.5715/jnlp.27.89 fatcat:i7r2s6r4sbagvplw2d4ztj4vsi

Entry Generation by Analogy – Encoding New Words for Morphological Lexicons

Krister Lindén
2009 Northern European Journal of Language Technology  
In this article, we evaluate a combination of corpus-based and lexicon-based methods for assigning the base form and inflectional paradigm to new words in Finnish, Swedish and English finite-state transducer  ...  By combining the probabilities calculated from corpus data and from lexical data, we get a more precise combined model.  ...  I am also grateful to Tommi Pirinen, Jussi Tuovila and Anssi Yli-Jyrä for many fruitful discussions as well as to Lars Borin, Yves Lepage and Heiki-Jaan Kaalep for valuable comments on the manuscript.  ... 
doi:10.3384/nejlt.2000-1533.09111 fatcat:vfv4b47dfjbd5aczmnc2xk6bai

A survey of named entity recognition and classification [chapter]

David Nadeau, Satoshi Sekine
2009 Benjamins Current Topics  
Turney for helpful comments.  ...  Early commercial initiatives are already modifying the way we use yellow pages by providing local search engines (search your neighborhood for organizations, product and services, people, etc.).  ...  General dictionary Common nouns listed in a dictionary are useful, for instance, in the disambiguation of capitalized words in ambiguous positions (e.g., sentence beginning). A.  ... 
doi:10.1075/bct.19.03nad fatcat:cahdvln4mrf5rlqmhhx2jett7y

Variations on language modeling for information retrieval

Wessel Kraaij
2005 SIGIR Forum  
Variations on Language Modeling for Information Retrieval W. Kraaij - Enschede: Neslia Paniculata. Thesis Enschede - With ref. With summary ISBN 90-75296-09-6  ...  Determine candidate page URLs: (For each web site) Query a Web search engine for all Web pages on a particular site.  ...  This fact can be exploited for automatic dictionary construction by an algorithm, which compares the contexts of unknown words (Fung, 2000).  ... 
doi:10.1145/1067268.1067291 fatcat:h23lp5aqfvfu5iecwnihfme244

Introduction to information retrieval

2009 ChoiceReviews  
The first implementation of a connectivity server was described by .  ...  The scheme discussed in this chapter, currently believed to be the best published scheme (achieving as few as 3 bits per link for encoding), is described in a series of papers by Boldi and Vigna (2004b  ...  Web search engines therefore use distributed indexing algorithms for index construction.  ... 
doi:10.5860/choice.46-2715 fatcat:ruwoe46pgzcupjygnwbnit4z3u

Взiaлъ, възялъ, вьзял: Processing Orthographic Variation in Lexico-Grammatical Annotation of the Middle Russian Corpus of 15th–17th Centuries
Взiaлъ, възялъ, вьзял: обработка орфографической вариативности при лексико-грамматической аннотации Старорусского корпуса XV–XVII вв

Tatiana Gavrilova, Tatiana Shalganova, Ol'ga Liashevskaia
2017 Vestnik Pravoslavnogo Svâto-Tihonovskogo Gumanitarnogo Universiteta: Seriâ III. Filologiâ  
A Fast Morphological Algorithm with Unknown Word Guessing Induced by a Dictionary for a Web Search Engine. Proceedings of MLMTA, Las Vegas, Nevada, 2003. P. 273-280.  ...  The task of lexico-grammatical analysis is to assign a dictionary form (lemma), a part of speech indication and grammatical tags to each word form in the corpus.  ... 
doi:10.15382/sturiii201751.11-20 fatcat:fbogzczngndg3ppqrdtk4n7zsq

Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics [article]

Preslav Nakov
2019 arXiv   pre-print
I address the problem of noun compounds syntax by means of novel, highly accurate unsupervised and lightly supervised algorithms using the Web as a corpus and search engines as interfaces to that corpus  ...  Traditionally the Web has been viewed as a source of page hit counts, used as an estimate for n-gram word frequencies.  ...  I focus on the first three problems, which I address with novel, highly accurate algorithms using the Web as a corpus and search engines as interfaces to that corpus.  ... 
arXiv:1912.01113v1 fatcat:3ubstjtd7nhphmw4kbewls3onq

Proceedings of the BioCreative V.5 Challenge Evaluation Workshop

Martin Krallinger, Alfonso Valencia
2022 Zenodo  
The Spanish National Bioinformatics Institute (INB) unit at the Spanish National Cancer Research Centre (CNIO) is a member of the INB, PRB2-ISCIII and is supported by grant PT13/0001/0030, of the PE I+  ...  D+i 2013-2016, funded by ISCIII and ERDF.  ...  In future work, a hyperparameter search algorithm, which is less time-consuming than grid search, will be implemented, for instance, random search or Bayesian optimization.  ... 
doi:10.5281/zenodo.6519885 fatcat:gzzr6ogkwvfe3eglv6anrzt5s4

Annotating Protein Function through Lexical Analysis

Rajesh Nair, Burkhard Rost
2004 The AI Magazine  
result from homology transfer (Devos and Valencia 2001; Koonin 2000; Valencia 2002).  ...  for a protein U of unknown function, search the database for proteins {K} that have a se-  ...  The development of a standardized ontology is an important step in this direction.  ...  cia 1998). Words with a high z-score are likely to be potential keywords for the family.  ... 
doi:10.1609/aimag.v25i1.1746 dblp:journals/aim/NairR04 fatcat:eoinvyqphjg4zmk2jvyoiukpfa

Brain–Computer Interfaces for Human Augmentation

Davide Valeriani, Caterina Cinel, Riccardo Poli
2019 Brain Sciences  
control signals for external devices for people with severe disabilities [...]  ...  The field of brain–computer interfaces (BCIs) has grown rapidly in the last few decades, allowing the development of ever faster and more reliable assistive technologies for converting brain activity into  ...  Acknowledgments: We would like to thank to Valentina Unakafova for providing the Permutation Entropy algorithm and to Montserrat-Alvarado González for providing the source code and a detailed description  ... 
doi:10.3390/brainsci9020022 pmid:30682766 pmcid:PMC6406539 fatcat:4kjekrytqrcd7h4egrqlxsdbpq

Lexikos 17

Lexikos Lexikos
2012 Lexikos  
The T-1 feature describes a single tag, while T+1 encodes all possible tags for the W+1 word. This is a consequence of the left-to-right tagging process.  ...  A special word of thanks goes to Malcolm MacLeod who wrote most of the code that drives TshwaneLex's corpus tool.  ...  Item (5) is the search method used by the computer (the search engine).  ... 
doi:10.5788/17-0-1194 fatcat:kdt4k5l5ozc2pjq3els3sgemgu

Effective focused retrieval by exploiting query context and document structure

Rianne Kaptein
2012 SIGIR Forum  
Let's try to find a particular piece of information using a Web search engine.  ...  need into a query, which can be easily processed by the search engine.  ...  Web collection.  ... 
doi:10.1145/2093346.2093366 fatcat:f6n2nplok5f7pd3ijcrghrkg2y

Adaptive information extraction

Jordi Turmo, Alicia Ageno, Neus Català
2006 ACM Computing Surveys  
For the sake of reducing the high cost of manually adapting IE applications to new domains, experiments with different Machine Learning (ML) techniques have been carried out by the research community.  ...  Some examples of these applications are the generation of data bases from documents, as well as the acquisition of knowledge useful for emerging technologies like question answering, information integration  ...  written), lexical analyzers (including morphological analysis and NE recognition and classification), engines dealing with unknown words, disambiguators (POS taggers, semantic taggers, etc.), stemmers  ... 
doi:10.1145/1132956.1132957 fatcat:usu2uodawzf5hh2ibl7ufo2y6y

Machine Learning for E-mail Spam Filtering: Review,Techniques and Trends [article]

Alexy Bhowmick, Shyamanta M. Hazarika
2016 arXiv   pre-print
We conclude by measuring the impact of Machine Learning-based filters and explore the promising offshoots of latest developments.  ...  We present a comprehensive review of the most effective content-based e-mail spam filtering techniques.  ...  In general, spam has many forms - chat rooms are subject to chat spam, blogs are subject to blog spam (splogs) [Kolari et al, 2006], search engines are often misled by web spam (search engine spamming  ... 
arXiv:1606.01042v1 fatcat:cblnuc4knfhehjwzjeeekbgf3m

Biomedical signal compression with time- and subject-adaptive dictionary for wearable devices

Valentina Vadori, Enrico Grisan, Michele Rossi
2016 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)  
Biometric signals compression with time- and subject-adaptive dictionary for wearable devices by Valentina Vadori Wearable devices are a leading category in the Internet of Things.  ...  In this work, I am concerned with the design of a lossy compression technique for the real-time processing of biomedical signals.  ...  Acknowledgements I would like to thank my supervisor Michele Rossi for his authoritative guidance through the course of these months and Roberto Francescon, Matteo Gadaleta and Mohsen Hooshmand for their  ... 
doi:10.1109/mlsp.2016.7738820 dblp:conf/mlsp/VadoriGR16 fatcat:o456gowukvd27kl5xuoqbzwppq