Design and Structure of The Juman++ Morphological Analyzer Toolkit
2020
Journal of Natural Language Processing
During the analysis, it looks up word candidates using a dictionary and unknown word handlers, and then builds a lattice by connecting adjacent word candidates. ...
Juman++ follows the dictionary-based morphological analyzer design. ...
The section on the morphological analysis is based on the paper presented at the meeting of the North American Chapter of the Association for Computational Linguistics (Tolmachev et al. 2019). ...
doi:10.5715/jnlp.27.89
fatcat:i7r2s6r4sbagvplw2d4ztj4vsi
Entry Generation by Analogy – Encoding New Words for Morphological Lexicons
2009
Northern European Journal of Language Technology
In this article, we evaluate a combination of corpus-based and lexicon-based methods for assigning the base form and inflectional paradigm to new words in Finnish, Swedish and English finite-state transducer ...
By combining the probabilities calculated from corpus data and from lexical data, we get a more precise combined model. ...
I am also grateful to Tommi Pirinen, Jussi Tuovila and Anssi Yli-Jyrä for many fruitful discussions as well as to Lars Borin, Yves Lepage and Heiki-Jaan Kaalep for valuable comments on the manuscript. ...
doi:10.3384/nejlt.2000-1533.09111
fatcat:vfv4b47dfjbd5aczmnc2xk6bai
A survey of named entity recognition and classification
[chapter]
2009
Benjamins Current Topics
Turney for helpful comments. ...
Early commercial initiatives are already modifying the way we use yellow pages by providing local search engines (search your neighborhood for organizations, product and services, people, etc.). ...
General dictionary Common nouns listed in a dictionary are useful, for instance, in the disambiguation of capitalized words in ambiguous positions (e.g., sentence beginning). A. ...
doi:10.1075/bct.19.03nad
fatcat:cahdvln4mrf5rlqmhhx2jett7y
Variations on language modeling for information retrieval
2005
SIGIR Forum
Variations on Language Modeling for Information Retrieval. W. Kraaij. Enschede: Neslia Paniculata. Thesis Enschede. With ref. With summary. ISBN 90-75296-09-6 ...
Determine candidate page URLs: (For each web site) Query a Web search engine for all Web pages on a particular site. ...
This fact can be exploited for automatic dictionary construction by an algorithm, which compares the contexts of unknown words (Fung, 2000). ...
doi:10.1145/1067268.1067291
fatcat:h23lp5aqfvfu5iecwnihfme244
Introduction to information retrieval
2009
ChoiceReviews
The first implementation of a connectivity server was described by . ...
The scheme discussed in this chapter, currently believed to be the best published scheme (achieving as few as 3 bits per link for encoding), is described in a series of papers by Boldi and Vigna (2004b ...
Web search engines therefore use distributed indexing algorithms for index construction. ...
doi:10.5860/choice.46-2715
fatcat:ruwoe46pgzcupjygnwbnit4z3u
Взiaлъ, възялъ, вьзял: Processing Orthographic Variation in Lexico-Grammatical Annotation of the Middle Russian Corpus of 15th–17th Centuries
Взiaлъ, възялъ, вьзял: обработка орфографической вариативности при лексико-грамматической аннотации Старорусского корпуса XV–XVII вв
2017
Vestnik Pravoslavnogo Svâto-Tihonovskogo Gumanitarnogo Universiteta: Seriâ III. Filologiâ
A Fast Morphological Algorithm with Unknown Word Guessing Induced by a Dictionary for a Web Search Engine. Proceedings of MLMTA, Las Vegas, Nevada, 2003. P. 273-280. ...
The task of lexico-grammatical analysis is to assign a dictionary form (lemma), a part of speech indication and grammatical tags to each word form in the corpus. ...
doi:10.15382/sturiii201751.11-20
fatcat:fbogzczngndg3ppqrdtk4n7zsq
Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics
[article]
2019
arXiv
pre-print
I address the problem of noun compounds syntax by means of novel, highly accurate unsupervised and lightly supervised algorithms using the Web as a corpus and search engines as interfaces to that corpus ...
Traditionally the Web has been viewed as a source of page hit counts, used as an estimate for n-gram word frequencies. ...
I focus on the first three problems, which I address with novel, highly accurate algorithms using the Web as a corpus and search engines as interfaces to that corpus. ...
arXiv:1912.01113v1
fatcat:3ubstjtd7nhphmw4kbewls3onq
Proceedings of the BioCreative V.5 Challenge Evaluation Workshop
2022
Zenodo
The Spanish National Bioinformatics Institute (INB) unit at the Spanish National Cancer Research Centre (CNIO) is a member of the INB, PRB2-ISCIII and is supported by grant PT13/0001/0030, of the PE I+D+i 2013-2016, funded by ISCIII and ERDF. ...
In future work, a hyperparameter search algorithm, which is less time-consuming than grid search, will be implemented, for instance, random search or Bayesian optimization. ...
doi:10.5281/zenodo.6519885
fatcat:gzzr6ogkwvfe3eglv6anrzt5s4
Annotating Protein Function through Lexical Analysis
2004
The AI Magazine
for a protein U of unknown function, search the database for proteins {K} that have a se- ...
result from homology transfer (Devos and Valencia 2001; Koonin 2000; Valencia 2002). ...
The development of a standardized ontology is an important step in this direction. ...
cia 1998). Words with a high z-score are likely to be potential keywords for the family. ...
doi:10.1609/aimag.v25i1.1746
dblp:journals/aim/NairR04
fatcat:eoinvyqphjg4zmk2jvyoiukpfa
Brain–Computer Interfaces for Human Augmentation
2019
Brain Sciences
The field of brain–computer interfaces (BCIs) has grown rapidly in the last few decades, allowing the development of ever faster and more reliable assistive technologies for converting brain activity into control signals for external devices for people with severe disabilities [...] ...
Acknowledgments: We would like to thank Valentina Unakafova for providing the Permutation Entropy algorithm and Montserrat-Alvarado González for providing the source code and a detailed description ...
doi:10.3390/brainsci9020022
pmid:30682766
pmcid:PMC6406539
fatcat:4kjekrytqrcd7h4egrqlxsdbpq
Lexikos 17
2012
Lexikos
The T-1 feature describes a single tag, while T+1 encodes all possible tags for the W+1 word. This is a consequence of the left-to-right tagging process. ...
A special word of thanks goes to Malcolm MacLeod who wrote most of the code that drives TshwaneLex's corpus tool. ...
Item (5) is the search method used by the computer (the search engine). ...
doi:10.5788/17-0-1194
fatcat:kdt4k5l5ozc2pjq3els3sgemgu
Effective focused retrieval by exploiting query context and document structure
2012
SIGIR Forum
Let's try to find a particular piece of information using a Web search engine. ...
need into a query, which can be easily processed by the search engine. ...
Web collection. ...
doi:10.1145/2093346.2093366
fatcat:f6n2nplok5f7pd3ijcrghrkg2y
Adaptive information extraction
2006
ACM Computing Surveys
For the sake of reducing the high cost of manually adapting IE applications to new domains, experiments with different Machine Learning (ML) techniques have been carried out by the research community. ...
Some examples of these applications are the generation of data bases from documents, as well as the acquisition of knowledge useful for emerging technologies like question answering, information integration ...
written), lexical analyzers (including morphological analysis and NE recognition and classification), engines dealing with unknown words, disambiguators (POS taggers, semantic taggers, etc.), stemmers ...
doi:10.1145/1132956.1132957
fatcat:usu2uodawzf5hh2ibl7ufo2y6y
Machine Learning for E-mail Spam Filtering: Review, Techniques and Trends
[article]
2016
arXiv
pre-print
We conclude by measuring the impact of Machine Learning-based filters and explore the promising offshoots of latest developments. ...
We present a comprehensive review of the most effective content-based e-mail spam filtering techniques. ...
In general, spam has many forms: chat rooms are subject to chat spam, blogs are subject to blog spam (splogs) [Kolari et al, 2006], search engines are often misled by web spam (search engine spamming ...
arXiv:1606.01042v1
fatcat:cblnuc4knfhehjwzjeeekbgf3m
Biomedical signal compression with time- and subject-adaptive dictionary for wearable devices
2016
2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)
Biometric signals compression with time- and subject-adaptive dictionary for wearable devices, by Valentina Vadori. Wearable devices are a leading category in the Internet of Things. ...
In this work, I am concerned with the design of a lossy compression technique for the real-time processing of biomedical signals. ...
Acknowledgements I would like to thank my supervisor Michele Rossi for his authoritative guidance through the course of these months and Roberto Francescon, Matteo Gadaleta and Mohsen Hooshmand for their ...
doi:10.1109/mlsp.2016.7738820
dblp:conf/mlsp/VadoriGR16
fatcat:o456gowukvd27kl5xuoqbzwppq