Corpus-Based Word Sense Disambiguation [article]

Atsushi Fujii
1998 arXiv   pre-print
To resolve this problem, we propose a method to select a small number of effective examples, for system usage.  ...  The lexical ambiguity of a word contained in the input is resolved by selecting the sense annotation of the retrieved example.  ...  P (c) is estimated based on the distribution of words associated with synset c, obtained from a corpus.  ... 
arXiv:cmp-lg/9804004v1 fatcat:xwc4vd7ulbfwnbtebfdzu3wbae

Creation and Maintenance of Query Expansion Rules [chapter]

Stefania Castellani, Aaron Kaplan, Frédéric Roulland, Jutta Willamowski, Antonietta Grasso
2009 Lecture Notes in Business Information Processing  
In an information retrieval system, a thesaurus can be used for query expansion, i.e. adding words to queries in order to improve recall.  ...  Our semi-automatic approach to thesaurus creation constitutes a good compromise between fully manual approaches, which produce high-quality thesauri but at a prohibitively high cost, and fully automatic  ...  without restrictions.  ... 
doi:10.1007/978-3-642-01347-8_68 fatcat:armxedhguzdz5k4lo5jev2zfte

Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval

H. Déjean, E. Gaussier, J.-M. Renders, F. Sadat
2005 Artificial Intelligence in Medicine  
Methods for extracting bilingual lexicons from parallel and comparable corpora are described and their use in Multi-Language Information Access is illustrated.  ...  Acknowledgements We wish to thank anonymous reviewers for useful comments on the first version of this paper.  ...  We thus devised another method which makes use of the structure underlying a thesaurus and selects concept classes from the thesaurus in the following way: a) form the set F of the p best concept classes  ... 
doi:10.1016/j.artmed.2004.07.015 pmid:15811780 fatcat:2cmhd3lhhjaytlrw5gnypvwoby

A Probabilistic Approach to Lexical Semantic Knowledge Acquisition and Structural Disambiguation [article]

Hang LI
1998 arXiv   pre-print
In this thesis, I address the problem of automatically acquiring lexical semantic knowledge, especially that of case frame patterns, from large corpus data and using the acquired knowledge in structural  ...  of a high-performance disambiguation method.  ...  In this way, the problem of generalizing the values of a case slot turns out to be that of estimating a model from the class of tree cut models for some fixed thesaurus tree.  ... 
arXiv:cs/9812001v3 fatcat:mtjvcuk3jff3jeiakclq5uclw4

The Disambiguation of Nominalizations

Maria Lapata
2002 Computational Linguistics  
This article addresses the interpretation of nominalizations, a particular class of compound nouns whose head noun is derived from a verb and whose modifier is interpreted as an argument of this verb.  ...  We combine these distinct information sources using Ripper, a system that learns sets of rules from data, and achieve an accuracy of 86.1% (over a baseline of 61.5%) on the British National  ...  using a corpus in conjunction with Roget's thesaurus.  ... 
doi:10.1162/089120102760276018 fatcat:ryhm2wztlbe53hecebptlvtepm

Thesaurus based automatic keyphrase indexing

Olena Medelyan, Ian H. Witten
2006 Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries - JCDL '06  
We propose a new method that enhances automatic keyphrase extraction by using semantic information on terms and phrases gleaned from a domain-specific thesaurus.  ...  Keyphrases are widely used in both physical and digital libraries as a brief but precise summary of documents.  ...  The first row shows the performance of KEA when keyphrases are extracted from abstracts without a controlled vocabulary.  ... 
doi:10.1145/1141753.1141819 dblp:conf/jcdl/MedelyanW06a fatcat:3kgsw6yeyrgxffuifl4wwc3qnq

Analysis of Text Collections for the Purposes of Keyword Extraction Task

Alexander Vanyushkin, Leonid Graschenko
2020 Journal of Information and Organizational Sciences  
We take into consideration a number of characteristics, such as the text length distribution in words and the method of keyword assignment.  ...  Moreover, most of the article lengths range between 400 and 2500 words. Additionally, the paper presents a brief review of eleven corpora that have been used to evaluate AKEAs.  ...  A feature of the Hulth-2003 assignment is the presence of two sets of KW: a set of controlled terms, i.e. terms restricted to the Inspec thesaurus, and a set of uncontrolled terms that can be any terms.  ... 
doi:10.31341/jios.44.1.8 fatcat:i3k5ec2gizbhtfslpjfnxe52ha

Imparting Interpretability to Word Embeddings while Preserving Semantic Structure [article]

Lutfi Kerem Senel, Ihsan Utlu, Furkan Şahinuç, Haldun M. Ozaktas, Aykut Koç
2020 arXiv   pre-print
We quantify the extent of interpretability and assignment of meaning from our experimental results.  ...  The predefined concepts are derived from an external lexical resource, which in this paper is chosen as Roget's Thesaurus.  ...  a corpus.  ... 
arXiv:1807.07279v3 fatcat:r4lf34zjajdidhiqbi274q46w4

Word Sense Disambiguation using Conceptual Density [article]

Eneko Agirre, German Rigau
1996 arXiv   pre-print
This paper presents a method for the resolution of lexical ambiguity of nouns and its automatic evaluation over the Brown Corpus.  ...  The results of the experiments have been automatically evaluated against SemCor, the sense-tagged version of the Brown Corpus.  ...  Rodríguez and Alicia Ageno from the Computer Science Department of UPC.  ... 
arXiv:cmp-lg/9606007v1 fatcat:4agvqv6p5jfw7p7qkmcg3ci6qu

Designing Statistical Language Learners: Experiments on Noun Compounds [article]

Mark Lauer
1996 arXiv   pre-print
These results suggest that the new class of designs identified is a promising one. The experiments also serve to highlight the need for a widely applicable theory of data requirements.  ...  In pursuit of that goal, the thesis makes two main theoretical contributions: (i) it identifies a new class of designs by specifying an architecture for natural language analysis in which probabilities  ...  The process of computing these estimates from a corpus is called training.  ... 
arXiv:cmp-lg/9609008v1 fatcat:yipwhts6ybhknmwimhbqrhchv4

Text Relatedness Based on a Word Thesaurus

G. Tsatsaronis, I. Varlamis, M. Vazirgiannis
2010 The Journal of Artificial Intelligence Research  
Without doubt, a measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words.  ...  Experimental evaluation shows that the proposed method outperforms every lexicon-based method of semantic relatedness in the selected tasks and the used data sets, and competes well against corpus-based  ...  Acknowledgments Part of this work was done while George Tsatsaronis was at the Department of Informatics of Athens University of Economics and Business.  ... 
doi:10.1613/jair.2880 fatcat:6e7gthhwg5elxmr7ne66p7qoee

Bibliographic database access using free-text and controlled vocabulary: an evaluation

Jacques Savoy
2005 Information Processing & Management  
Retrieval is from a relatively large collection of bibliographic material written in French.  ...  Third, the evaluations presented in this article study reveal the comparative retrieval performances that result from manual and automatic indexing in a variety of circumstances.  ...  These assigned descriptors are occurrences or variants of terms extracted from the INIST thesaurus.  ... 
doi:10.1016/j.ipm.2004.01.004 fatcat:5g22bzg7nndbrc4k2ameslvfyq

Creating and exploiting a comparable corpus in cross-language information retrieval

Tuomas Talvensaari, Jorma Laurikkala, Kalervo Järvelin, Martti Juhola, Heikki Keskustalo
2007 ACM Transactions on Information Systems  
We present a method for creating a comparable text corpus from two document collections in different languages.  ...  The collections can be very different in origin: in this study we build a comparable corpus from articles by a Swedish news agency and a U.S. newspaper.  ...  All in all, it could be estimated that one day's work by a single assessor would be enough to decide the threshold levels used for creating a comparable corpus.  ... 
doi:10.1145/1198296.1198300 fatcat:ajomowrcl5agphti32eltqcmfy

Relation Extraction for Open and Closed Domain Question Answering [chapter]

Gosse Bouma, Ismail Fahmi, Jori Mur
2011 Interactive Multi-modal Question-Answering  
The first (lightly supervised) method starts from a seed list of argument instances, and extracts dependency paths from all sentences in which a seed pair occurs.  ...  It requires automatic extraction from text of all relation instances for relations that users frequently ask for.  ...  Extraction patterns are selected by estimating the precision of each dependency path, and preserving only those paths that are above a given threshold.  ... 
doi:10.1007/978-3-642-17525-1_8 dblp:series/tanlp/BoumaFM11 fatcat:jkoly4jsljaxfgfky3xs547j3i

A Multidisciplinary Approach to Unlocking Television Broadcast Archives

Laura Hollink, Guus Schreiber, Bouke Huurnink, Michiel van Liempt, Maarten de Rijke, Arnold Smeulders, Johan Oomen, Annemieke de Jong
2009 Interdisciplinary Science Reviews  
We have enriched the Sound and Vision thesaurus that is used to annotate the TV programmes in order to provide a user with a wider range of search results.  ...  Audiovisual material is a vital component of the world's heritage but it remains difficult to access.  ...  from a thesaurus to it.  ... 
doi:10.1179/174327909x441144 fatcat:up5zv5uqtrdlrl2dcfykokxode