Corpus-Based Word Sense Disambiguation
[article]
1998
arXiv
pre-print
To resolve this problem, we propose a method to select a small number of effective examples, for system usage. ...
The lexical ambiguity of a word contained in the input is resolved by selecting the sense annotation of the retrieved example. ...
P (c) is estimated based on the distribution of words associated with synset c, obtained from a corpus. ...
arXiv:cmp-lg/9804004v1
fatcat:xwc4vd7ulbfwnbtebfdzu3wbae
Creation and Maintenance of Query Expansion Rules
[chapter]
2009
Lecture Notes in Business Information Processing
In an information retrieval system, a thesaurus can be used for query expansion, i.e. adding words to queries in order to improve recall. ...
Our semi-automatic approach to thesaurus creation constitutes a good compromise between fully manual approaches, which produce high-quality thesauri but at a prohibitively high cost, and fully automatic ...
without restrictions. ...
doi:10.1007/978-3-642-01347-8_68
fatcat:armxedhguzdz5k4lo5jev2zfte
Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval
2005
Artificial Intelligence in Medicine
Methods for extracting bilingual lexicons from parallel and comparable corpora are described and their use in Multi-Language Information Access is illustrated. ...
Acknowledgements We wish to thank anonymous reviewers for useful comments on the first version of this paper. ...
We thus devised another method which makes use of the structure underlying a thesaurus and selects concept classes from the thesaurus in the following way: a) form the set F of the p best concept classes ...
doi:10.1016/j.artmed.2004.07.015
pmid:15811780
fatcat:2cmhd3lhhjaytlrw5gnypvwoby
A Probabilistic Approach to Lexical Semantic Knowledge Acquisition and Structural Disambiguation
[article]
1998
arXiv
pre-print
In this thesis, I address the problem of automatically acquiring lexical semantic knowledge, especially that of case frame patterns, from large corpus data and using the acquired knowledge in structural ...
of a high-performance disambiguation method. ...
In this way, the problem of generalizing the values of a case slot turns out to be that of estimating a model from the class of tree cut models for some fixed thesaurus tree. ...
arXiv:cs/9812001v3
fatcat:mtjvcuk3jff3jeiakclq5uclw4
The Disambiguation of Nominalizations
2002
Computational Linguistics
This article addresses the interpretation of nominalizations, a particular class of compound nouns whose head noun is derived from a verb and whose modifier is interpreted as an argument of this verb. ...
We combine these distinct information sources using Ripper, a system that learns sets of rules from data, and achieve an accuracy of 86.1% (over a baseline of 61.5%) on the British National ...
using a corpus in conjunction with Roget's thesaurus. ...
doi:10.1162/089120102760276018
fatcat:ryhm2wztlbe53hecebptlvtepm
Thesaurus based automatic keyphrase indexing
2006
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries - JCDL '06
We propose a new method that enhances automatic keyphrase extraction by using semantic information on terms and phrases gleaned from a domain-specific thesaurus. ...
Keyphrases are widely used in both physical and digital libraries as a brief but precise summary of documents. ...
The first row shows the performance of KEA when keyphrases are extracted from abstracts without a controlled vocabulary. ...
doi:10.1145/1141753.1141819
dblp:conf/jcdl/MedelyanW06a
fatcat:3kgsw6yeyrgxffuifl4wwc3qnq
Analysis of Text Collections for the Purposes of Keyword Extraction Task
2020
Journal of Information and Organizational Sciences
We take into consideration a number of characteristics, such as the text length distribution in words and the method of keyword assignment. ...
Moreover, most of the article lengths range between 400 and 2500 words. Additionally, the paper presents a brief review of eleven corpora that have been used to evaluate AKEA's. ...
A feature of the Hulth-2003 assignment is the presence of two sets of KW: a set of controlled terms, i.e. terms restricted to the Inspec thesaurus, and a set of uncontrolled terms that can be any terms. ...
doi:10.31341/jios.44.1.8
fatcat:i3k5ec2gizbhtfslpjfnxe52ha
Imparting Interpretability to Word Embeddings while Preserving Semantic Structure
[article]
2020
arXiv
pre-print
We quantify the extent of interpretability and assignment of meaning from our experimental results. ...
The predefined concepts are derived from an external lexical resource, which in this paper is chosen as Roget's Thesaurus. ...
a corpus. ...
arXiv:1807.07279v3
fatcat:r4lf34zjajdidhiqbi274q46w4
Word Sense Disambiguation using Conceptual Density
[article]
1996
arXiv
pre-print
This paper presents a method for the resolution of lexical ambiguity of nouns and its automatic evaluation over the Brown Corpus. ...
The results of the experiments have been automatically evaluated against SemCor, the sense-tagged version of the Brown Corpus. ...
Rodríguez and Alicia Ageno from the Computer Science Department of UPC. ...
arXiv:cmp-lg/9606007v1
fatcat:4agvqv6p5jfw7p7qkmcg3ci6qu
Designing Statistical Language Learners: Experiments on Noun Compounds
[article]
1996
arXiv
pre-print
These results suggest that the new class of designs identified is a promising one. The experiments also serve to highlight the need for a widely applicable theory of data requirements. ...
In pursuit of that goal, the thesis makes two main theoretical contributions: (i) it identifies a new class of designs by specifying an architecture for natural language analysis in which probabilities ...
The process of computing these estimates from a corpus is called training. ...
arXiv:cmp-lg/9609008v1
fatcat:yipwhts6ybhknmwimhbqrhchv4
Text Relatedness Based on a Word Thesaurus
2010
The Journal of Artificial Intelligence Research
Without doubt, a measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words. ...
Experimental evaluation shows that the proposed method outperforms every lexicon-based method of semantic relatedness in the selected tasks and the used data sets, and competes well against corpus-based ...
Acknowledgments Part of this work was done while George Tsatsaronis was at the Department of Informatics of Athens University of Economics and Business. ...
doi:10.1613/jair.2880
fatcat:6e7gthhwg5elxmr7ne66p7qoee
Bibliographic database access using free-text and controlled vocabulary: an evaluation
2005
Information Processing & Management
Retrieval is from a relatively large collection of bibliographic material written in French. ...
Third, the evaluations presented in this article reveal the comparative retrieval performances that result from manual and automatic indexing in a variety of circumstances. ...
These assigned descriptors are occurrences or variants of terms extracted from the INIST thesaurus. ...
doi:10.1016/j.ipm.2004.01.004
fatcat:5g22bzg7nndbrc4k2ameslvfyq
Creating and exploiting a comparable corpus in cross-language information retrieval
2007
ACM Transactions on Information Systems
We present a method for creating a comparable text corpus from two document collections in different languages. ...
The collections can be very different in origin: in this study we build a comparable corpus from articles by a Swedish news agency and a U.S. newspaper. ...
All in all, it could be estimated that one day's work by a single assessor would be enough to decide the threshold levels used for creating a comparable corpus. ...
doi:10.1145/1198296.1198300
fatcat:ajomowrcl5agphti32eltqcmfy
Relation Extraction for Open and Closed Domain Question Answering
[chapter]
2011
Interactive Multi-modal Question-Answering
The first (lightly supervised) method starts from a seed list of argument instances, and extracts dependency paths from all sentences in which a seed pair occurs. ...
It requires automatic extraction from text of all relation instances for relations that users frequently ask for. ...
Extraction patterns are selected by estimating the precision of each dependency path, and preserving only those paths that are above a given threshold. ...
doi:10.1007/978-3-642-17525-1_8
dblp:series/tanlp/BoumaFM11
fatcat:jkoly4jsljaxfgfky3xs547j3i
A Multidisciplinary Approach to Unlocking Television Broadcast Archives
2009
Interdisciplinary Science Reviews
We have enriched the Sound and Vision thesaurus that is used to annotate the TV programmes in order to provide a user with a wider range of search results. ...
Audiovisual material is a vital component of the world's heritage but it remains difficult to access. ...
from a thesaurus to it. ...
doi:10.1179/174327909x441144
fatcat:up5zv5uqtrdlrl2dcfykokxode