A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2018. The file type is application/pdf.
Review: Semi-Supervised Learning Methods for Word Sense Disambiguation
2013
IOSR Journal of Computer Engineering
Word sense disambiguation (WSD) is an open problem in natural language processing: identifying the appropriate sense of a word in a sentence when the word has multiple meanings ...
In this paper, we discuss semi-supervised learning methods and their performance. ...
Yarowsky Bootstrapping Algorithm. The Yarowsky algorithm (Yarowsky 1995) was probably one of the first and most successful applications of the bootstrapping approach to NLP tasks. ...
doi:10.9790/0661-1246368
fatcat:2ygxtozhq5attf4xoakyomyfra
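The Yarowsky-style bootstrapping these snippets refer to can be illustrated with a minimal Python sketch. It is a simplified illustration, not the published algorithm: it assumes each occurrence of the ambiguous word is represented as a set of context words, ranks decision-list rules by a smoothed log-likelihood ratio (the smoothing constant `alpha` and the confidence `threshold` are illustrative values), and omits the one-sense-per-discourse constraint. All function names are hypothetical.

```python
import math
from collections import defaultdict


def train_decision_list(labeled, alpha=0.1):
    """Build (score, feature, sense) rules from (context_words, sense) pairs."""
    counts = defaultdict(lambda: defaultdict(float))
    for feats, sense in labeled:
        for f in feats:
            counts[f][sense] += 1
    rules = []
    for f, by_sense in counts.items():
        total = sum(by_sense.values())
        best = max(by_sense, key=by_sense.get)
        n = by_sense[best]
        # Smoothed log-likelihood ratio of the best sense vs. all other senses.
        score = math.log((n + alpha) / (total - n + alpha))
        rules.append((score, f, best))
    # Strongest evidence first, as in a decision list.
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules


def classify(rules, feats):
    """Return (sense, score) from the first matching rule, or (None, 0.0)."""
    for score, f, sense in rules:
        if f in feats:
            return sense, score
    return None, 0.0


def bootstrap(seeds, unlabeled, threshold=1.0, max_iter=10):
    """Self-train from seed examples; return one label (or None) per unlabeled item."""
    labeled = list(seeds)
    previous = None
    for _ in range(max_iter):
        rules = train_decision_list(labeled)
        current = []
        for feats in unlabeled:
            sense, score = classify(rules, feats)
            # Only confident predictions are added to the training set.
            current.append(sense if score >= threshold else None)
        current = tuple(current)
        if current == previous:  # labelling has converged
            break
        previous = current
        # Seeds are always kept; all other examples are relabelled each round.
        labeled = list(seeds) + [
            (feats, sense)
            for feats, sense in zip(unlabeled, current)
            if sense is not None
        ]
    return previous


seeds = [
    ({"river", "water"}, "bank/river"),
    ({"money", "loan"}, "bank/finance"),
]
unlabeled = [
    {"water", "fishing"},
    {"loan", "interest"},
    {"fishing", "boat"},
    {"interest", "rate"},
]
print(bootstrap(seeds, unlabeled))
# ('bank/river', 'bank/finance', 'bank/river', 'bank/finance')
```

The first round labels only the items that share a word with a seed; the newly learned rules ("fishing", "interest") then label the remaining items in the second round, which is the incremental growth from a small labeled set that bootstrapping relies on.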
Inferring Psycholinguistic Properties of Words
2016
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
We introduce a bootstrapping algorithm for regression that exploits word embedding models. ...
The approach achieves 0.88 correlation with human-produced values, and the inferred psycholinguistic features lead to state-of-the-art results when used in a Lexical Simplification task. ...
The updated version of the MRC resource can be downloaded from http://ghpaetzold.github.io/data/BootstrappedMRC.zip. ...
doi:10.18653/v1/n16-1050
dblp:conf/naacl/PaetzoldS16
fatcat:pgduodkggzd2vedujo3eohnnm4
Introduction to the special issue on evaluating word sense disambiguation systems
2002
Natural Language Engineering
The evaluation of WSD has turned out to be as difficult as designing the systems in the first place. ...
Indeed, the success of any project in WSD is tied to the evaluation methodology used, and especially to the formalization of the task that the systems perform. ...
The reviewers were: Eneko Agirre (...) ... We are very grateful to all those who responded to our query to the corpora mailing list in August 2002 regarding the existence and availability of sense-annotated resources ...
doi:10.1017/s1351324902002966
fatcat:4vx2mnifobbbvjloud4lbua2tq
Senses and texts
[chapter]
1996
Benjamins Translation Library
That is to say, how to attach each occurrence of a word in a text to one and only one sense in a dictionary (a particular dictionary, of course, and that is part of the problem). ...
... humans, and secondly [Yarowsky 1995], which claims strikingly good results at doing exactly that. ...
The paper is also indebted to comments and criticisms from Adam Kilgarriff, David Yarowsky, Karen Sparck Jones, Rebecca Bruce and members of the CRL-New Mexico and University of Sheffield NLP groups. ...
doi:10.1075/btl.18.20wil
fatcat:zmop6puf4nectbyojcitd7e2ee
Noun sense induction using web search results
2005
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '05
Preliminary results on a small dataset show that this technique provides two advantages over other techniques in the literature: it detects real-world senses not found in dictionaries or other lexical resources, and it does not require that the number of word senses be specified in advance. ...
Yarowsky [5] uses a bootstrapping approach involving generalization from a small number of labeled instances. ...
doi:10.1145/1076034.1076176
dblp:conf/sigir/UdaniDDS05
fatcat:jdknbjhfezgmfmdxrdnf6pjgfy
A Transition-based Model for Joint Segmentation, POS-tagging and Normalization
2015
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Different from previous methods, the model can be trained on standard text corpora, overcoming the lack of annotated microblog corpora. ...
Experimental results show that our joint model can help improve the performance of word segmentation on microblogs, giving an error reduction in segmentation accuracy of 12.02%, compared to the traditional ...
Acknowledgments We thank all reviewers for the insightful comments. ...
doi:10.18653/v1/d15-1211
dblp:conf/emnlp/QianZZRJ15
fatcat:hezuh25mdferpdw3hrbgggb6da
Unsupervised Approach to Word Sense Disambiguation in Malayalam
2016
Procedia Technology - Elsevier
The aim of this work is to develop a WSD system for Malayalam, a language spoken in India, predominantly used in the state of Kerala. ...
Word Sense Disambiguation (WSD) is the task of identifying the correct sense of a word in a specific context when the word has multiple meanings. ...
We gratefully acknowledge all staff members of the Department of Computer Science and Engineering, Government Engineering College, Palakkad, for their immense support. ...
doi:10.1016/j.protcy.2016.05.106
fatcat:xdmg3pgokzdnnnbymmnsl5g3c4
Unsupervised Models for Named Entity Classification
1999
Conference on Empirical Methods in Natural Language Processing
We present two algorithms. The first method uses a similar algorithm to that of (Yarowsky 95), with modifications motivated by (Blum and Mitchell 98). ...
The approach gains leverage from natural redundancy in the data: for many named-entity instances, both the spelling of the name and the context in which it appears are sufficient to determine its type. ...
The Algorithm in (Yarowsky 95). We can now compare this algorithm to that of (Yarowsky 95). The core of Yarowsky's algorithm is as follows: ...
dblp:conf/emnlp/CollinsS99
fatcat:5nhyfv42yrg4za6okmoylriwx4
Part-of-Speech Tagging for Middle English through Alignment and Projection of Parallel Diachronic Texts
2007
Conference on Empirical Methods in Natural Language Processing
We explore the use of multiple alignment approaches and a bigram tagger to reduce the noise in the projected tags. ...
This leads to tagging accuracy in the low 80s on Biblical test material and in the 60s on other Middle English material. ...
We also seek to reduce the human effort involved in producing part-of-speech tags for historical corpora. ...
dblp:conf/emnlp/MoonB07
fatcat:3lubcumbozhajhopcjgsmybudm
A Methodology for Bilingual Lexicon Extraction from Comparable Corpora
2015
Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra)
However, for many language pairs parallel corpora are a scarce resource, which is why in the current work we discuss methods for dictionary extraction from comparable corpora. ...
The aim is to push the boundaries of current approaches, which typically utilize correlations between co-occurrence patterns across languages, in several ways: 1) Eliminating the need for initial ...
I would like to thank Silvia Hansen-Schirra for her support of this work and valuable comments. ...
doi:10.18653/v1/w15-4108
dblp:conf/hytra/Rapp15
fatcat:ejunro4cmvadhnvr3wzd4ihwbm
Knowledge Sources for Word Sense Disambiguation
[chapter]
2001
Lecture Notes in Computer Science
We also compare the results for a wide range of algorithms that have been evaluated on a common test setting in our research group. ...
Two kinds of systems have been defined during the long history of WSD: principled systems that define which knowledge types are useful for WSD, and robust systems that use the information sources at hand ...
Acknowledgements Some of the algorithms have been jointly developed in cooperation with German Rigau. David Martinez has a scholarship from the Basque Country University. ...
doi:10.1007/3-540-44805-5_1
fatcat:dgvl3geyxfbo3npuemcw4vafaq
Identification and Disambiguation of Cognates, False Friends, and Partial Cognates Using Machine Learning Techniques
2010
International Journal of Linguistics
Partial cognates are pairs of words in two languages that have the same meaning in some but not all contexts. ...
Our approach of identifying cognates and false friends is based on several orthographic similarity measures that we use as features for machine learning classification algorithms. ...
Extending the tool to other languages is also one of our future aims. To do this, all we need is to plug in lists of cognates and false friends for the corresponding languages. ...
doi:10.5296/ijl.v1i1.309
fatcat:oast4j77qbd5vnorbzrigqbuma
Lightly supervised acquisition of named entities and linguistic patterns for multilingual text mining
2012
Knowledge and Information Systems
When these applications need to work in several languages, NERC becomes a bottleneck because its development requires language-specific tools and resources such as lists of names or annotated corpora. ...
Keywords: Named entity recognition and categorization · Information extraction · Multilingual natural language processing · Bootstrapping algorithms. Introduction. Nowadays there exists an increasing need ...
The following subsections describe the bootstrapping algorithm in more detail with the help of example figures tracking the process performed in an iteration. ...
doi:10.1007/s10115-012-0502-0
fatcat:ukfcua2ru5cfbicckj4hwgv5my
Semi-supervised Text Categorization by Considering Sufficiency and Diversity
[chapter]
2013
Communications in Computer and Information Science
After carefully considering the diversity preference, we modify the traditional bootstrapping algorithm by training the involved classifiers with random feature subspaces instead of the whole feature space ...
Experimental evaluation shows the effectiveness of our modified bootstrapping approach in both topic and sentiment-based TC tasks. ...
Bootstrapping algorithm with random subspace classifiers. The size of the feature subset r is an important parameter in this algorithm. ...
doi:10.1007/978-3-642-41644-6_11
fatcat:ovcg5rk6inartgessrv6ccon7q
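The idea in this entry, training each classifier in the bootstrapping loop on a random feature subspace of size r rather than on the whole feature space, can be sketched in Python. This is an illustration under stated assumptions, not the chapter's method: documents are sets of words, a simplified count-based naive Bayes (scoring only the features present in a document) stands in for whatever base classifier the chapter uses, and each classifier moves its single most confident document per round into the labeled set. The function names and the parameters `n_classifiers`, `per_round`, `max_iter` and `seed` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(docs, features):
    """Count-based naive Bayes restricted to one feature subspace."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for words, label in docs:
        for w in words & features:
            word_counts[label][w] += 1
    return label_counts, word_counts, features


def predict_nb(model, words):
    """Return (best_label, normalised probability of that label)."""
    label_counts, word_counts, features = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        s = math.log(n / total)
        # Simplification: only subspace features present in the document count.
        for w in words & features:
            s += math.log((word_counts[label][w] + 1) / (n + 2))
        scores[label] = s
    best = max(scores, key=scores.get)
    z = sum(math.exp(v - scores[best]) for v in scores.values())
    return best, 1.0 / z


def random_subspace_bootstrap(labeled, unlabeled, vocab, r,
                              n_classifiers=3, per_round=1,
                              max_iter=10, seed=0):
    rng = random.Random(seed)
    labeled = list(labeled)
    pool = list(unlabeled)
    vocab = sorted(vocab)  # random.sample needs a sequence, not a set
    for _round in range(max_iter):
        for _clf in range(n_classifiers):
            if not pool:
                return labeled
            # Each classifier sees only r randomly chosen features.
            subspace = set(rng.sample(vocab, r))
            model = train_nb(labeled, subspace)
            ranked = sorted(pool, key=lambda d: predict_nb(model, d)[1],
                            reverse=True)
            # Move this classifier's most confident documents to the labeled set.
            for words in ranked[:per_round]:
                labeled.append((words, predict_nb(model, words)[0]))
                pool.remove(words)
    return labeled


seeds = [({"great", "fun"}, "pos"), ({"awful", "boring"}, "neg")]
unlabeled = [{"great", "fun", "plot"}, {"awful", "boring", "plot"}]
vocab = {"great", "fun", "awful", "boring", "plot"}
for words, label in random_subspace_bootstrap(seeds, unlabeled, vocab, r=5):
    print(sorted(words), label)
```

With r equal to the vocabulary size, as in this toy run, every subspace is the full feature space and the loop reduces to plain self-training. Smaller values of r give the classifiers more diverse views of the data at the cost of each view seeing less evidence, which is why r matters as a parameter.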
Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
2013
Annual Meeting of the Association for Computational Linguistics
Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the ...
We study subjective language in social media and create Twitter-specific lexicons via bootstrapping sentiment-bearing terms from multilingual Twitter streams. ...
Then, for every term not in $L_B(i-1)$ that has a frequency $\geq \theta_{freq}$, the probability of that term being subjective is calculated as shown in Algorithm 1, line 10. ...
dblp:conf/acl/VolkovaWY13
fatcat:lcszi45i6fa2zdfv6nthe6pode
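The lexicon-expansion step quoted in this entry (take candidate terms outside the previous lexicon, keep those whose frequency reaches a threshold, then score their probability of being subjective) can be sketched as one Python function. The paper's actual probability formula is in its Algorithm 1 and is not reproduced here. This sketch assumes the probability is the fraction of a term's tweets that a hypothetical `is_subjective` test marks as subjective, counts frequency per tweet, and accepts terms whose probability reaches an assumed threshold `theta_prob`; the guidance from labeled data mentioned in the abstract is not modelled.

```python
from collections import Counter


def contains_lexicon_term(tokens, lexicon):
    """Hypothetical tweet-level subjectivity test: any lexicon term present."""
    return any(t in lexicon for t in tokens)


def expand_lexicon(lexicon, tweets, theta_freq, theta_prob,
                   is_subjective=contains_lexicon_term):
    """One bootstrapping iteration: return lexicon plus newly accepted terms."""
    freq = Counter()  # number of tweets containing each term (assumed definition)
    subj = Counter()  # number of those tweets judged subjective
    for tokens in tweets:
        subjective = is_subjective(tokens, lexicon)
        for term in set(tokens):
            freq[term] += 1
            if subjective:
                subj[term] += 1
    # Candidates: not already in the lexicon and frequent enough.
    new_terms = {
        term
        for term, f in freq.items()
        if term not in lexicon and f >= theta_freq
        and subj[term] / f >= theta_prob
    }
    return lexicon | new_terms


lexicon = {"love", "hate"}
tweets = [
    ["love", "it", "lol"],
    ["hate", "it", "lol"],
    ["lol", "love"],
    ["the", "bus", "it"],
]
print(sorted(expand_lexicon(lexicon, tweets, theta_freq=2, theta_prob=0.9)))
# ['hate', 'lol', 'love']
```

Here "lol" occurs only in subjective tweets and is added, while "it" (two of three tweets subjective) falls below the assumed threshold. Calling the function again with the returned lexicon corresponds to the next iteration.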