
Review: Semi-Supervised Learning Methods for Word Sense Disambiguation

Ms. Ankita Sati
2013 IOSR Journal of Computer Engineering  
Word sense disambiguation (WSD) is an open problem of natural language processing, which governs the process of identifying the appropriate sense of a word in a sentence, when the word has multiple meanings  ...  In this paper, we discuss the methods of semi-supervised learning and their performance.  ...  Yarowsky Bootstrapping Algorithm: The Yarowsky algorithm (Yarowsky 1995) was probably one of the first and most successful applications of the bootstrapping approach to NLP tasks.  ... 
doi:10.9790/0661-1246368 fatcat:2ygxtozhq5attf4xoakyomyfra

Inferring Psycholinguistic Properties of Words

Gustavo Paetzold, Lucia Specia
2016 Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies  
We introduce a bootstrapping algorithm for regression that exploits word embedding models.  ...  The approach achieves 0.88 correlation with human-produced values and the inferred psycholinguistic features lead to state-of-the-art results when used in a Lexical Simplification task.  ...  The updated version of the MRC resource can be downloaded from  ... 
doi:10.18653/v1/n16-1050 dblp:conf/naacl/PaetzoldS16 fatcat:pgduodkggzd2vedujo3eohnnm4

Introduction to the special issue on evaluating word sense disambiguation systems

2002 Natural Language Engineering  
The evaluation of WSD has turned out to be as difficult as designing the systems in the first place.  ...  Indeed, the success of any project in WSD is tied to the evaluation methodology used, and especially to the formalization of the task that the systems perform.  ...  The reviewers were: Eneko Agirre ( We are very grateful to all those who responded to our query to the corpora mailing list in August 2002 regarding the existence and availability of sense-annotated resources  ... 
doi:10.1017/s1351324902002966 fatcat:4vx2mnifobbbvjloud4lbua2tq

Senses and texts [chapter]

Yorick Wilks
1996 Benjamins Translation Library  
That is to say, how to attach each occurrence of a word in a text to one and only one sense in a dictionary---a particular dictionary of course, and that is part of the problem.  ...  humans, and secondly [Yarowsky 1995] which claims strikingly good results at doing exactly that.  ...  The paper is also indebted to comments and criticisms from Adam Kilgarriff, David Yarowsky, Karen Sparck Jones, Rebecca Bruce and members of the CRL-New Mexico and University of Sheffield NLP groups.  ... 
doi:10.1075/btl.18.20wil fatcat:zmop6puf4nectbyojcitd7e2ee

Noun sense induction using web search results

Goldee Udani, Shachi Dave, Anthony Davis, Tim Sibley
2005 Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '05  
resources, and it does not require that the number of word senses be specified in advance.  ...  Preliminary results on a small dataset show that this technique provides two advantages over other techniques in the literature: it detects real-world senses not found in dictionaries or other lexical  ...  Yarowsky [5] uses a bootstrapping approach involving generalization from a small number of labeled instances.  ... 
doi:10.1145/1076034.1076176 dblp:conf/sigir/UdaniDDS05 fatcat:jdknbjhfezgmfmdxrdnf6pjgfy

A Transition-based Model for Joint Segmentation, POS-tagging and Normalization

Tao Qian, Yue Zhang, Meishan Zhang, Yafeng Ren, Donghong Ji
2015 Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing  
Different from previous methods, the model can be trained on standard text corpora, overcoming the lack of annotated microblog corpora.  ...  Experimental results show that our joint model can help improve the performance of word segmentation on microblogs, giving an error reduction in segmentation accuracy of 12.02%, compared to the traditional  ...  Acknowledgments We thank all reviewers for the insightful comments.  ... 
doi:10.18653/v1/d15-1211 dblp:conf/emnlp/QianZZRJ15 fatcat:hezuh25mdferpdw3hrbgggb6da

Unsupervised Approach to Word Sense Disambiguation in Malayalam

K.P. Sruthi Sankar, P.C. Reghu Raj, V. Jayan
2016 Procedia Technology - Elsevier  
The aim of this work is to develop a WSD system for Malayalam, a language spoken in India, predominantly used in the state of Kerala.  ...  Word Sense Disambiguation (WSD) is the task of identifying the correct sense of a word in a specific context when the word has multiple meanings.  ...  We would like to gratefully acknowledge all staff members in the department of Computer Science and Engineering, Government Engineering College, Palakkad, for their immense support.  ... 
doi:10.1016/j.protcy.2016.05.106 fatcat:xdmg3pgokzdnnnbymmnsl5g3c4

Unsupervised Models for Named Entity Classification

Michael Collins, Yoram Singer
1999 Conference on Empirical Methods in Natural Language Processing  
We present two algorithms. The first method uses a similar algorithm to that of (Yarowsky 95), with modifications motivated by (Blum and Mitchell 98).  ...  The approach gains leverage from natural redundancy in the data: for many named-entity instances both the spelling of the name and the context in which it appears are sufficient to determine its type.  ...  The Algorithm in (Yarowsky 95) We can now compare this algorithm to that of (Yarowsky 95). The core of Yarowsky's algorithm is as follows: . . .  ... 
dblp:conf/emnlp/CollinsS99 fatcat:5nhyfv42yrg4za6okmoylriwx4

Part-of-Speech Tagging for Middle English through Alignment and Projection of Parallel Diachronic Texts

Taesun Moon, Jason Baldridge
2007 Conference on Empirical Methods in Natural Language Processing  
We explore the use of multiple alignment approaches and a bigram tagger to reduce the noise in the projected tags.  ...  This leads to tagging accuracy in the low 80's on Biblical test material and in the 60's on other Middle English material.  ...  We also seek to reduce the human effort involved in producing part-of-speech tags for historical corpora.  ... 
dblp:conf/emnlp/MoonB07 fatcat:3lubcumbozhajhopcjgsmybudm

A Methodology for Bilingual Lexicon Extraction from Comparable Corpora

Reinhard Rapp
2015 Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra)  
However, for many language pairs parallel corpora are a scarce resource which is why in the current work we discuss methods for dictionary extraction from comparable corpora.  ...  Hereby the aim is to push the boundaries of current approaches, which typically utilize correlations between co-occurrence patterns across languages, in several ways: 1) Eliminating the need for initial  ...  I would like to thank Silvia Hansen-Schirra for her support of this work and valuable comments.  ... 
doi:10.18653/v1/w15-4108 dblp:conf/hytra/Rapp15 fatcat:ejunro4cmvadhnvr3wzd4ihwbm

Knowledge Sources for Word Sense Disambiguation [chapter]

Eneko Agirre, David Martinez
2001 Lecture Notes in Computer Science  
We also compare the results for a wide range of algorithms that have been evaluated on a common test setting in our research group.  ...  Two kinds of systems have been defined during the long history of WSD: principled systems that define which knowledge types are useful for WSD, and robust systems that use the information sources at hand  ...  Acknowledgements Some of the algorithms have been jointly developed in cooperation with German Rigau. David Martinez has a scholarship from the Basque Country University.  ... 
doi:10.1007/3-540-44805-5_1 fatcat:dgvl3geyxfbo3npuemcw4vafaq

Identification and Disambiguation of Cognates, False Friends, and Partial Cognates Using Machine Learning Techniques

Oana Frunza, Diana Inkpen
2010 International Journal of Linguistics  
Partial cognates are pairs of words in two languages that have the same meaning in some but not all contexts.  ...  Our approach of identifying cognates and false friends is based on several orthographic similarity measures that we use as features for machine learning classification algorithms.  ...  Trying to develop the tool for other languages is also one of our future aims. In order to do this all we need is to plug in lists of cognates and false friends for the corresponding languages.  ... 
doi:10.5296/ijl.v1i1.309 fatcat:oast4j77qbd5vnorbzrigqbuma

Lightly supervised acquisition of named entities and linguistic patterns for multilingual text mining

César de Pablo-Sánchez, Isabel Segura-Bedmar, Paloma Martínez, Ana Iglesias-Maqueda
2012 Knowledge and Information Systems  
When these applications are required to work in several languages, NERC becomes a bottleneck because its development requires language-specific tools and resources like lists of names or annotated corpora.  ...  Keywords: Named entity recognition and categorization · Information extraction · Multilingual natural language processing · Bootstrapping algorithms. Introduction: Nowadays there exists an increasing need  ...  The following subsections describe the bootstrapping algorithm in more detail with the help of example figures tracking the process performed in an iteration.  ... 
doi:10.1007/s10115-012-0502-0 fatcat:ukfcua2ru5cfbicckj4hwgv5my

Semi-supervised Text Categorization by Considering Sufficiency and Diversity [chapter]

Shoushan Li, Sophia Yat Mei Lee, Wei Gao, Chu-Ren Huang
2013 Communications in Computer and Information Science  
After carefully considering the diversity preference, we modify the traditional bootstrapping algorithm by training the involved classifiers with random feature subspaces instead of the whole feature space  ...  Experimental evaluation shows the effectiveness of our modified bootstrapping approach in both topic and sentiment-based TC tasks.  ...  Bootstrapping algorithm with random subspace classifiers The size of the feature subset r is an important parameter in this algorithm.  ... 
doi:10.1007/978-3-642-41644-6_11 fatcat:ovcg5rk6inartgessrv6ccon7q

Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams

Svitlana Volkova, Theresa Wilson, David Yarowsky
2013 Annual Meeting of the Association for Computational Linguistics  
Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the  ...  We study subjective language in social media and create Twitter-specific lexicons via bootstrapping sentiment-bearing terms from multilingual Twitter streams.  ...  Then, for every term not in L B(i−1) that has a frequency ≥ θ_freq, the probability of that term being subjective is calculated as shown in Algorithm 1 line 10.  ... 
dblp:conf/acl/VolkovaWY13 fatcat:lcszi45i6fa2zdfv6nthe6pode