
A comparison of methods for the automatic identification of locations in wikipedia

Davide Buscaldi, Paolo Rosso
2007 Proceedings of the 4th ACM workshop on Geographical information retrieval - GIR '07  
In this paper we compare two methods for the automatic identification of geographical articles in encyclopedic resources such as Wikipedia.  ...  The methods are a WordNet-based method that uses a set of keywords related to geographical places, and a multinomial Naïve Bayes classifier, trained over a randomly selected subset of the English Wikipedia  ...  Due to the lack of standardization that can be observed in the pages of Wikipedia, because of the 'open' nature of the Wiki project, the automatic identification of a Wikipedia page as one referring to  ... 
doi:10.1145/1316948.1316971 dblp:conf/gir/BuscaldiR07 fatcat:l4o7zawv5fheljvkffwo264yu4

Simple supervised document geolocation with geodesic grids

Benjamin Wing, Jason Baldridge
2011 Annual Meeting of the Association for Computational Linguistics  
All of our methods predict locations in the context of geodesic grids of varying degrees of resolution. We evaluate the methods on geotagged Wikipedia articles and Twitter feeds.  ...  We investigate automatic geolocation (i.e. identification of the location, expressed as latitude/longitude coordinates) of documents.  ...  Acknowledgments This research was supported by a grant from the Morris Memorial Trust Fund of the New York Community Trust and from the Longhorn Innovation Fund for Technology.  ... 
dblp:conf/acl/WingB11 fatcat:5r43yuvcyfdkbp5jbp3b7mbboe

Identifying Comparable Corpora Using LDA

Judita Preiss
2012 North American Chapter of the Association for Computational Linguistics  
We evaluate the system's performance firstly on data from the online newspaper domain, and secondly on Wikipedia cross-language links.  ...  algorithm to generate pairs of comparable texts without requiring a parallel corpus training phase.  ...  My thanks also go to the three reviewers whose comments strengthened the findings of this work.  ... 
dblp:conf/naacl/Preiss12 fatcat:eenshjy6avbbdjam7laiayfsyq

Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes

Muhidin Mohamed, Mourad Oussalah
2014 International Journal of Advanced Computer Science and Applications  
An approach for named entity classification based on Wikipedia article infoboxes is described in this paper.  ...  Experimental results showed that the classifier can achieve a high accuracy and F-measure scores of 97%.  ...  Major discrepancies arise from the peculiarity of each approach in terms of the Wikipedia features (article text, links, categories, infoboxes) used for the entity identification.  ... 
doi:10.14569/ijacsa.2014.050725 fatcat:uqk2tadqcjaslp7qywbzejuafq

ASEE: An Automated Question Answering System for World History Exams

Tao-Hsing Chang, Yu-Sheng Tsai
2016 NTCIR Conference on Evaluation of Information Access Technologies  
Experimental results show that the system can correctly answer 21 of 36 questions, which originated from World History B of the National Center Test for University Admissions in Japan in 2011.  ...  This study designed a system called ASEE, which can answer the multiple-choice items provided by the QALab-2 task in the NTCIR-12 conference.  ...  a set formed by all sentences in the article that was located for option a.  ... 
dblp:conf/ntcir/ChangT16 fatcat:jgkxpe7ywjf35isoqw55aokuue

Understanding user's query intent with wikipedia

Jian Hu, Gang Wang, Fred Lochovsky, Jian-tao Sun, Zheng Chen
2009 Proceedings of the 18th international conference on World wide web - WWW '09  
We demonstrate the effectiveness of this method in three different applications, i.e., travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided.  ...  We perform the quantitative evaluations in comparison with two baseline methods, and the experimental results show that our method significantly outperforms other approaches in each intent domain.  ...  According to the definition from Wikipedia, "Travel is the change in location of people on a trip, or the process of time involved in a person or object moving from one location to another" (http://en.wikipedia.org  ... 
doi:10.1145/1526709.1526773 dblp:conf/www/HuWLSC09 fatcat:q3sd2rpdbjaijjrviuqzmnld5a

University of Pittsburgh at GeoCLEF 2008: Towards Effective Geographic Information Retrieval

Qiang Pu, Daqing He, Qi Li
2008 Conference and Labs of the Evaluation Forum  
information in Wikipedia for identifying geo-locations.  ...  based on the geo-locations generated by GCEC is effective in improving geographic retrieval. 3) Using Wikipedia we can find the coordinates for many geo-locations, but its usage for query expansion  ...  Acknowledgements This work was partially supported by China Scholarship Council and the University of Pittsburgh.  ... 
dblp:conf/clef/PuHL08a fatcat:asnzkdpwufhtrcxxiylmiuuiyi

Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

Peng Li, Jing Jiang, Yinglin Wang
2010 Annual Meeting of the Association for Computational Linguistics  
Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in.  ...  In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles.  ...  We thank the anonymous reviewers for their helpful comments.  ... 
dblp:conf/acl/LiJW10 fatcat:r2ywetw5d5akxcenindo7o67ka

ABRIR at NTCIR-9 GeoTime Task Usage of Wikipedia and GeoNames for Handling Named Entity Information

Masaharu Yoshioka
2011 NTCIR Conference on Evaluation of Information Access Technologies  
However, failure analysis showed that the identification of named entities and relationships between these entities and the query is important in improving the quality of the system.  ...  In the previous NTCIR8-GeoTime task, ABRIR (Appropriate Boolean query Reformulation for Information Retrieval) proved to be one of the most effective systems for retrieving documents with Geographic and  ...  Acknowledgement This research was partially supported by a Grant-in-Aid for Scientific Research (B) 21300029, from the Japan Society for the Promotion of Science.  ... 
dblp:conf/ntcir/Yoshioka11 fatcat:svbh2lflcvan5bbxkhd6jeledi

Information Extraction from Wikipedia Using Pattern Learning

Márton Miháltz
2010 Acta Cybernetica  
In this paper we present solutions for the crucial task of extracting structured information from massive free-text resources, such as Wikipedia, for the sake of semantic databases serving upcoming Semantic  ...  We also propose a method for learning verb frame-based extraction patterns automatically from labeled data.  ...  This system also served as a baseline for comparison to further research, in which we investigated a method to automatically learn extraction patterns.  ... 
dblp:journals/actaC/Mihaltz10 fatcat:luikis7nqzbereeg44w7weveue

WikiSense: Supersense Tagging of Wikipedia Named Entities Based WordNet

Joseph Z. Chang, Richard Tzong-Han Tsai, Jason S. Chang
2009 Pacific Asia Conference on Language, Information and Computation  
We present WikiSense, an implementation of the proposed method for extending the named entity coverage of WordNet by sense tagging Wikipedia titles.  ...  The proposed method involves automatically recognizing whether a title is a named entity, automatically generating two sets of training data, and automatically building a classification model for training  ...  In an implementation of the proposed method and for a recent Wikipedia dump, we retrieved 1,736,645 articles (out of 2,307,815) with an NE title.  ... 
dblp:conf/paclic/ChangTC09 fatcat:z3mabv6s3rcuxbaggwymfie2ua

Mining Wiki Resources for Multilingual Named Entity Recognition

Alexander E. Richman, Patrick Schone
2008 Annual Meeting of the Association for Computational Linguistics  
In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) tags requiring minimal  ...  We demonstrate the system by using the generated corpus as training sets for a variant of BBN's Identifinder in French, Ukrainian, Spanish, Polish, Russian, and Portuguese, achieving overall F-scores as  ...  During preprocessing, we typically collected a list of people names automatically, using the entity identification methods appropriate to titles of Wikipedia articles.  ... 
dblp:conf/acl/RichmanS08 fatcat:kabrfxjptfg4helyvjyuzo7ftu

Toward the automatic extraction of knowledge of usable goods

Mei Uemura, Naho Orita, Naoaki Okazaki, Kentaro Inui
2016 Pacific Asia Conference on Language, Information and Computation  
These results together suggest future directions to build a large-scale corpus and improve the automatic identification of knowledge of usable goods.  ...  Our first attempt toward the automatic identification of such knowledge shows that a model using conditional random fields approaches the human annotation (F score 73.2%).  ...  On the other hand, there are only a small number of instances for Certainty of Effect, Degree of Effect, Null Effect, Part of, Location, Time, and User. This may be due to the content of the Wikipedia leads.  ... 
dblp:conf/paclic/UemuraOOI16 fatcat:jjghforphfh2pl5fdurd5bsw3y

A Simple Yet Robust Algorithm for Automatic Extraction of Parallel Sentences: A Case Study on Arabic-English Wikipedia Articles

Maha Jarallah Althobaiti
2021 IEEE Access  
In this paper, we present a novel method to automatically create parallel sentences from comparable corpora.  ...  We use Arabic and English Wikipedia as a comparable corpus to apply our proposed method and construct a parallel corpus between Arabic and English.  ...  Wikipedia, a comparable corpus utilised in our study, is described in Section 3. A summary of our proposed method is explained in Section 4.  ... 
doi:10.1109/access.2021.3137830 fatcat:m2r345y5xnbujof2jegybkiz4e

Boot-Strapping Language Identifiers for Short Colloquial Postings [chapter]

Moises Goldszmidt, Marc Najork, Stelios Paparizos
2013 Lecture Notes in Computer Science  
This method provides massive amount of automatically labeled data that act as a bootstrapping mechanism which we empirically show boosts the accuracy of the models.  ...  To this end we thoroughly evaluate the use of Wikipedia to build language identifiers for a large number of languages (52) and a large corpus and conduct a large scale study of the best-known algorithms  ...  This is truly an indication of the value of our method for generating boot-strapped labels: the large amount of automatically training data generated by our method boosts the accuracy of our relatively simple  ... 
doi:10.1007/978-3-642-40991-2_7 fatcat:sfjx3eb4tnckldvnooamjql4mi