A General Approach to Extracting Full Names and Abbreviations for Chinese Entities from the Web [chapter]

Guang Jiang, Cao Cungen, Sui Yuefei, Han Lu, Shi Wang
2010 IFIP Advances in Information and Communication Technology  
Identifying full names and abbreviations of entities is a challenging problem in many applications, such as question answering and information retrieval. In this paper, we propose a general method for extracting full names and abbreviations from Chinese Web corpora. For a given entity, we construct forward and backward query items, submit them to a search engine (e.g. Google), and use the search results to extract full names and abbreviations for the entity. To verify the results, filtering ... and marking methods are used to sort all the results. Experiments show that our method achieves a precision of 84.7% for abbreviations and 77.0% for full names.
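The pipeline in the abstract (build queries, extract candidates from search results, filter them, and sort them) can be illustrated with a minimal sketch. This is not the authors' method: the abstract gives neither the query templates nor the marking scheme. The cue words 简称 ("abbreviated as") and 全称 ("full name"), the regular expressions, the character-order filter, the frequency ranking, and the helper names `build_queries`, `extract_candidates`, `is_subsequence` and `rank` are all hypothetical. Result snippets are passed in as plain strings instead of being fetched from a real search engine.

```python
import re
from collections import Counter

# Hypothetical cue words; the paper's actual query items are not given in the abstract.
ABBR_CUE = "简称"  # "abbreviated as"
FULL_CUE = "全称"  # "full name is"
CJK = r"[\u4e00-\u9fff]"  # CJK Unified Ideographs


def build_queries(entity: str) -> dict[str, list[str]]:
    """Hypothetical templates: forward puts the entity before a cue, backward after it."""
    return {
        "forward": [f'"{entity}{ABBR_CUE}"', f'"{entity}{FULL_CUE}"'],
        "backward": [f'"{ABBR_CUE}{entity}"', f'"{FULL_CUE}{entity}"'],
    }


def is_subsequence(short: str, long: str) -> bool:
    """True if every character of `short` occurs in `long` in the same order."""
    it = iter(long)
    return all(ch in it for ch in short)  # `in` consumes the iterator


def extract_candidates(entity: str, snippets: list[str]) -> tuple[Counter, Counter]:
    """Count abbreviation and full-name candidates for `entity` in result snippets."""
    e = re.escape(entity)
    abbr_patterns = [
        rf"{e}[，,]?{ABBR_CUE}({CJK}{{2,8}})",  # "E，简称A"
        rf"({CJK}{{2,8}})[，,]?{FULL_CUE}{e}",  # "A，全称E"
    ]
    full_patterns = [
        rf"{e}[，,]?{FULL_CUE}({CJK}{{2,20}})",  # "E，全称F"
        rf"({CJK}{{2,20}})[，,]?{ABBR_CUE}{e}",  # "F，简称E"
    ]
    abbrs, fulls = Counter(), Counter()
    for text in snippets:
        for pat in abbr_patterns:
            abbrs.update(m for m in re.findall(pat, text) if m != entity)
        for pat in full_patterns:
            fulls.update(m for m in re.findall(pat, text) if m != entity)
    return abbrs, fulls


def rank(entity: str, snippets: list[str]) -> dict[str, list[tuple[str, int]]]:
    """Filter candidates by character order, then sort them by frequency."""
    abbrs, fulls = extract_candidates(entity, snippets)
    # Assumed filter: an abbreviation's characters appear, in order, in its full name.
    abbrs = Counter({a: n for a, n in abbrs.items()
                     if len(a) < len(entity) and is_subsequence(a, entity)})
    fulls = Counter({f: n for f, n in fulls.items()
                     if len(f) > len(entity) and is_subsequence(entity, f)})
    return {"abbreviations": abbrs.most_common(), "full_names": fulls.most_common()}


if __name__ == "__main__":
    demo = [
        "中国科学院，简称中科院。",
        "中国科学院简称中科院，成立于1949年。",
        "中科院，全称中国科学院。",
        "中国科学院，简称不详。",  # noise: "不详" ("unknown") fails the filter
    ]
    print(build_queries("中国科学院"))
    print(rank("中国科学院", demo))
    print(rank("中科院", demo))
```

Frequency across snippets stands in for the unspecified marking step here. The character-order filter is a common heuristic for Chinese abbreviations, but it rejects abbreviations that do not reuse characters from the full name, so a real system would need additional evidence.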
doi:10.1007/978-3-642-16327-2_33 fatcat:dxvz37cp25dk7jmsc3vozws5gi