Linggle: a Web-scale Linguistic Search Engine for Words in Context

Joanne Boisson, Ting-hui Kao, Jian-Cheng Wu, Tzu-Hsi Yen, Jason S. Chang
2013 Annual Meeting of the Association for Computational Linguistics  
In this paper, we introduce a Web-scale linguistics search engine, Linggle, that retrieves lexical bundles in response to a given query. The query might contain keywords, wildcards, wild parts of speech (PoS), synonyms, and additional regular expression (RE) operators. In our approach, we incorporate inverted file indexing, PoS information from BNC, and semantic indexing based on Latent Dirichlet Allocation with Google Web 1T. The method involves parsing the query to transforming it into
more » ... keyword retrieval commands. Word chunks are retrieved with counts, further filtering the chunks with the query as a RE, and finally displaying the results according to the counts, similarities, and topics. Clusters of synonyms or conceptually related words are also provided. In addition, Linggle provides example sentences from The New York Times on demand. The current implementation of Linggle is the most functionally comprehensive, and is in principle language and dataset independent. We plan to extend Linggle to provide fast and convenient access to a wealth of linguistic information embodied in Web scale datasets including Google Web 1T and Google Books Ngram for many major languages in the world.
dblp:conf/acl/BoissonKWYC13 fatcat:w3vabe3v7rdopia2piccajxbme