Thesaurus Extension Using Web Search Engines [chapter]

Robert Meusel, Mathias Niepert, Kai Eckert, Heiner Stuckenschmidt
2010 Lecture Notes in Computer Science  
Maintaining and extending large thesauri is an important challenge facing digital libraries and IT businesses alike. In this paper we describe a method that builds on and extends existing methods from the areas of thesaurus maintenance, natural language processing, and machine learning to (a) extract a set of novel candidate concepts from text corpora and (b) generate a small ranked list of suggestions for the position of these concepts in an existing thesaurus. Based on a modification of the standard tf-idf term weighting, we extract relevant concept candidates from a document corpus. We then apply a pattern-based machine learning approach to content extracted from web search engine snippets to determine the type of relation between the candidate terms and existing thesaurus concepts. The approach is evaluated in a large-scale experiment using the MeSH and WordNet thesauri as a testbed.
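The two steps in the abstract can be illustrated with a minimal sketch. It assumes documents are already tokenized into lowercase terms and that search engine snippets have already been retrieved; no search API is called. It uses *plain* tf-idf, because the abstract does not specify the authors' modification. It also uses a simple hit-count rule in place of the learned classifier. The function names `candidate_terms`, `pattern_features` and `guess_relation`, the relation labels and the pattern strings are all hypothetical.

```python
import math
from collections import Counter

def candidate_terms(corpus, thesaurus, k):
    """Rank terms absent from the thesaurus by their highest tf-idf score.

    corpus: list of tokenized documents (lists of lowercase strings).
    Plain tf-idf only; the paper's modification is NOT reproduced here.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each term once per document
    best = {}
    for doc in corpus:
        for term, tf in Counter(doc).items():
            score = tf * math.log(n_docs / doc_freq[term])
            # keep each term's best score; terms in every document score 0 and are dropped
            if term not in thesaurus and score > best.get(term, 0.0):
                best[term] = score
    ranked = sorted(best, key=lambda t: (-best[t], t))  # ties broken alphabetically
    return ranked[:k]

# Hypothetical lexical patterns: {c} = candidate term, {t} = thesaurus concept.
PATTERNS = {
    "broader": ["{t} such as {c}", "{c} is a {t}", "{c} and other {t}"],
    "synonym": ["{c} also known as {t}", "{c} also called {t}"],
}

def pattern_features(snippets, cand, concept):
    """Count pattern hits per relation type across the given snippets."""
    return {
        rel: sum(
            s.lower().count(p.format(c=cand.lower(), t=concept.lower()))
            for s in snippets
            for p in pats
        )
        for rel, pats in PATTERNS.items()
    }

def guess_relation(features):
    """Stand-in for a trained classifier: the relation with the most hits, else 'none'."""
    rel, hits = max(features.items(), key=lambda kv: kv[1])
    return rel if hits > 0 else "none"
```

On the toy corpus `[["aspirin", "fever"], ["aspirin", "pain"], ["ibuprofen", "pain", "pain"]]` with thesaurus `{"fever", "pain"}`, `candidate_terms(..., 2)` ranks `ibuprofen` above `aspirin`, because `ibuprofen` occurs in only one document. Counting snippet patterns per relation yields a small feature vector. In a learned setup, such a vector would be passed to a classifier rather than to the argmax rule used here.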
doi:10.1007/978-3-642-13654-2_24 fatcat:gc5evxgvejgyrcg6nkqwo6m4ri