Bootstrapped extraction of class attributes

Joseph Reisinger, Marius Pasca
2009 Proceedings of the 18th international conference on World wide web - WWW '09  
As an alternative to previous studies on extracting class attributes from unstructured text, which consider either Web documents or query logs as the source of textual data, a bootstrapped method extracts class attributes simultaneously from both sources, using a small set of seed attributes. The method improves both extraction precision and attribute relevance across 40 test classes.

EXTRACTION OF ATTRIBUTES

Motivation: Class attributes capture quantifiable properties (e.g., hiking trails, entrance fee and elevation) of given classes of instances (e.g., NationalPark), and thus potentially serve as a skeleton for constructing large-scale knowledge bases automatically. Previous work on extracting class attributes from unstructured text considers either Web documents [5] or query logs [2] as the extraction source. In this poster, we develop Bootstrapped Web Search (BWS), a method for combining Web documents and query logs as textual data sources that may contain class attributes. Web documents have textual content of higher semantic quality, convey information directly in natural language rather than through sets of keywords, and contain more raw textual data. In contrast, search queries are usually ambiguous, short, keyword-based approximations of often-underspecified user information needs. Previous work has shown, however, that extraction from query logs yields significantly higher precision than extraction from Web documents [2]. BWS is a generic method for multiple-source class attribute extraction that allows corpora with varying levels of extraction precision to be combined favorably. It requires no supervision other than a small set of seed attributes for each semantic class. We test this method by combining query logs and Web documents, leveraging the strengths of each to improve coverage and precision.

Combining Multiple Data Sources: Significant previous work has been done on attribute extraction across a wide variety of data sources, e.g., news reports, query logs and Web documents. If extraction from each such source yields high-precision results, intuitively it should be possible to obtain even more accurate attributes, while lowering bias, by using a combination of data sources.
doi:10.1145/1526709.1526945 dblp:conf/www/ReisingerP09 fatcat:ydqvjwa5srbahcpfkmuex7gyji