Extending Knowledge and Deepening Linguistic Processing for the Question Answering System InSicht [chapter]

Sven Hartrumpf
2006 Lecture Notes in Computer Science  
The German question answering (QA) system InSicht participated in QA@CLEF for the second time. It relies on complete sentence parsing, inferences, and semantic representation matching. This year, the system was improved in two main directions. First, the background knowledge was extended by large semantic networks and large rule sets. InSicht's query expansion step can produce more alternatives using these resources. A second direction for improvement was to deepen linguistic processing by addressing a phenomenon that appears prominently on the level of text semantics: coreference resolution. A new source of lexico-semantic relations and equivalence rules has been established based on compound analyses. WOCADI's compound analysis module determined the structure and semantics of compounds when parsing the German QA@CLEF corpus and the German GIRT (German Indexing and Retrieval Test database) corpus. The compound analyses were used in three ways: to project lexico-semantic relations from compound parts to compounds, to establish a subordination hierarchy between compounds, and to derive equivalence rules between nominal compounds and their analytic counterparts, e.g. between Reisimport ('rice import') and Import von Reis ('import of rice'). Another source of new rules was verb glosses from GermaNet, a German WordNet variant. The glosses were parsed and automatically formalized. The lack of coreference resolution in InSicht was one major source of missing answers in QA@CLEF 2004. Therefore, the coreference resolution module CORUDIS was integrated into the parsing during document processing. The resulting coreference partition of mentions (or markables) from a document is used to derive additional networks where mentions are replaced by mentions from the corresponding coreference chain in that partition. The central step in the QA system InSicht, matching (one by one) semantic networks derived from the question parse to document sentence networks, was generalized. Now, a question network can be split at certain semantic relations (e.g. relations for local or temporal specifications); the resulting semantic networks are conjunctively connected. To evaluate the different extensions, the QA system was run on all 400 German questions from QA@CLEF 2004 and 2005 with varying setups. Some of these extensions showed positive effects, but currently they are minor and not yet statistically significant. At least three explanations play a role.
First, the differences in the semantic representation of questions and document sentences are often minimal and do not require much background knowledge to be related. Second, some questions need many inferential steps. For many such inference chains, formalized inferential knowledge such as axioms and meaning postulates for concepts is missing. Third, the low recall values of some natural language processing modules, e.g. the parser and the coreference resolution module, can cause a missing inferential link and thereby a wrong empty answer. Work on the robustness of these modules will help to answer more questions correctly.
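
The derivation of additional networks from a coreference partition, followed by matching, can be illustrated with a toy sketch. It is not InSicht's or CORUDIS's implementation: the triple representation, the relation labels (AGT, OBJ), the variable name `?x`, and the functions `derive_networks` and `match` are all assumptions made for illustration. A sentence network is modeled as a set of (relation, head, argument) triples, and each mention is replaced by the other mentions of its coreference chain.

```python
from itertools import product

# Toy "semantic network": a frozenset of (relation, head, argument) triples.
# Relation labels and node names are illustrative, not InSicht's format.

def derive_networks(network, chains):
    """Return variants of `network` in which each mention is replaced by
    a mention from its coreference chain (the original is excluded)."""
    chain_of = {m: chain for chain in chains for m in chain}
    nodes = sorted({n for _, h, a in network for n in (h, a)})
    mentions = [n for n in nodes if n in chain_of]
    variants = []
    for choice in product(*(chain_of[m] for m in mentions)):
        mapping = dict(zip(mentions, choice))
        variant = frozenset(
            (r, mapping.get(h, h), mapping.get(a, a)) for r, h, a in network
        )
        if variant != network and variant not in variants:
            variants.append(variant)
    return variants

def match(question, network):
    """Return a binding for the variable '?x' such that every question
    triple occurs in `network`, or None if no binding works."""
    candidates = sorted({n for _, h, a in network for n in (h, a)})
    for cand in candidates:
        bound = {
            (r, cand if h == "?x" else h, cand if a == "?x" else a)
            for r, h, a in question
        }
        if bound <= network:
            return cand
    return None

# Hypothetical document sentence: "He met the chancellor."
sentence = frozenset({("AGT", "meet", "he"), ("OBJ", "meet", "chancellor")})
# Hypothetical coreference partition: one chain with two mentions.
chains = [["the president", "he"]]
# Hypothetical question: "Who met the chancellor?"
question = {("AGT", "meet", "?x"), ("OBJ", "meet", "chancellor")}

variants = derive_networks(sentence, chains)
print(match(question, sentence))     # binding from the original network
print(match(question, variants[0]))  # binding from the derived network
```

In this sketch the original network can only yield the pronoun as an answer, whereas the derived network yields the full mention from the same chain, which is the kind of answer that was missing without coreference resolution.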
doi:10.1007/11878773_41 fatcat:fmbgkmpwhvar3kzv7ej5lu2qwy