Interpretation of coordinations, compound generation, and result fusion for query variants
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13
We investigate interpreting coordinations (e.g. word sequences connected with coordinating conjunctions such as "and" and "or") as logical disjunctions of terms to generate a set of disjunctionfree query variants for information retrieval (IR) queries. In addition, so-called hyphen coordinations are resolved by generating full compound forms and rephrasing the original query, e.g. "rice imand export" is transformed into "rice import and export". Query variants are then processed separately and
... sed separately and retrieval results are merged using a standard data fusion technique. We evaluate the approach on German standard IR benchmarking data. The results show that: i) Our proposed approach to generate compounds from hyphen coordinations produces the correct results for all test topics. ii) Our proposed heuristics to identify coordinations and generate query variants based on shallow natural language processing (NLP) techniques is highly accurate on the topics and does not rely on parsing or part-of-speech tagging. iii) Using query variants to produce multiple retrieval results and merging the results decreases precision at top ranks. However, in combination with blind relevance feedback (BRF), this approach can show significant improvement over the standard BRF baseline using the original queries.