Improving two-stage ad-hoc retrieval for short queries
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '98
Short queries in an ad-hoc retrieval environment are difftcult but unavoidable. We present several methods to try to improve our current strategy of 2-stage pseudorelevance feedback retrieval in such a situation. They are: I) avtf query term weighting, 2) variable high frequency Ziptian threshold, 3) collection enrichment, 4) enhancing term variety in raw queries, and 5) using retrieved document local term statistics. Avtf employs collection statistics to weight terms in short queries. Variable
... high frequency threshold defines and ignores statistical stopwords based on query length. Collection enrichment adds other collections to the one under investigation so as to improve the chance of ranking more relevant documents in the top n for the pseudofeedback process. Enhancing term variety to raw queries tries to find highly associated terms in a set of documents that is domain-related to the query. Making the query longer may improve 1st stage retrieval. And retrieved document local statistics re-weight terms in the 2nd stage using the set of domain-related documents rather than the whole collection as used during the initial stage. Experiments were performed using the TREC 5 and 6 environment. It is found that together these methods perform well for the difftcult TRECS topics, and also works for the TREC-6 very short topics.