Effective Structured Query Formulation for Session Search

Dongyi Guan, Hui Yang, Nazli Goharian
2012 Text Retrieval Conference  
In this work, we focus on formulating effective structured queries for session search. For a given query, phrase-like text nuggets are identified and formulated into Lemur queries that are fed into the Lemur search engine. Nuggets are substrings of q_n, similar to phrases but not necessarily as semantically coherent as phrases. We assume that a valid nugget appears frequently in the top returned snippets for q_n. In this work, the longest sequences of words consisting of frequent bigrams within top returned snippets are identified as nuggets and are used to formulate a new query. By formulating structured queries using the nuggets, we greatly improve search accuracy over using q_n alone. We experiment with both strict and relaxed forms of structured query formulation. The strict form of query formulation achieves an improvement of 13.5% and the relaxed form achieves an improvement of 17.8% in nDCG@10 on the TREC 2011 query sets. We further combine the nuggets generated from all queries q_1, ..., q_{n-1}, q_n to formulate one structured session query for the entire session. Nuggets from each query are weighted by various weighting schemes to indicate their relation to the current query and their potential contribution to retrieval performance. We experiment with three weighting schemes: uniform (all queries share the same weight), previous vs. current (previous queries q_1, ..., q_{n-1} share the same weight while q_n uses a different and higher weight), and distance-based (the weights are distributed based on how far a query's position in the session is from the current query). We find that previous vs. current achieves the best search accuracy. For retrieval, we first retrieve a large pool of documents for q_n. We then employ a re-ranking model that considers document similarity between clicked documents and documents in the pool as well as dwell time.
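The nugget extraction and query weighting described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the whitespace tokenizer, the `min_count` threshold, the window size, and the concrete weight values are all hypothetical. Mapping the strict form to the Indri/Lemur exact-phrase operator `#1(...)` and the relaxed form to the unordered-window operator `#uwN(...)` is likewise an assumption, as is the exponential decay used for the distance-based scheme.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def frequent_bigrams(snippets, min_count=2):
    # Count adjacent word pairs across all snippets; keep those seen often enough.
    counts = Counter()
    for snippet in snippets:
        toks = tokenize(snippet)
        counts.update(zip(toks, toks[1:]))
    return {bg for bg, c in counts.items() if c >= min_count}


def extract_nuggets(query, snippets, min_count=2):
    # A nugget is a maximal run of query words in which every adjacent
    # pair is a frequent bigram in the snippets.
    toks = tokenize(query)
    frequent = frequent_bigrams(snippets, min_count)
    nuggets, run = [], toks[:1]
    for prev, cur in zip(toks, toks[1:]):
        if (prev, cur) in frequent:
            run.append(cur)
        else:
            if len(run) > 1:
                nuggets.append(run)
            run = [cur]
    if len(run) > 1:
        nuggets.append(run)
    return nuggets


def formulate_query(query, nuggets, relaxed=False, window=4):
    # Assumption: strict = exact ordered phrase (#1), relaxed = unordered window (#uwN).
    op = f"#uw{window}" if relaxed else "#1"
    in_nugget = {w for n in nuggets for w in n}
    parts = [f"{op}({' '.join(n)})" for n in nuggets]
    parts += [w for w in tokenize(query) if w not in in_nugget]
    return "#combine(" + " ".join(parts) + ")"


def session_weights(n, scheme="uniform", current_weight=2.0, decay=0.5):
    # Weights for q_1..q_n; the numeric values are illustrative assumptions.
    if scheme == "uniform":
        return [1.0] * n
    if scheme == "previous_vs_current":
        return [1.0] * (n - 1) + [current_weight]
    if scheme == "distance":
        # Queries further from the current query q_n receive smaller weights.
        return [decay ** (n - 1 - i) for i in range(n)]
    raise ValueError(f"unknown scheme: {scheme}")


def session_query(queries, snippets_per_query, scheme="previous_vs_current"):
    # Combine the per-query structured queries into one weighted session query.
    weights = session_weights(len(queries), scheme)
    parts = [
        f"{w} {formulate_query(q, extract_nuggets(q, snips))}"
        for w, q, snips in zip(weights, queries, snippets_per_query)
    ]
    return "#weight(" + " ".join(parts) + ")"
```

For example, with snippets in which "pocono mountains" occurs twice, the query "pocono mountains park" would yield the single nugget "pocono mountains", while "park" is left as a standalone term. The re-ranking step based on clicked documents and dwell time is not sketched here, since the abstract does not give enough detail about it.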
dblp:conf/trec/GuanYG12 fatcat:7qaufwczynczhgaqht73q4orwi