Search Query Categorization at Scale

Michal Laclavik, Marek Ciglan, Sam Steingold, Martin Seleng, Alex Dorman, Stefan Dlugolinsky
2015 Proceedings of the 24th International Conference on World Wide Web - WWW '15 Companion  
State-of-the-art query categorization methods usually exploit web search services to retrieve the best matching web documents and map them to a given taxonomy of categories. This is effective but impractical when one does not own a web corpus and has to use a 3rd party web search engine API. The problem lies in performance and in financial costs. In this paper, we present a novel, fast and scalable approach to categorization of search queries based on a limited intermediate corpus: we use Wikipedia as the knowledge base. The presented solution relies on two steps: first a query is mapped to the relevant Wikipedia pages; second, the retrieved documents are categorized into a given taxonomy. We approach the first challenge as an entity search problem and present a new document categorization approach for the second step. On a standard data set, our approach achieves results comparable to the state-of-the-art approaches while maintaining high performance and scalability.
doi:10.1145/2740908.2741995 dblp:conf/www/LaclavikCSSDD15 fatcat:qlqih7dkhbdcbo4c2rg6koy4se