Joint question clustering and relevance prediction for open domain non-factoid question answering

Snigdha Chaturvedi, Vittorio Castelli, Radu Florian, Ramesh M. Nallapati, Hema Raghavan
2014 Proceedings of the 23rd international conference on World wide web - WWW '14  
Web searches are increasingly formulated as natural language questions, rather than keyword queries. Retrieving answers to such questions requires a degree of understanding of user expectations. An important step in this direction is to automatically infer the type of answer implied by the question, e.g., factoids, statements on a topic, instructions, reviews, etc. Answer Type taxonomies currently exist for factoid-style questions, but not for open-domain questions. Building taxonomies for non-factoid questions is a harder problem since these questions can come from a very broad semantic space. A few attempts have been made to develop taxonomies for non-factoid questions, but these tend to be too narrow or domain specific. In this paper, we address this problem by modeling the Answer Type as a latent variable that is learned in a data-driven fashion, allowing the model to be more adaptive to new domains and data sets. We propose approaches that detect the relevance of candidate answers to a user question by jointly 'clustering' questions according to the hidden variable, and modeling relevance conditioned on this hidden variable. We propose three new models: (a) Logistic Regression Mixture (LRM), (b) Glocal Logistic Regression Mixture (G-LRM) and (c) Mixture Glocal Logistic Regression Mixture (MG-LRM), which automatically learn question clusters and cluster-specific relevance models. On a newsgroups dataset, all three models perform better than a baseline relevance model that uses explicit Answer Type categories predicted by a supervised Answer-Type classifier. On a blogs dataset, our models also perform better than a baseline relevance model that does not use any answer-type information.
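The idea of scoring relevance conditioned on a hidden question cluster can be illustrated with a generic mixture of logistic regressions. The sketch below is not the authors' formulation: the abstract does not give the parameterisation, so the softmax gate over question features, the per-cluster weight vectors, and all function and argument names are assumptions made for illustration only.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mixture_relevance(question_feats, pair_feats, gate_weights, expert_weights):
    """Probability that an answer is relevant, marginalised over clusters.

    Hypothetical parameterisation (an assumption, not taken from the paper):
      gate_weights[k]   scores how well the question fits latent cluster k
      expert_weights[k] is the logistic relevance model for cluster k
    """
    # Soft assignment of the question to each latent cluster.
    cluster_probs = softmax([dot(g, question_feats) for g in gate_weights])
    # One logistic regression per cluster, applied to question-answer features.
    expert_probs = [sigmoid(dot(w, pair_feats)) for w in expert_weights]
    # Sum over the hidden cluster variable.
    return sum(p * e for p, e in zip(cluster_probs, expert_probs))
```

In a setup like this, the gate and expert weights would typically be fit jointly (for example with EM or gradient ascent on the marginal likelihood), which is what lets the question clusters emerge from relevance labels rather than from a hand-built taxonomy.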
doi:10.1145/2566486.2567999 dblp:conf/www/ChaturvediCFNR14 fatcat:ehmzr6g6ujatjplzirf65uumga