Searching semantically similar questions from a large community-based question archive

Mingrong Liu, Yicen Liu, Qing Yang
2009 2009 International Conference on Natural Language Processing and Knowledge Engineering  
This paper provides a novel and totally statistical method to search similar questions from a large question archive for a given queried question. Firstly, a word relevance model is trained based on the whole question archive which is made up of millions of natural language questions proposed by users on the web. The word relevance model is utilized to find most semantically related words to a specific word. Secondly, in order to find semantically similar questions for a queried question, each
more » ... on-stop word in a question is expanded with the help of word relevance model and represented as a word vector. Elements of the vector include the word itself and some semantically related words to it. Elements of the word vector are weighted by combining both classical IR term weighting method and word transformation probability learned from the relevance model. Then the question is mapped to a question vector as the normalized center of the word vectors representing these words contained in it. The problem of question retrieval can be solved by comparing the similarity between question vectors. The method is actually a simple question expansion based Kernel approach. Experimental results indicate the proposed method outperforms the baseline methods such as Vector Space Model (VSM) and Language Model for Information Retrieval (LMIR).
doi:10.1109/nlpke.2009.5313808 dblp:conf/nlpke/LiuLY09 fatcat:dmnwrcrrorhubmxfprsvpzjpoe