Experiments with LSA for Passage Re-Ranking in Question Answering

David Tomás, José Luis Vicedo González, Empar Bisbal, Lidia Moreno
2006 Conference and Labs of the Evaluation Forum  
As in the previous QA@CLEF track, two separate groups at the University of Alicante participated this year using different approaches. This paper describes the work of the Alicante 1 group. We have continued the line of research established in the previous competition, where the main goal was to obtain a fully data-driven system based on machine learning techniques. Last year an XML framework was established in order to obtain a modular system where each component could be easily replaced or [...] In this framework, a question classification system based on Support Vector Machines (SVM) and surface text features was included, achieving remarkable performance in this stage. The main novelties introduced this year focus on the information retrieval stage. First, we employed Indri as our search engine for passage retrieval. Second, we developed a module for passage re-ranking based on Latent Semantic Analysis (LSA). This technique provides a method for determining the similarity of meaning between words by analyzing large text corpora. In our experiments, every question was compared by means of LSA with every passage returned by the search engine, and the passages were re-ranked according to that similarity. The results show that this technique increased retrieval accuracy for definition questions but decreased it for factoid ones. To take advantage of the flexibility and adaptability of our machine learning based proposal, this year we extended our participation to the monolingual Spanish task and the bilingual Spanish-English task. We reached a best overall accuracy of 29.47% in the first task and 20.00% in the second.
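
The re-ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the training corpus, term weighting, or number of latent dimensions, so the toy corpus, raw term counts, `k=2`, and the function names `build_lsa`, `project`, `cosine`, and `rerank` are all assumptions made for illustration. The sketch builds a term-document matrix, keeps the top `k` left singular vectors, projects the question and each passage into that space, and sorts the passages by cosine similarity to the question.

```python
import numpy as np

def build_lsa(corpus, k=2):
    """Fit a toy LSA space: raw term-document counts, then truncated SVD."""
    vocab = sorted({w for doc in corpus for w in doc.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Rows are terms, columns are documents (assumption: unweighted counts).
    A = np.zeros((len(vocab), len(corpus)))
    for j, doc in enumerate(corpus):
        for w in doc.lower().split():
            A[index[w], j] += 1.0
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return index, U[:, :k]  # top-k left singular vectors (term space)

def project(text, index, U_k):
    """Map a text to the latent space via its bag-of-words vector.

    Simplification: no singular-value scaling and unknown words are ignored.
    """
    v = np.zeros(len(index))
    for w in text.lower().split():
        if w in index:
            v[index[w]] += 1.0
    return v @ U_k

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rerank(question, passages, index, U_k):
    """Order passages by decreasing latent-space similarity to the question."""
    q = project(question, index, U_k)
    scored = [(cosine(q, project(p, index, U_k)), p) for p in passages]
    return [p for _, p in sorted(scored, key=lambda t: t[0], reverse=True)]

# Hypothetical toy corpus with two topics (animals, finance).
corpus = [
    "cat pet",
    "cat pet dog",
    "stock market",
    "stock market share",
    "stock share",
]
index, U_k = build_lsa(corpus, k=2)
ranked = rerank("dog", ["stock share", "cat pet"], index, U_k)
print(ranked)
```

In this toy setting the question "dog" shares no word with the passage "cat pet", yet that passage is ranked first because both fall on the same latent dimension. This is the kind of similarity of meaning that LSA is meant to capture, and a realistic system would estimate it from a much larger corpus.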
dblp:conf/clef/TomasGBM06 fatcat:upxrceesdvcute6phvmuwu7dam