Automatically Extracting High-Quality Negative Examples for Answer Selection in Question Answering

Haotian Zhang, Jinfeng Rao, Jimmy Lin, Mark D. Smucker
2017 Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '17  
We propose a heuristic called "one answer per document" for automatically extracting high-quality negative examples for answer selection in question answering. Starting with a collection of question-answer pairs from the popular TrecQA dataset, we identify the original documents from which the answers were drawn. Sentences from these source documents that contain query terms (aside from the answers) are selected as negative examples. Training on the original data plus these negative examples yields improvements in effectiveness by a margin that is comparable to successive recent publications on this dataset. Our technique is completely unsupervised, which means that the gains come essentially for free. We confirm that the improvements can be directly attributed to our heuristic, as other approaches to extracting comparable amounts of training data are not effective. Beyond the empirical validation of this heuristic, we also share our improved TrecQA dataset with the community to support further work in answer selection.
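The selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the stopword list, the "at least one shared term" threshold, and the function names (`tokenize`, `query_terms`, `extract_negatives`) are all assumptions made for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical stopword list; the abstract does not say which one was used.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "to", "did",
             "what", "who", "when", "where", "which"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def query_terms(question):
    """Return the non-stopword terms of the question."""
    return {t for t in tokenize(question) if t not in STOPWORDS}


def extract_negatives(question, answer_sentences, document_sentences):
    """Select sentences from the source document that share at least one
    query term with the question and are not known answers.

    Under the "one answer per document" heuristic, such sentences are
    treated as negative examples.
    """
    terms = query_terms(question)
    known_answers = {" ".join(tokenize(s)) for s in answer_sentences}
    negatives = []
    for sentence in document_sentences:
        tokens = tokenize(sentence)
        if " ".join(tokens) in known_answers:
            continue  # skip the labeled answer itself
        if terms & set(tokens):
            negatives.append(sentence)
    return negatives


# Toy data, invented purely for illustration.
question = "Who founded the Acme company?"
answers = ["Acme was founded by Jane Doe in 1950."]
document = [
    "Acme was founded by Jane Doe in 1950.",
    "The Acme company makes anvils.",
    "It rained on Tuesday.",
]
negatives = extract_negatives(question, answers, document)
print(negatives)
```

In this toy case only the second sentence is kept: it mentions query terms but is not the labeled answer, while the third sentence shares no query term and is ignored.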
doi:10.1145/3077136.3080645 dblp:conf/sigir/ZhangRLS17 fatcat:z5r5kajjabggxdfo7camii7pvy