Forum Duplicate Question Detection by Domain Adaptive Semantic Matching

Zhuojia Xu, Hua Yuan
2020 IEEE Access  
Community Question Answering (CQA) forums, such as Stack Overflow, Stack Exchange and Massive Open Online Course (MOOC) forums, devote considerable manpower and time to managing duplicate questions. When duplicates go unmatched, users keep asking "new" questions, and the accumulating duplicates in turn interfere with later information searches, reducing user satisfaction. Neural network (NN) models for semantic parsing make end-to-end duplicate question detection possible. However, owing to a lack of domain data and expertise, such models are rarely applied directly to CQA duplicate question detection. This paper proposes a Semantic Matching Model (SMM) integrated with a multi-task transfer learning framework for multi-domain forum duplicate question detection. By building a word-to-sentence interaction mechanism on top of word-to-word interaction, SMM can automatically choose to ignore or attend to potentially similar words according to sentence-level semantics. Experiments on a benchmark data set and a MOOC forum data set show that SMM outperforms the baselines, that its interaction mechanism is effective, and that it has an advantage in cross-domain duplicate question detection.
INDEX TERMS Community question answering, duplicate question detection, semantic matching, transfer learning.
Volume 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
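The abstract does not give the formulas behind the interaction mechanism, so the following is only a toy sketch of the general idea, not the authors' SMM. It assumes hypothetical pre-computed word vectors, cosine similarity as the word-to-word score, a mean-pooled sentence vector, and a sigmoid gate with an arbitrary scale of 4.0; all function names are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vector(vectors):
    """Mean-pool word vectors into one sentence vector (an assumption of this sketch)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def word_to_word(q1, q2):
    """Similarity of every word in q1 to every word in q2."""
    return [[cosine(u, v) for v in q2] for u in q1]

def word_to_sentence_gates(q1, q2, scale=4.0):
    """One gate in (0, 1) per q1 word, from its similarity to the whole of q2."""
    sentence = mean_vector(q2)
    return [1.0 / (1.0 + math.exp(-scale * cosine(u, sentence))) for u in q1]

def gated_interaction(q1, q2):
    """Down-weight word-to-word scores of q1 words that do not fit q2 as a sentence."""
    sims = word_to_word(q1, q2)
    gates = word_to_sentence_gates(q1, q2)
    return [[g * s for s in row] for g, row in zip(gates, sims)]

# Hypothetical 2-d embeddings: the first q1 word matches q2, the second does not.
q1 = [[1.0, 0.0], [0.0, 1.0]]
q2 = [[1.0, 0.0], [0.9, 0.1]]
gates = word_to_sentence_gates(q1, q2)
gated = gated_interaction(q1, q2)
```

In this toy setup the first word of `q1` receives a larger gate than the second, so its word-to-word scores are kept while the off-topic word is suppressed, which is the "ignore or attend" behaviour the abstract describes in prose.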
doi:10.1109/access.2020.2982268 fatcat:kwulsirggncepb7s2jamslqoxi