PubTermVariants: biomedical term variants and their use for PubMed search

Lana Yeganova, Won Kim, Sun Kim, Rezarta Islamaj Doğan, Wanli Liu, Donald C Comeau, Zhiyong Lu, W John Wilbur
2016 Proceedings of the 15th Workshop on Biomedical Natural Language Processing  
Term normalization is frequently used in information retrieval tasks to reduce variant word forms to a common form. The most general term normalization technique used in practice is stemming; however, it has been found to be not completely reliable. Here we present PubTermVariants, a high-quality data-driven resource of term variant pairs that can improve search results in PubMed. For a given pair, we consider two terms to be variants if they stem to the same form, pass the hypergeometric test, and pass the morpho-semantic test. We perform manual evaluation of a subset of PubTermVariants that confirms the high quality of the candidate pairs. We further present experiments that demonstrate their usefulness for PubMed search.
doi:10.18653/v1/w16-2919 dblp:conf/bionlp/YeganovaKKDLCLW16 fatcat:3c3zh3uomjhwxpcpobhadzxcxi