Recent advances of automated methods for searching and extracting genomic variant information from biomedical literature

Kyubum Lee, Chih-Hsuan Wei, Zhiyong Lu
2020 Briefings in Bioinformatics  
Motivation To obtain key information for personalized medicine and cancer research, clinicians and researchers in the biomedical field are in great need of searching genomic variant information from the biomedical literature now than ever before. Due to the various written forms of genomic variants, however, it is difficult to locate the right information from the literature when using a general literature search system. To address the difficulty of locating genomic variant information from the
more » ... literature, researchers have suggested various solutions based on automated literature-mining techniques. There is, however, no study for summarizing and comparing existing tools for genomic variant literature mining in terms of how to search easily for information in the literature on genomic variants. Results In this article, we systematically compared currently available genomic variant recognition and normalization tools as well as the literature search engines that adopted these literature-mining techniques. First, we explain the problems that are caused by the use of non-standard formats of genomic variants in the PubMed literature by considering examples from the literature and show the prevalence of the problem. Second, we review literature-mining tools that address the problem by recognizing and normalizing the various forms of genomic variants in the literature and systematically compare them. Third, we present and compare existing literature search engines that are designed for a genomic variant search by using the literature-mining techniques. We expect this work to be helpful for researchers who seek information about genomic variants from the literature, developers who integrate genomic variant information from the literature and beyond.
doi:10.1093/bib/bbaa142 pmid:32770181 pmcid:PMC8138883 fatcat:yhvvphk3obdjljwt2xj3dw2nmu