Discovery of Novel Biomarkers by Text Mining: A New Avenue for Drug Research?

Carlo A Trugenberger, David Peregrim
2013 Journal of Molecular Biomarkers & Diagnosis  
Data are paramount to modern targeted drug design. Precious revelations obtained by applying data mining and computational chemistry on large molecular databases, innovative at one time, are now everyday procedures for therapy identification. However, there is an even larger source of valuable information available that can potentially be tapped for discoveries: repositories constituted by research documents. While numerical methods for the analysis of structured data like those in genomics and proteomics databases are well developed and standard toolboxes are easily available, knowledge discovery from unstructured data in text documents is still considered the "Holy Grail" of text mining and no stable methodology has yet emerged from the scant few known attempts. Here we review a recent pilot experiment to discover novel biomarkers and phenotypes for diabetes and obesity by self-organized text mining of about 120,000 PubMed abstracts, public clinical trial summaries, and internal Merck research documents by the InfoCodex semantic engine. Retrieval of known entities missed by other traditional approaches could be demonstrated and the InfoCodex semantic engine was shown to discover new diabetes and obesity biomarkers and phenotypes, although noticeable noise (uninteresting or obvious terms) was generated. The reported text mining approach to biomarker discovery shows much promise and has the potential to be developed into a new avenue for pharmaceutical research, especially to shorten time-to-market of novel drugs, or speed up early recognition of dead ends and adverse reactions.
Citation: Trugenberger CA, Peregrim D (2013) Discovery of Novel Biomarkers by Text Mining: A New Avenue for Drug Research? J Mol Biomark Diagn S3: 004.
doi:10.4172/2155-9929.s3-004 fatcat:2nllv7vyvvb3xebqpewfokv3ne