DrugQuest - a text mining workflow for drug association discovery

Nikolas Papanikolaou, Georgios A. Pavlopoulos, Theodosios Theodosiou, Ioannis S. Vizirianakis, Ioannis Iliopoulos
2016 BMC Bioinformatics  
Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated in mining other types of repositories such as chemical databases. Results: Herein, we apply a text mining approach on the DrugBank database in order to explore drug associations based on the
more » ... ank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Name Entity Recognition (NER) techniques on these fields to identify chemicals, proteins, genes, pathways, diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible scenarios why these records are clustered together. Different views such as clustered chemicals based on their textual information, tag clouds consisting of Significant Terms along with the terms that were used for clustering are delivered to the user through a user-friendly web interface. Conclusions: DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at
doi:10.1186/s12859-016-1041-6 pmid:27295093 pmcid:PMC4905607 fatcat:ss7fan4ucrgplgdudidmq7adwm