Structuring and extracting knowledge for the support of hypothesis generation in molecular biology

Marco Roos, M Scott Marshall, Andrew P Gibson, Martijn Schuemie, Edgar Meij, Sophia Katrenko, Willem van Hage, Konstantinos Krommydas, Pieter W Adriaans
2009 BMC Bioinformatics  
Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The requirement of automated support is exemplified by the difficulty of considering all relevant facts that are contained in the millions of documents available from PubMed. Semantic Web provides tools for sharing prior knowledge, while information retrieval and information extraction techniques enable its extraction from
more » ... erature. Their combination makes prior knowledge available for computational analysis and inference. While some tools provide complete solutions that limit the control over the modeling and extraction processes, we seek a methodology that supports control by the experimenter over these critical processes. Results: We describe progress towards automated support for the generation of biomolecular hypotheses. Semantic Web technologies are used to structure and store knowledge, while a workflow extracts knowledge from text. We designed minimal proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance. The models fit a methodology that allows focus on the requirements of a single experiment while supporting reuse and posterior analysis of extracted knowledge from multiple experiments. Our workflow is composed of services from the 'Adaptive Information Disclosure Application' (AIDA) toolkit as well as a few others. The output is a semantic model with putative biological relations, with each relation linked to the corresponding evidence.
doi:10.1186/1471-2105-10-s10-s9 pmid:19796406 pmcid:PMC2755830 fatcat:hf7i72w2uzbm5mnzlqw774x4je