Close Integration of ML and NLP Tools in BioAlvis for Semantic Search in Bacteriology

Robert Bossy, Alain Kotoujansky, Sophie Aubin, Claire Nedellec
2008 Workshop on Semantic Web Applications and Tools for Life Sciences  
This paper focuses on the use of corpus-based machine learning (ML) methods for fine-grained semantic annotation of text. The state of the art in semantic annotation, in the life sciences as in other technical and scientific domains, takes advantage of recent breakthroughs in the development of natural language processing (NLP) platforms. The resources required to run such platforms include named entity dictionaries, terminologies, grammars and ontologies. The demand for domain-specific, [...] and low-cost resources led to the intensive use of ML methods. Precisely specifying the goal and target knowledge of the ML task, and adequately normalizing the representation of the training corpus, can notably increase the quality of the acquired knowledge. We argue in this paper that integrated ML-NLP architectures facilitate such specifications. We illustrate our argument with four representative NLP tasks that are part of the BioAlvis semantic annotation platform. Their impact on the quality of the semantic annotation is qualified through the evaluation of an IR application in Bacteriology.
dblp:conf/swat4ls/BossyKAN08 fatcat:qy2jev3y45bhdjz33twz3q73pm