Kernel approaches for genic interaction extraction

S. Kim, J. Yoon, J. Yang
2007 Bioinformatics  
Motivation: Automatic knowledge discovery and efficient information access, such as named entity recognition and relation extraction between entities, have recently become critical issues in the biomedical literature. However, the inherent difficulty of the relation extraction task, mainly caused by the diversity of natural language, is further compounded in the biomedical domain because biomedical sentences are commonly long and complex. In addition, relation extraction often involves modeling long range dependencies, discontiguous word patterns and semantic relations for which the pattern-based methodology is not directly applicable. Results: In this article, we shift the focus of biomedical relation extraction from the problem of pattern extraction to the problem of kernel construction. We suggest four kernels (predicate, walk, dependency and hybrid kernels) to adequately encapsulate the information required for relation prediction, based on the sentential structures involving the two entities. For this purpose, we view the dependency structure of a sentence as a graph, which allows the system to isolate the essential part of the complex syntactic structure by finding the shortest path between the entities. The kernels we suggest are augmented gradually from flat feature descriptions to structural descriptions of the shortest paths. We obtain a very promising result: a 77.5 F-score with the walk kernel on the Language Learning in Logic (LLL) 05 genic interaction shared task. Availability: The algorithms used are free for academic research and are available from our Web site http://mllab.sogang.$shkim/LLL05.tar.gz. Contact:
doi:10.1093/bioinformatics/btm544 pmid:18003645 fatcat:p56wbgkrajbpbnq2fxzhdf7yle
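
The abstract describes treating a sentence's dependency structure as a graph and taking the shortest path between the two entities. The following is a minimal sketch of that single step, not the authors' implementation: the function name `shortest_path`, the toy sentence, and its dependency edges are all hypothetical, and the graph is assumed to be undirected and unweighted, so breadth-first search is sufficient.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Return the shortest node path from source to target, or None."""
    # Build an undirected adjacency list from (head, dependent) pairs.
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)
    if source not in graph or target not in graph:
        return None

    # Breadth-first search; `parent` also serves as the visited set.
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical dependency edges (head, dependent) for the toy sentence
# "GerE stimulates the transcription of cotD".
edges = [
    ("stimulates", "GerE"),
    ("stimulates", "transcription"),
    ("transcription", "the"),
    ("transcription", "of"),
    ("of", "cotD"),
]

path = shortest_path(edges, "GerE", "cotD")
print(path)
```

In this toy example the path drops the determiner "the" and keeps only the words linking the two entities, which is the kind of reduced structure a kernel could then be computed over.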