Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature

Gurusamy Murugesan, Sabenabanu Abdulkadhar, Jeyakumar Natarajan, Bin Liu
2017 PLoS ONE  
Automatic extraction of protein-protein interaction (PPI) pairs from biomedical literature is a widely examined task in biological information extraction. Currently, many kernel-based approaches, such as linear kernels, tree kernels, graph kernels, and combinations of multiple kernels, have achieved promising results on the PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction, the Distributed Smoothed Tree Kernel (DSTK), which exploits both syntactic (structural) information and semantic vector information. DSTK comprises distributed trees carrying syntactic information, along with distributional semantic vectors representing the semantic information of sentences or phrases. To generate a robust machine learning model, a feature-based kernel and DSTK were combined using an ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used to evaluate the performance of our system. Experimental results show that our system achieves a better F-score on the five corpora than other state-of-the-art systems.
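As a rough illustration of combining a feature-based kernel with a kernel over distributed (vector-encoded) trees, the sketch below takes a weighted sum of two cosine-normalized dot-product kernels. This is a simplification assumed for illustration, not the ensemble SVM described in the abstract; the names `combined_kernel`, `gram_matrix`, `weight`, and the toy vectors are all hypothetical.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalized_kernel(u, v):
    """Cosine-normalized linear kernel; returns 0.0 if either vector is zero."""
    denom = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / denom if denom else 0.0


def combined_kernel(x, y, weight=0.5):
    """Weighted sum of a feature kernel and a distributed-tree-style kernel.

    Assumption: each item carries a hand-built feature vector ("features")
    and a dense vector standing in for a distributed tree ("tree_vec").
    """
    k_feat = normalized_kernel(x["features"], y["features"])
    k_tree = normalized_kernel(x["tree_vec"], y["tree_vec"])
    return weight * k_feat + (1.0 - weight) * k_tree


def gram_matrix(items, weight=0.5):
    """Pairwise kernel values for all items."""
    return [[combined_kernel(a, b, weight) for b in items] for a in items]


# Toy, made-up vectors for three candidate sentences (not real corpus data).
sentences = [
    {"features": [1.0, 0.0, 2.0], "tree_vec": [0.5, -0.5, 0.1, 0.3]},
    {"features": [0.0, 1.0, 1.0], "tree_vec": [0.4, -0.2, 0.0, 0.6]},
    {"features": [2.0, 0.0, 1.0], "tree_vec": [-0.3, 0.7, 0.2, 0.1]},
]
gram = gram_matrix(sentences)
```

A matrix like `gram` could be handed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.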
doi:10.1371/journal.pone.0187379 pmid:29099838 pmcid:PMC5669485 fatcat:z3i4byob55hwlenus43pmdr6qu