Protein Named Entity Identification Based on Probabilistic Features Derived from GENIA Corpus and Medical Text on the Web

Sagara Sumathipala, Koichi Yamada, Muneyuki Unehara, Izumi Suzuki
2015 International Journal of Fuzzy Logic and Intelligent Systems  
Protein named entity identification is one of the most essential and fundamental prerequisites for extracting information about protein-protein interactions from biomedical literature. In this paper, we explore the use of MEDLINE abstracts of biomedical literature for protein name identification and report the results of our experiments. We present a robust and effective approach that classifies biomedical named entities into protein and non-protein classes, based on a rich set of … features: orthographic, keyword, morphological, and newly introduced Protein-Score features. Our approach performs strongly in experiments on the GENIA corpus using Random Forest, achieving the highest values of precision (92.7%), recall (91.7%), and F-measure (92.2%) for protein identification, while significantly reducing training and testing time.
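To make the four feature groups concrete, here is a minimal Python sketch of turning one candidate entity name into a feature vector. It is not the authors' method. The keyword list, suffix list, and Greek-letter set are hypothetical. The "Protein-Score" is approximated here as an assumed stand-in: a Laplace-smoothed estimate of P(protein | token) learned from labeled names and averaged over the tokens of a name. The paper's actual probabilistic formulation, derived from GENIA and web text, may differ.

```python
import re
from collections import Counter

# HYPOTHETICAL lists for illustration only; the abstract does not give the
# paper's actual keyword or suffix inventories.
PROTEIN_KEYWORDS = {"protein", "kinase", "receptor", "factor", "enzyme"}
PROTEIN_SUFFIXES = ("ase", "in", "or")
GREEK_LETTERS = {"alpha", "beta", "gamma", "delta", "kappa"}


def build_token_scores(labeled_entities):
    """Estimate P(protein | token) from (name, is_protein) pairs.

    Add-one (Laplace) smoothing, +1 on top and +2 below, keeps rare tokens
    away from 0 and 1. This is an ASSUMED stand-in for the Protein-Score.
    """
    protein_counts, total_counts = Counter(), Counter()
    for name, is_protein in labeled_entities:
        for tok in name.lower().split():
            total_counts[tok] += 1
            if is_protein:
                protein_counts[tok] += 1
    return {t: (protein_counts[t] + 1) / (total_counts[t] + 2) for t in total_counts}


def protein_score(name, token_scores):
    """Average token-level score; unseen tokens get the neutral prior 0.5."""
    tokens = name.lower().split()
    if not tokens:
        return 0.5
    return sum(token_scores.get(t, 0.5) for t in tokens) / len(tokens)


def extract_features(name, token_scores):
    """Return a flat numeric feature dict for one candidate entity name."""
    words = name.split()
    head = words[-1].lower() if words else ""
    pieces = [p.lower() for p in re.split(r"[\s\-]+", name) if p]
    feats = {
        # Orthographic: surface-form cues
        "init_cap": float(name[:1].isupper()),
        "all_caps": float(name.isupper()),
        "has_digit": float(any(c.isdigit() for c in name)),
        "has_hyphen": float("-" in name),
        "has_greek": float(any(p in GREEK_LETTERS for p in pieces)),
        # Keyword: presence of a protein-indicative word
        "has_keyword": float(any(w.lower() in PROTEIN_KEYWORDS for w in words)),
        # Probabilistic score learned from labeled names
        "protein_score": protein_score(name, token_scores),
    }
    # Morphological: suffix of the head (last) word
    for suffix in PROTEIN_SUFFIXES:
        feats["suffix_" + suffix] = float(head.endswith(suffix))
    return feats


if __name__ == "__main__":
    training = [("IL-2 receptor", True), ("NF-kappa B", True),
                ("T cells", False), ("B cells", False)]
    scores = build_token_scores(training)
    print(extract_features("IL-2 receptor alpha", scores))
```

Because every value is numeric, these dicts could be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a Random Forest classifier such as `RandomForestClassifier`. As a consistency check on the reported figures: F-measure is the harmonic mean 2PR/(P+R), and 2 × 0.927 × 0.917 / (0.927 + 0.917) ≈ 0.922, which matches the stated 92.2%.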
doi:10.5391/ijfis.2015.15.2.111