Boosting precision and recall of dictionary-based protein name recognition

Yoshimasa Tsuruoka, Jun'ichi Tsujii
2003, Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine, Association for Computational Linguistics
Dictionary-based protein name recognition is the first step for practical information extraction from biomedical documents because, unlike machine-learning-based approaches, it provides ID information for the recognized terms. However, dictionary-based approaches have two serious problems: (1) a large number of false recognitions, mainly caused by short names, and (2) low recall due to spelling variation. In this paper, we tackle the former problem by using a machine learning method to filter out false positives. We also present an approximate string searching method to alleviate the latter problem. Experimental results using the GENIA corpus show that filtering with a naive Bayes classifier greatly improves precision with a slight loss of recall, resulting in a much better F-score.
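To make the idea of approximate dictionary lookup concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes plain Levenshtein distance with unit costs and a length-normalized threshold (`max_ratio`), and the dictionary entries and IDs in `TOY_DICTIONARY` are hypothetical placeholders chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lookup(candidate: str, dictionary: dict, max_ratio: float = 0.2) -> list:
    """Return (name, id, distance) for entries close enough to the candidate.

    The threshold is relative to the longer string, so short names must
    match almost exactly (an assumption of this sketch, not of the paper).
    """
    hits = []
    for name, protein_id in dictionary.items():
        d = edit_distance(candidate.lower(), name.lower())
        if d / max(len(candidate), len(name)) <= max_ratio:
            hits.append((name, protein_id, d))
    return sorted(hits, key=lambda h: h[2])


# Hypothetical dictionary: names and IDs are made up for illustration.
TOY_DICTIONARY = {
    "NF-kappa B": "ID:0001",
    "interleukin-2": "ID:0002",
    "p53": "ID:0003",
}
```

With this toy dictionary, `lookup("NF kappa B", TOY_DICTIONARY)` matches the entry `"NF-kappa B"` despite the spelling variation, while `lookup("p63", TOY_DICTIONARY)` returns no hits because a one-character change is too large relative to a three-character name. A relative threshold is one simple way to limit spurious matches on short names; a learned filter over the surviving candidates, as the abstract describes, would be a separate step.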
doi:10.3115/1118958.1118964, dblp:conf/bionlp/TsuruokaT03, fatcat:pvfq2b7f5jckvd6xc3b3th4m4a
