Rutabaga by any other name: extracting biological names

Lynette Hirschman, Alexander A. Morgan, Alexander S. Yeh
<span title="">2002</span> <i title="Elsevier BV"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/p4kk6lusgrhyxecgig72iasi5q" style="color: black;">Journal of Biomedical Informatics</a> </i> &nbsp;
As the pace of biological research accelerates, biologists are becoming increasingly reliant on computers to manage the information explosion. Biologists communicate their research findings by relying on precise biological terms; these terms then provide indices into the literature and across the growing number of biological databases. This article examines emerging techniques to access biological resources through extraction of entity names and relations among them. Information extraction has been an active area of research in natural language processing, and there are promising results for information extraction applied to news stories, e.g., balanced precision and recall in the 93-95% range for identifying person, organization and location names. But these results do not seem to transfer directly to biological names, where results remain in the 75-80% range. Multiple factors may be involved, including the absence of shared training and test sets for rigorous measures of progress, the lack of annotated training data specific to biological tasks, pervasive ambiguity of terms, frequent introduction of new terms, and a mismatch between evaluation tasks as defined for news and real biological problems. We present evidence from a simple lexical matching exercise that illustrates some specific problems encountered when identifying biological names. We conclude by outlining a research agenda to raise the performance of named entity tagging to a level where it can be used to perform tasks of biological importance.
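The abstract refers to a simple lexical matching exercise and reports results as balanced precision and recall. Here is a minimal sketch of both ideas. It assumes a case-sensitive, whole-word dictionary lookup and exact character-span scoring, and it uses the balanced F-measure F = 2PR / (P + R). The paper's own lexicon and scoring protocol are not described here, so this is not the authors' method. The lexicon, the sentence, the gold annotations and the helper names `match_names`, `span_of` and `precision_recall_f` are all hypothetical.

```python
import re
from typing import Iterable

Span = tuple[int, int]  # (start, end) character offsets, end exclusive


def match_names(text: str, lexicon: Iterable[str]) -> set[Span]:
    """Hypothetical matcher: tag every whole-word, case-sensitive lexicon hit."""
    spans: set[Span] = set()
    for name in lexicon:
        # re.escape protects names containing punctuation; \b stops a match
        # inside a longer word (e.g. "rutabaga" inside "rutabagas").
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            spans.add((m.start(), m.end()))
    return spans


def span_of(text: str, word: str) -> Span:
    """Hypothetical helper: span of the first occurrence of word in text."""
    start = text.index(word)
    return (start, start + len(word))


def precision_recall_f(predicted: set[Span], gold: set[Span]) -> tuple[float, float, float]:
    """Exact-span precision, recall and balanced F-measure (harmonic mean)."""
    tp = len(predicted & gold)  # tagged spans that also appear in the gold set
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Toy data (hypothetical): "not" stands in for a gene symbol that is also an
# ordinary English word, so the matcher tags it although it is not a gene here.
lexicon = ["rutabaga", "Adh", "not"]
text = "rutabaga mutants did not show altered Adh activity."
gold = {span_of(text, "rutabaga"), span_of(text, "Adh")}

predicted = match_names(text, lexicon)
p, r, f = precision_recall_f(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")  # precision=0.67 recall=1.00 F=0.80
```

The single false positive on "not" lowers precision while recall stays perfect. This is a small instance of the term ambiguity that the abstract lists among the factors holding back biological name tagging.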
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1016/s1532-0464(03)00014-5">doi:10.1016/s1532-0464(03)00014-5</a> <a target="_blank" rel="external noopener" href="https://www.ncbi.nlm.nih.gov/pubmed/12755519">pmid:12755519</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/4w2eigmhuzecvnpe76dj7uu2u4">fatcat:4w2eigmhuzecvnpe76dj7uu2u4</a> </span>