GAPSCORE: finding gene and protein names one word at a time

J. T. Chang, H. Schutze, R. B. Altman
2004 Bioinformatics  
Motivation: New high-throughput technologies have accelerated the accumulation of knowledge about genes and proteins. However, much knowledge is still stored as written natural language text. Therefore, we have developed a new method, GAPSCORE, to identify gene and protein names in text. GAPSCORE scores words based on a statistical model of gene names that quantifies their appearance, morphology and context. Results: We evaluated GAPSCORE against the Yapex data set and achieved an F-score of 82.5% (83.3% recall, 81.5% precision) for partial matches and 57.6% (58.5% recall, 56.7% precision) for exact matches. Since the method is statistical, users can choose score cutoffs that adjust the performance according to their needs.
doi:10.1093/bioinformatics/btg393 pmid:14734313 fatcat:gxuf6rwrzvckdmbte2itpybr6a