Bootstrapping and evaluating named entity recognition in the biomedical domain

Andreas Vlachos, Caroline Gasperin
2006 Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology - LNLBioNLP '06   unpublished
We demonstrate that bootstrapping a gene name recognizer for FlyBase curation from automatically annotated noisy text is more effective than fully supervised training of the recognizer on more general manually annotated biomedical text. We present a new test set for this task based on an annotation scheme which distinguishes gene names from gene mentions, enabling a more consistent annotation. Evaluating our recognizer using this test set indicates that performance on unseen genes is its main weakness. We evaluate extensions to the technique used to generate training data designed to ameliorate this problem.
doi:10.3115/1654415.1654448 fatcat:vh24l7vujndrfeoqdgtcclaoee