MaxMatcher: Biological Concept Extraction Using Approximate Dictionary Lookup [chapter]

Xiaohua Zhou, Xiaodan Zhang, Xiaohua Hu
Lecture Notes in Computer Science  
Dictionary-based biological concept extraction is still the state-of-the-art approach to large-scale biomedical literature annotation and indexing. Exact dictionary lookup is a very simple approach, but it typically achieves low extraction recall, because a biological term often has many variants and no dictionary can collect all of them. We propose a generic extraction approach, referred to as approximate dictionary lookup, to cope with term variations, and implement it as a system called MaxMatcher. The basic idea of this approach is to capture the significant words of a particular concept rather than all of its words. The new approach dramatically improves extraction recall while maintaining precision. In a comparative study on the GENIA corpus, the new approach reaches 57% recall, while exact dictionary lookup achieves only 26%.
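The contrast between exact lookup and matching on significant words can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify how significance is computed, so the weighting below (a word counts for more when fewer concepts share it), the 0.8 threshold, the toy `DICTIONARY`, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Toy dictionary (hypothetical): concept ID -> known names.
DICTIONARY = {
    "C1": ["nuclear factor kappa B"],
    "C2": ["nuclear receptor"],
    "C3": ["growth factor"],
}


def build_index(dictionary):
    """Map each lowercased word to the set of concept IDs whose names contain it."""
    index = defaultdict(set)
    for concept_id, names in dictionary.items():
        for name in names:
            for word in name.lower().split():
                index[word].add(concept_id)
    return index


def exact_lookup(phrase, dictionary):
    """Return concepts that have a name identical to the phrase (case-insensitive)."""
    target = phrase.lower()
    return sorted(
        cid
        for cid, names in dictionary.items()
        if target in (name.lower() for name in names)
    )


def approximate_lookup(phrase, dictionary, threshold=0.8):
    """Return concepts whose significant words are mostly present in the phrase.

    Assumed scoring: each word of a name weighs 1 / (number of concepts that
    use the word); a name matches when the matched share of its total weight
    reaches the threshold.
    """
    index = build_index(dictionary)
    tokens = set(phrase.lower().split())
    hits = []
    for cid, names in dictionary.items():
        for name in names:
            words = set(name.lower().split())
            total = sum(1.0 / len(index[w]) for w in words)
            matched = sum(1.0 / len(index[w]) for w in words & tokens)
            if total > 0 and matched / total >= threshold:
                hits.append(cid)
                break  # one matching name per concept is enough
    return sorted(hits)


# A variant that drops the low-weight word "nuclear".
variant = "factor kappa B"
exact_hits = exact_lookup(variant, DICTIONARY)
approx_hits = approximate_lookup(variant, DICTIONARY)
```

In this toy setting the variant "factor kappa B" has no exact dictionary entry, yet it still covers most of the weight of "nuclear factor kappa B" because the missing word "nuclear" is shared with another concept and so carries less weight. A real system would also need tokenization that handles punctuation and hyphens, and a way to locate candidate phrases within running text, neither of which is sketched here.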
doi:10.1007/11801603_150 dblp:conf/pricai/ZhouZH06 fatcat:fdwayujrangrbeiaxtixsuymge