Essie: A Concept-based Search Engine for Structured Biomedical Text

N. C. Ide, R. F. Loane, D. Demner-Fushman
2007 JAMIA Journal of the American Medical Informatics Association  
A b s t r a c t This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie's design is motivated by an observation that query terms are often conceptually related to terms in a document, without actually occurring in the document text. Essie's performance was evaluated using data
more » ... nd standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept based query expansion is a useful approach for information retrieval in the biomedical domain. A rapidly increasing amount of biomedical information in electronic form is readily available to researchers, health care providers, and consumers. However, readily available does not mean conveniently accessible. The large volume of literature makes finding specific information ever more difficult. Development of effective search strategies is time consuming, 1 requires experienced and educated searchers, 2 well versed in biomedical terminology, 3 and is beyond the capability of most consumers. 4 Essie, a search engine developed and used at the National Library of Medicine, incorporates a number of strategies aimed at alleviating the need for sophisticated user queries. These strategies include a fine-grained tokenization algorithm that preserves punctuation, concept searching utilizing synonymy, and phrase searching based on the user's query. This article describes related background work, the Essie search system, and the evaluation of that system. The Essie search system is described in detail, including its indexing strategy, query interpretation and expansion, and ranking of search results. One of the first retrieval systems that implemented automatic concept-based indexing and extraction of the UMLS concepts from users' requests was SAPHIRE. 8 SAPHIRE utilized the UMLS Metathesaurus by breaking free text into words and mapping them into UMLS terms and concepts. Documents were indexed with concepts, queries were mapped to concepts, and standard term frequency and inverse document frequency weighting was applied. When measured with combined recall and precision,
doi:10.1197/jamia.m2233 pmid:17329729 pmcid:PMC2244877 fatcat:l3dg4ofxtnakbdhijktzci4p5i