Leveraging both Structured and Unstructured Data for Precision Information Retrieval
Text Retrieval Conference
This paper describes the participation of the Mayo Clinic NLP team in the Text REtreival Conference (TREC) 2017 Precision Medicine track. The novelty of our systems is four-fold. First, compared to our submissions in the previous year, our systems utilized an enhanced named entity recognition (NER) method to extract genes, variants, proteins, and diseases from PubMed articles. This NER method combined several state-of-the-art NER tools including TaggerOne, be-CAS, Reach and tmVAR. The extracted
... entities were indexed in different fields and treated as structured data for retrieval. Second, we used multi-field querying in a Pseudo Relevance Feedback (PRF) model. We first query the unstructured fields (i.e., the fields of title and abstract) and utilize information in structured fields from top-ranked documents as feedback for query expansion. Third, we explored the use of MeSH on Demand, a web service identifying MeSH terms in free-text and recommending similar PubMed articles which are relevant to the text, to boost the performance of our retrieval systems. The reason we used MeSH on Demand is due to its effectiveness for recommending relevant PubMed articles based on our manual judgments. Fourth, we utilized the demographic information (i.e., age and sex) as structured data to filter out the clinical trials that did not meet the criteria in each topic.