Analysis of biomedical and health queries: Lessons learned from TREC and CLEF evaluation benchmarks

Lynda Tamine, Cécile Chouquet, Thomas Palmer
2015 Journal of the Association for Information Science and Technology  
A large body of research has examined the characteristics of medical and health-related searches, from both the query side and the user-behaviour side. One of the core issues in medical information retrieval is the diversity of tasks, which leads to a diversity of information needs and query categories. From the evaluation perspective, a related challenge is the limited availability of appropriate test collections for experimentally validating medical task-oriented IR techniques and systems. Since 2003, standardized medical evaluation benchmarks such as TREC (Text REtrieval Conference) and CLEF have provided the information retrieval research community with specifications for various controlled medical tasks, together with document collections, queries, relevance assessments and task-specific metadata. The literature clearly reports a rapid increase in the use of these evaluation benchmarks. In this paper, we explore the peculiarities of TREC and CLEF medically oriented tasks and queries by analysing the differences and similarities between queries across tasks with respect to length, specificity and clarity features, and we then study the effect of these features on retrieval performance. More specifically, we carried out both an exploratory and a predictive data analysis using 11 TREC and CLEF medical test collections, comprising 374 queries with their corresponding document collections and expert relevance assessments, organized according to the medical tasks involved. Our results show that, even for expert-oriented queries, both the level of language specificity and search difficulty vary significantly across tasks, and that the predictive factors most strongly related to search difficulty are linked to query length and query clarity. Additional findings show that query clarity factors are task-dependent and that query term specificity, as measured using domain-specific terminology resources, is not significantly linked to term rareness in the document collection. The lessons learned from our study could serve as starting points for the design of future task-based medical IR frameworks.
doi:10.1002/asi.23351