A Full-Text Learning to Rank Dataset for Medical Information Retrieval [chapter]

Vera Boteva, Demian Gholipour, Artem Sokolov, Stefan Riezler
2016 Lecture Notes in Computer Science  
We present a dataset for learning to rank in the medical domain, consisting of thousands of full-text queries that are linked to thousands of research articles. The queries are taken from health topics described in layman's English on the non-commercial NutritionFacts.org website; relevance links are extracted at 3 levels from direct and indirect links of queries to research articles on PubMed. We demonstrate that ranking models trained on this dataset by far outperform standard bag-of-words retrieval models. The dataset can be downloaded from: www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/.
doi:10.1007/978-3-319-30671-1_58 fatcat:oftuozvprbdxzch4fvgvrf3oja