TREC 2016 Total Recall Track Overview

Maura R. Grossman, Gordon V. Cormack, Adam Roegiest
2016 Text Retrieval Conference  
The primary purpose of the Total Recall Track is to evaluate, through controlled simulation, methods designed to achieve very high recall (as close as practicable to 100%) with a human assessor in the loop. Motivating applications include, among others, electronic discovery in legal proceedings [3], systematic review in evidence-based medicine [6], and the creation of fully labeled test collections for information retrieval ("IR") evaluation [5]. A secondary, but no less important, purpose is to develop a sandboxed virtual test environment within which IR systems may be tested, while preventing the disclosure of sensitive test data to participants. At the same time, the test environment also operates as a "black box," affording participants confidence that their proprietary systems cannot easily be reverse engineered.

The task to be solved in the Total Recall Track is the following: Given a simple topic description, like those typically used for ad-hoc and Web search, identify the documents in a corpus, one at a time, such that, as nearly as possible, all relevant documents are identified before all non-relevant documents. Immediately after each document is identified, its ground-truth relevance or non-relevance is disclosed.

Datasets, topics, and automated relevance assessments were all provided by a Web server supplied by the Track. Participants were required to implement either a fully automated ("automatic") or semi-automated ("manual") process to download the datasets and topics, and to submit documents for assessment to the Web server, which rendered a relevance assessment for each submitted document in real time. Thus, participants were tasked with identifying documents for review, while the Web server simulated the role of a human-in-the-loop assessor operating in real time.
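To make the interaction concrete, the following is a minimal sketch of a participant client loop under this protocol. The server address, endpoint paths, and JSON fields (BASE, /judge, /shot, "relevant") are hypothetical illustrations, not the Track's actual API; the ranking strategy is left as a caller-supplied function.

```python
# Hypothetical sketch of a participant client for the Track's
# assessment server; endpoints and fields are illustrative only.
import requests

BASE = "http://localhost:8080"  # hypothetical sandbox server address

def run_topic(topic_id, select_next):
    """Submit documents one at a time until the strategy calls the shot.

    `select_next(judged)` is the participant's ranking strategy: given
    the dict of {doc_id: relevance} judgments received so far, it
    returns the next unjudged doc_id to submit, or None to stop.
    """
    judged = {}
    while True:
        doc_id = select_next(judged)
        if doc_id is None:
            # "Call the shot": declare that a reasonable result
            # has been achieved with proportionate effort.
            requests.post(f"{BASE}/topic/{topic_id}/shot")
            break
        # The simulated assessor discloses ground-truth relevance
        # immediately for each submitted document.
        resp = requests.post(f"{BASE}/topic/{topic_id}/judge",
                             json={"doc": doc_id})
        judged[doc_id] = resp.json()["relevant"]
    return judged
```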
Rank-based and set-based evaluation measures were calculated based on the order in which documents were presented to the Web server for assessment, as well as the set of documents that had been presented to the Web server at the time a participant "called their shot," or declared that a "reasonable" result had been achieved. Particular emphasis was placed on achieving high recall while reviewing the minimum possible number of documents.

The Total Recall Track debuted at TREC 2015 [7]. The TREC 2016 Track was operationally identical to the TREC 2015 Track, differing only in the following respects:

• This year, participants were required to "call their shot" to indicate when they believed that as many of the relevant documents as reasonably possible had been identified with proportionate effort;
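As an illustration of the rank-based measure described above (not the Track's official scoring code), the sketch below computes recall as a function of review effort from the submission order; the set-based measure at the shot call is simply the value of this curve at the effort where the shot was called. The function name and example data are assumed for illustration.

```python
# Sketch: recall vs. review effort, computed from the order in which
# documents were submitted to the assessment server.

def recall_at_effort(order, relevant):
    """Return a list of (effort, recall) points, one per submission."""
    found, curve = 0, []
    for effort, doc_id in enumerate(order, start=1):
        if doc_id in relevant:
            found += 1
        curve.append((effort, found / len(relevant)))
    return curve

# Example: with 2 relevant documents among 4 submissions,
# recall reaches 1.0 after 3 documents have been reviewed.
curve = recall_at_effort(["d3", "d1", "d7", "d2"], {"d3", "d7"})
assert curve[2] == (3, 1.0)
```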