Effective Crowd-Annotation of Participants, Interventions, and Outcomes in the Text of Clinical Trial Reports

Markus Zlabinger, Marta Sabou, Sebastian Hofstätter, Allan Hanbury
Findings of the Association for Computational Linguistics: EMNLP 2020
The search for Participants, Interventions, and Outcomes (PIO)[1] in clinical trial reports is a critical task in Evidence-Based Medicine. For automatic PIO extraction, high-quality corpora are needed. Obtaining such a corpus from crowdworkers, however, has been shown to be ineffective, since (i) workers usually lack the domain-specific expertise to conduct the task with sufficient quality, and (ii) the standard approach of annotating entire abstracts of trial reports as one task-instance (i.e., HIT)[2] leads to an uneven distribution of task effort. In this paper, we switch from entire-abstract to sentence annotation, referred to as the SENBASE approach. We build upon SENBASE in SENSUPPORT, where we compensate for crowdworkers' lack of domain-specific expertise by showing, for each task-instance, similar sentences that have already been annotated by experts. Such tailored task-instance examples are retrieved via an unsupervised semantic short-text similarity (SSTS) method, and we evaluate nine methods to find an effective solution for SENSUPPORT. We compute the Cohen's Kappa agreement between crowd annotations and gold-standard annotations and show that (i) both sentence-based approaches outperform a BASELINE approach in which entire abstracts are annotated; and (ii) supporting annotators with tailored task-instance examples is the best-performing approach, with Kappa agreements of 0.78/0.75/0.69 for P, I, and O, respectively.

[1] The I and C were unified as Intervention.
[2] Referred to as HIT on the Mechanical Turk platform.
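To make the SENSUPPORT retrieval step concrete, here is a minimal sketch of fetching expert-annotated example sentences for a task-instance via unsupervised semantic short-text similarity. The paper compares nine SSTS methods; the TF-IDF cosine similarity below is only one simple unsupervised stand-in, and the function and variable names are illustrative rather than taken from the authors' implementation.

```python
# Sketch: retrieve the k expert-annotated sentences most similar to a
# task-instance sentence, using TF-IDF cosine similarity as one possible
# unsupervised SSTS method. All names here are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_similar(task_sentence, expert_sentences, k=3):
    """Return the k expert-annotated sentences most similar to task_sentence."""
    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
    # Fit on the expert pool plus the query so both share one vocabulary.
    vectors = vectorizer.fit_transform(expert_sentences + [task_sentence])
    query_vec = vectors[-1]   # last row is the query sentence
    pool_vecs = vectors[:-1]  # remaining rows are the expert pool
    scores = cosine_similarity(query_vec, pool_vecs).ravel()
    top_idx = scores.argsort()[::-1][:k]
    return [(expert_sentences[i], float(scores[i])) for i in top_idx]

# Toy usage: the retrieved sentences would be shown to the crowdworker
# alongside the task-instance as tailored, expert-annotated examples.
examples = retrieve_similar(
    "Patients received 50 mg of drug X daily for eight weeks.",
    ["Participants were given 25 mg of drug Y twice per day.",
     "The primary outcome was change in blood pressure at week 12.",
     "Adults aged 18-65 with type 2 diabetes were enrolled."],
)
```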
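Likewise, a minimal sketch of the agreement evaluation, assuming crowd and gold annotations have been reduced to aligned per-token labels (1 = inside a P, I, or O span, 0 = outside); the aggregation of multiple workers into a single label sequence is omitted, and the toy labels are purely illustrative.

```python
# Sketch: Cohen's Kappa between aggregated crowd labels and the gold
# standard, computed per PIO class over token-level binary labels.
from sklearn.metrics import cohen_kappa_score

gold  = [0, 1, 1, 0, 0, 1, 0, 0]  # expert gold-standard token labels
crowd = [0, 1, 1, 0, 1, 1, 0, 0]  # aggregated crowd token labels

kappa = cohen_kappa_score(gold, crowd)
print(f"Cohen's Kappa: {kappa:.2f}")
```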
doi:10.18653/v1/2020.findings-emnlp.274