Easy, Reproducible and Quality-Controlled Data Collection with CROWDAQ

Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasović, Zhen Nie
2020. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics.
High-quality and large-scale data are key to the success of AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce CROWDAQ, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and saved pipelines in a re-usable format. We show that CROWDAQ simplifies data annotation significantly on a diverse set of data collection use cases, and we hope it will be a convenient tool for the community.
doi:10.18653/v1/2020.emnlp-demos.17
