A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021.
The file type is application/pdf.
Datasets: A Community Library for Natural Language Processing
2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
unpublished
The scale, variety, and quantity of publicly available NLP datasets have grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small datasets as for internet-scale corpora. The design of the library incorporates a distributed,
doi:10.18653/v1/2021.emnlp-demo.21
fatcat:xcrrfkmwkrezrmlnndz4hgfumi