A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Evaluating Models' Local Decision Boundaries via Contrast Sets
2020
Findings of the Association for Computational Linguistics: EMNLP 2020
Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture the abilities a dataset is intended to test. We propose a more rigorous annotation paradigm for NLP that helps to close systematic gaps in the test data. In particular, after a dataset is constructed, we recommend that the dataset authors manually perturb the test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets.
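The evaluation paradigm described in the abstract can be illustrated with a minimal sketch: given contrast sets grouped with their original instance, one can report both per-example accuracy and contrast consistency (the fraction of contrast sets answered entirely correctly). The function and data layout below are assumptions for illustration, not the authors' released code.

# Minimal sketch of contrast-set evaluation (illustrative only; names and
# data layout are assumptions, not the paper's released code).
from typing import Callable, List, Tuple

Example = Tuple[str, str]        # (input text, gold label)
ContrastSet = List[Example]      # an original test instance plus its perturbations

def evaluate_contrast_sets(
    predict: Callable[[str], str],      # hypothetical model prediction function
    contrast_sets: List[ContrastSet],
) -> Tuple[float, float]:
    """Return (per-example accuracy, contrast consistency).

    Contrast consistency: fraction of contrast sets on which the model
    labels every example correctly, probing its local decision boundary.
    """
    correct = total = consistent = 0
    for contrast_set in contrast_sets:
        results = [predict(text) == gold for text, gold in contrast_set]
        correct += sum(results)
        total += len(results)
        consistent += all(results)
    return correct / total, consistent / len(contrast_sets)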
doi:10.18653/v1/2020.findings-emnlp.117
fatcat:lnvj4ujjozh5pocryw7b233sne