16 Hits in 1.5 sec

Dynabench: Rethinking Benchmarking in NLP [article]

Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia, Zhiyi Ma, Tristan Thrush (+7 others)
2021 arXiv   pre-print
In this paper, we argue that Dynabench addresses a critical need in our community: contemporary models quickly achieve outstanding performance on benchmark tasks but nonetheless fail on simple challenge  ...  We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking.  ...  ZW has been supported in part by the Canada 150 Research Chair program and the UK-Canada Artificial Intelligence Initiative. YN  ... 
arXiv:2104.14337v1 fatcat:wbkzwzx35vezzmjryrtpcdnzj4

Dynabench: Rethinking Benchmarking in NLP

Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia, Zhiyi Ma, Tristan Thrush (+7 others)
2021 Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies   unpublished
In this paper, we argue that Dynabench addresses a critical need in our community: contemporary models quickly achieve outstanding performance on benchmark tasks but nonetheless fail on simple challenge  ...  We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking.  ...  ZW has been supported in part by the Canada 150 Research Chair program and the UK-Canada Artificial Intelligence Initiative. YN  ... 
doi:10.18653/v1/2021.naacl-main.324 fatcat:7tjlpr2yvfcm3gq653ur44hzyu
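The mechanism both Dynabench records describe is dynamic, human-and-model-in-the-loop data collection: an annotator writes an example, the current model predicts on it, and the example is retained for the new round only if it fools the model. A minimal sketch of that loop; `get_annotator_example` and `model_predict` are hypothetical stand-ins for the annotation interface and the hosted model endpoint, not Dynabench's actual API:

```python
# Minimal sketch of Dynabench-style human-and-model-in-the-loop collection.
# `get_annotator_example` and `model_predict` are hypothetical stand-ins for
# the annotation interface and the hosted model endpoint.
from typing import Callable, List, Tuple

def collect_round(
    get_annotator_example: Callable[[], Tuple[str, str]],  # -> (text, gold label)
    model_predict: Callable[[str], str],                   # -> predicted label
    target_size: int,
) -> List[Tuple[str, str]]:
    """Collect model-fooling examples for one dynamic benchmark round."""
    fooling_examples: List[Tuple[str, str]] = []
    while len(fooling_examples) < target_size:
        text, gold = get_annotator_example()
        if model_predict(text) != gold:
            # The model was fooled; keep the example for the new round.
            fooling_examples.append((text, gold))
    return fooling_examples
```

In the platform itself, fooling examples are additionally validated by other annotators before entering the dataset.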

Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks [article]

Tristan Thrush, Kushal Tirumala, Anmol Gupta, Max Bartolo, Pedro Rodriguez, Tariq Kane, William Gaviria Rojas, Peter Mattson, Adina Williams, Douwe Kiela
2022 arXiv   pre-print
Dynatask is integrated with Dynabench, a research platform for rethinking benchmarking in AI that facilitates human- and model-in-the-loop data collection and evaluation.  ...  We introduce Dynatask: an open-source system for setting up custom NLP tasks that aims to greatly lower the technical knowledge and effort required for hosting and evaluating state-of-the-art NLP models  ...  Introduction Data is the backbone of NLP research. One of the most fruitful approaches for making progress on NLP tasks has historically been benchmarking.  ... 
arXiv:2204.01906v1 fatcat:ako727taonhmhcf7mgkccqqxkq

Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks

Tristan Thrush, Kushal Tirumala, Anmol Gupta, Max Bartolo, Pedro Rodriguez, Tariq Kane, William Gaviria Rojas, Peter Mattson, Adina Williams, Douwe Kiela
2022 Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations   unpublished
Dynatask is integrated with Dynabench, a research platform for rethinking benchmarking in AI that facilitates human- and model-in-the-loop data collection and evaluation.  ...  We introduce Dynatask: an open-source system for setting up custom NLP tasks that aims to greatly lower the technical knowledge and effort required for hosting and evaluating state-of-the-art NLP models  ...  Introduction Data is the backbone of NLP research. One of the most fruitful approaches for making progress on NLP tasks has historically been benchmarking.  ... 
doi:10.18653/v1/2022.acl-demo.17 fatcat:fofb2ncnfbed3d5kxb6bnmumbq
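Dynatask's contribution, per the snippet, is letting a task owner declare a custom task instead of writing platform code. The configuration below is a hypothetical illustration of that kind of declaration; it is not Dynatask's actual schema:

```python
# Hypothetical task declaration illustrating what a Dynatask-style platform
# asks a task owner to specify; this is NOT Dynatask's actual schema.
task_config = {
    "name": "sentiment-demo",
    "input_fields": [{"name": "text", "type": "string"}],
    "output_fields": [
        {"name": "label", "type": "categorical",
         "classes": ["negative", "neutral", "positive"]},
    ],
    "metrics": ["accuracy", "f1_macro"],
    "model_in_the_loop": True,  # annotators see live model predictions
}
```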

Analyzing Dynamic Adversarial Training Data in the Limit [article]

Eric Wallace, Adina Williams, Robin Jia, Douwe Kiela
2021 arXiv   pre-print
Acknowledgments We thank Max Bartolo, Yixin Nie, Tristan Thrush, Pedro Rodriguez, and the other members of the Dynabench team for their valuable feedback on our crowdsourcing platform and paper.  ...  Dynabench: Rethinking benchmarking in NLP.  ...  What will it take to fix benchmarking in natural language understanding?  ... 
arXiv:2110.08514v1 fatcat:rclybepweneyjdbem2cvpmkvxi

Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation [article]

Max Bartolo, Tristan Thrush, Robin Jia, Sebastian Riedel, Pontus Stenetorp, Douwe Kiela
2021 arXiv   pre-print
While dynamic adversarial data collection, in which a human annotator tries to write examples that fool a model-in-the-loop, can improve model robustness, this process is expensive, which limits the scale  ...  In this work, we are the first to use synthetic adversarial data generation to make question answering models more robust to human adversaries.  ...  Acknowledgments The authors would like to thank the Dynabench team for their feedback and continuous support.  ... 
arXiv:2104.08678v2 fatcat:jimalduaaff2vgg2doi3mluqs4
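The pipeline this abstract summarizes can be sketched generically: a generator proposes question-answer pairs over passages, and pairs the target QA model answers incorrectly are kept as synthetic adversarial training data. `generate_qa_pairs` and `qa_model` below are hypothetical stand-ins for a trained generator and the model being hardened:

```python
# Sketch of synthetic adversarial data generation for QA. `generate_qa_pairs`
# (a trained question-answer generator) and `qa_model` (the model being
# hardened) are hypothetical stand-ins.
from typing import Callable, List, Tuple

def synthesize_adversarial_qa(
    passages: List[str],
    generate_qa_pairs: Callable[[str], List[Tuple[str, str]]],
    qa_model: Callable[[str, str], str],  # (passage, question) -> answer
) -> List[Tuple[str, str, str]]:
    """Keep generated (passage, question, answer) triples that fool the model."""
    adversarial = []
    for passage in passages:
        for question, gold_answer in generate_qa_pairs(passage):
            if qa_model(passage, question) != gold_answer:
                adversarial.append((passage, question, gold_answer))
    return adversarial
```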

How Does Counterfactually Augmented Data Impact Models for Social Computing Constructs? [article]

Indira Sen, Mattia Samory, Fabian Floeck, Claudia Wagner, Isabelle Augenstein
2021 arXiv   pre-print
As NLP models are increasingly deployed in socially situated settings such as online abusive content detection, it is crucial to ensure that these models are robust.  ...  We investigate the benefits of CAD for social NLP models by focusing on three social computing constructs -- sentiment, sexism, and hate speech.  ...  Dynabench: Rethinking benchmarking in NLP.  ... 
arXiv:2109.07022v1 fatcat:5aqcgc2cobethmhmreub2ubmqy
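Counterfactually augmented data (CAD) pairs each example with a minimally edited variant whose label flips, forcing models to rely on the causal feature rather than spurious correlates. A toy sentiment sketch; the substitution table is a naive hypothetical stand-in for the human edits used in real CAD:

```python
# Toy sketch of counterfactual augmentation for sentiment: pair each example
# with a minimal edit that flips the label. Real CAD uses human editors; the
# substitution table below is a naive hypothetical illustration.
from typing import Tuple

FLIP = {"great": "terrible", "love": "hate"}

def counterfactual(text: str, label: str) -> Tuple[str, str]:
    edited = text
    for word, antonym in FLIP.items():
        edited = edited.replace(word, antonym)
    flipped = "negative" if label == "positive" else "positive"
    return edited, flipped

print(counterfactual("I love this movie, the acting is great", "positive"))
# -> ('I hate this movie, the acting is terrible', 'negative')
```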

Identifying Adversarial Attacks on Text Classifiers [article]

Zhouhang Xie, Jonathan Brophy, Adam Noack, Wencong You, Kalyani Asthana, Carter Perkins, Sabrina Reis, Sameer Singh, Daniel Lowd
2022 arXiv   pre-print
As our second contribution, we use this dataset to develop and benchmark a number of classifiers for attack identification -- determining if a given text has been adversarially manipulated and by which  ...  In response, there is a growing body of work on robust learning, which reduces vulnerability to these attacks, though sometimes at a high cost in compute time or accuracy.  ...  Mohit Bansal, Christopher Potts, and Adina Williams. 2021. Dynabench: Rethinking benchmarking in NLP.  ... 
arXiv:2201.08555v1 fatcat:bknr7chhaza2bhnrwveufhot2m
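Attack identification, as the abstract frames it, is itself a supervised text-classification problem: given a text, predict which attack (if any) produced it. A minimal scikit-learn sketch over a tiny hypothetical corpus; character n-grams are one plausible feature choice for spotting character-level perturbations:

```python
# Sketch: attack identification as supervised text classification. The tiny
# corpus is hypothetical; in the paper, texts come from real attack methods.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "the film was w0nderful and m0ving",     # character-level perturbation
    "the movie was fantastic and touching",  # clean text
    "the movie was marvelous and poignant",  # synonym-substitution attack
]
labels = ["char_attack", "clean", "synonym_attack"]

# Character n-grams help separate character-level edits from word-level ones.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["the plot was w1ld and gr1pping"]))
```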

Contextualization and Generalization in Entity and Relation Extraction [article]

Bruno Taillé
2022 arXiv   pre-print
More recently, in 2018, the transfer of entire pretrained Language Models and the preservation of their contextualization capacities enabled unprecedented performance on virtually every NLP benchmark  ...  During the past decade, neural networks have become prominent in Natural Language Processing (NLP), notably for their capacity to learn relevant word representations from large unlabeled corpora.  ...  This implies an important limitation in every NLP study that uses word representations obtained by a Language Model trained on a corpus more recent than the NLP benchmark at hand.  ... 
arXiv:2206.07558v1 fatcat:tv6lylh4gjhohdldlgg24zwpvm

Benchmarking: Past, Present and Future

Kenneth Church, Mark Liberman, Valia Kordoni
2021 Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future   unpublished
These days, benchmarks evolve more bottom-up (such as Papers with Code)  ...  I will talk about our work in trying to rethink the way we do benchmarking in AI, specifically in natural language processing, focusing mostly on the Dynabench platform.  ...  Rethinking Benchmarking in AI. Douwe Kiela, Facebook AI Research (https://douwekiela.github.io/, @douwekiela on Twitter). The current benchmarking paradigm in AI has many issues: benchmarks saturate quickly  ... 
doi:10.18653/v1/2021.bppf-1.1 fatcat:ipnmbjgvqndjlhawhiximarfvy

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation [article]

Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Srivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein (+113 others)
2021 arXiv   pre-print
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.  ...  In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters  ...  Dynabench: Rethinking benchmarking in NLP.  ... 
arXiv:2112.02721v1 fatcat:uqizuxc4wzgxnnfsc6azh6ckpq
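NL-Augmenter's central split is between transformations (which rewrite examples) and filters (which select task-relevant subpopulations). The self-contained sketch below mirrors that split; the class and method names echo the framework's concepts but are not its actual base classes:

```python
# Self-contained sketch of NL-Augmenter's transformation/filter split. The
# class and method names mirror the framework's concepts, not its actual
# base classes.
import random
from typing import List

class ButterFingersTransformation:
    """Transformation: rewrites a sentence by simulating keyboard typos."""
    NEIGHBORS = {"a": "qs", "e": "wr", "o": "ip", "t": "ry"}

    def generate(self, sentence: str, prob: float = 0.1) -> List[str]:
        out = []
        for ch in sentence:
            if ch in self.NEIGHBORS and random.random() < prob:
                ch = random.choice(self.NEIGHBORS[ch])
            out.append(ch)
        return ["".join(out)]

class LengthFilter:
    """Filter: keeps only sentences within a word-count band."""
    def filter(self, sentence: str, low: int = 3, high: int = 30) -> bool:
        return low <= len(sentence.split()) <= high

print(ButterFingersTransformation().generate("data augmentation matters"))
print(LengthFilter().filter("too short"))  # False: only two words
```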

HateCheck: Functional Tests for Hate Speech Detection Models [article]

Paul Röttger, Bertram Vidgen, Dong Nguyen, Zeerak Waseem, Helen Margetts, Janet B. Pierrehumbert
2021 arXiv   pre-print
It also risks overestimating generalisable model performance due to increasingly well-evidenced systematic gaps and biases in hate speech datasets.  ...  Dynabench: Rethinking benchmarking in NLP. arXiv preprint arXiv:2104.14337. Jana Kurrek, Haji Mohammad Saleem, and Derek Ruths. 2020.  ...  This has motivated much research in NLP and the social sciences.  ... 
arXiv:2012.15606v2 fatcat:uq4e5gl6djga7iuszekimn5s64
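HateCheck's functional tests group inputs with known gold labels by the capability they probe (e.g., negated hate, non-hateful profanity) and report per-functionality accuracy, surfacing the systematic gaps the abstract mentions. A minimal harness sketch; the test cases are illustrative placeholders, not HateCheck's actual cases, and `model` is any text classifier:

```python
# Minimal functional-test harness in the spirit of HateCheck. The cases are
# illustrative placeholders, not HateCheck's actual test cases, and `model`
# is any text classifier returning a label.
from typing import Callable, Dict, List, Tuple

FUNCTIONAL_TESTS: Dict[str, List[Tuple[str, str]]] = {
    "negated_hate": [("I don't hate [GROUP] at all", "non-hateful")],
    "non_hateful_profanity": [("this damn traffic is awful", "non-hateful")],
    "direct_hate": [("I hate [GROUP]", "hateful")],
}

def run_functional_tests(model: Callable[[str], str]) -> Dict[str, float]:
    """Per-functionality accuracy, exposing failure modes that a single
    aggregate test-set score would hide."""
    return {
        functionality: sum(model(text) == gold for text, gold in cases) / len(cases)
        for functionality, cases in FUNCTIONAL_TESTS.items()
    }
```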

Evaluation Paradigms in Question Answering

Pedro Rodriguez, Jordan Boyd-Graber
2021 Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing   unpublished
Dynabench: Rethinking benchmarking in NLP. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics.  ...  Sewon Min, Julian Michael, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2020.  ...  How should that change NLP leaderboards?  ... 
doi:10.18653/v1/2021.emnlp-main.758 fatcat:fhrn6zr7wzcfxafvheghy7fssi

Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus

Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner
2021 Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing   unpublished
copying in generative models. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics.  ...  2021. Dynabench: Rethinking benchmarking in NLP.  ...  However,  ...  examples from other benchmark NLP datasets.  ... 
doi:10.18653/v1/2021.emnlp-main.98 fatcat:okmbgm5f3nhbrajymb5x6uqn2e
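One analysis this kind of corpus documentation involves is checking whether benchmark examples leak into the web crawl. Exact n-gram overlap is a common approximation; the sketch below assumes a 13-gram window and whitespace tokenization, which are conventional choices rather than the paper's exact procedure:

```python
# Sketch of a benchmark-contamination check via exact n-gram overlap. The
# 13-gram window and whitespace tokenization are conventional choices, not
# necessarily the paper's exact procedure.
from typing import Iterable, Set

def ngrams(text: str, n: int = 13) -> Set[str]:
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def count_contaminated(benchmark_examples: Iterable[str],
                       corpus_docs: Iterable[str], n: int = 13) -> int:
    """Count benchmark examples sharing at least one n-gram with the corpus."""
    corpus_grams: Set[str] = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return sum(bool(ngrams(ex, n) & corpus_grams) for ex in benchmark_examples)
```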

ANLIzing the Adversarial Natural Language Inference Dataset

Adina Williams, Tristan Thrush, Douwe Kiela
2022
Both insights can guide us in training stronger models going forward.  ...  We perform an in-depth error analysis of the Adversarial NLI (ANLI) dataset, a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset collected dynamically over multiple  ...  Dynabench: Rethinking benchmarking in NLP.  ... 
doi:10.7275/gatd-1283 fatcat:orzilag6h5f7rolxhjqtybymzm
Showing results 1–15 out of 16