4 Hits in 6.4 sec

"You are grounded!": Latent Name Artifacts in Pre-trained Language Models [article]

Vered Shwartz, Rachel Rudinger, Oyvind Tafjord
2020 arXiv   pre-print
Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. ... As a silver lining, our experiments suggest that additional pre-training on different corpora may mitigate this bias. ... In this work we focus on the representations of given names in pre-trained LMs (Table 1). ...
arXiv:2004.03012v2 fatcat:fw4gsqbffvbbxenarrdy6xtkyy
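
The snippet above centers on how pre-trained LMs represent given names. As a rough illustration of how such name-sensitivity probes are commonly set up (a minimal sketch, not the paper's actual protocol; the template and name list below are invented for the example), one can swap only the given name inside a fixed masked template and compare the model's top completions:

```python
# Minimal name-substitution probe for a masked LM (illustrative only).
# The template and probe names are hypothetical placeholders.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

TEMPLATE = "{name} is going to be a [MASK]."
NAMES = ["Emily", "Aisha", "DeShawn", "Connor"]  # hypothetical probe names

for name in NAMES:
    preds = fill_mask(TEMPLATE.format(name=name), top_k=5)
    print(name, [(p["token_str"], round(p["score"], 3)) for p in preds])
```

Systematic divergence in the completions across names is the kind of latent name artifact the paper studies; per the snippet's "silver lining", additional pre-training on a different corpus is suggested as one way to shift such behavior.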

"You are grounded!": Latent Name Artifacts in Pre-trained Language Models

Vered Shwartz, Rachel Rudinger, Oyvind Tafjord
2020 Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)   unpublished
Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. ... As a silver lining, our experiments suggest that additional pre-training on different corpora may mitigate this bias. ... In this work we focus on the representations of given names in pre-trained LMs (Table 1). ...
doi:10.18653/v1/2020.emnlp-main.556 fatcat:sxzrrxpsjnax5frgrfisoeytrm

What do Bias Measures Measure? [article]

Sunipa Dev, Emily Sheng, Jieyu Zhao, Jiao Sun, Yu Hou, Mattie Sanseverino, Jiin Kim, Nanyun Peng, Kai-Wei Chang
2021 arXiv   pre-print
Natural Language Processing (NLP) models propagate social biases about protected attributes such as gender, race, and nationality. ... To address this gap, this work presents a comprehensive survey of existing bias measures in NLP as a function of the associated NLP tasks, metrics, datasets, and social biases and corresponding harms. ... "You are grounded!": Latent name artifacts in pre-trained language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6850-6861. ...
arXiv:2108.03362v1 fatcat:migucaqmgne3hiyirl755o4dpu
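
The survey snippet enumerates bias measures as a function of tasks, metrics, and datasets. One widely surveyed family of measures compares embedding associations between target words and attribute word sets (WEAT-style). Below is a minimal sketch of such an association score; the vectors are random placeholders rather than real embeddings, and the word lists are invented for illustration:

```python
# WEAT-style association score over word embeddings (sketch).
# Embeddings are random placeholders standing in for a real model's vectors.
import numpy as np

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in
       ["engineer", "nurse", "he", "him", "she", "her"]}

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def association(word, attr_a, attr_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    sim_a = np.mean([cos(emb[word], emb[a]) for a in attr_a])
    sim_b = np.mean([cos(emb[word], emb[b]) for b in attr_b])
    return sim_a - sim_b

# Positive: target leans toward the first attribute set; negative: the second.
for target in ["engineer", "nurse"]:
    print(target, round(association(target, ["he", "him"], ["she", "her"]), 3))
```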

Exposing Shallow Heuristics of Relation Extraction Models with Challenge Data

Shachar Rosenman, Alon Jacovi, Yoav Goldberg
2020 Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)   unpublished
We identify failure modes of SOTA relation extraction (RE) models trained on TACRED, which we attribute to limitations in the data annotation process. ... The process of collecting and annotating training data may introduce distribution artifacts which may limit the ability of models to learn correct generalization behavior. ... Vered Shwartz, Rachel Rudinger, and Oyvind Tafjord. 2020. "You are grounded!": Latent name artifacts in pre-trained language models. arXiv preprint arXiv:2004.03012. ...
doi:10.18653/v1/2020.emnlp-main.302 fatcat:d4b6eedmzjd4zizhd7qwizbjue
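
The snippet attributes RE failure modes to distribution artifacts in the annotation process. A common way to expose such shallow heuristics is to build challenge instances that replace surface cues (for example, entity names) while holding the gold relation fixed; a prediction that flips under such a label-preserving swap points to a name heuristic rather than a learned relation. The sketch below assumes a simplified TACRED-like example format; the field names and substitute entities are illustrative placeholders, not the paper's dataset:

```python
# Sketch: derive challenge instances for relation extraction by swapping the
# subject entity's surface form while keeping the gold relation unchanged.
from copy import deepcopy

example = {
    "tokens": ["John", "Smith", "was", "hired", "by", "Acme", "Corp", "."],
    "subj_span": (0, 2),   # "John Smith"
    "obj_span": (5, 7),    # "Acme Corp"
    "relation": "per:employee_of",
}

def swap_subject(ex, new_subject_tokens):
    """Return a copy of the example with the subject span replaced."""
    out = deepcopy(ex)
    start, end = ex["subj_span"]
    out["tokens"] = ex["tokens"][:start] + new_subject_tokens + ex["tokens"][end:]
    out["subj_span"] = (start, start + len(new_subject_tokens))
    shift = len(new_subject_tokens) - (end - start)
    o_start, o_end = ex["obj_span"]
    if o_start >= end:  # object follows the subject: shift its span accordingly
        out["obj_span"] = (o_start + shift, o_end + shift)
    return out

for name in [["Maria", "Garcia"], ["Wei", "Zhang", "Li"]]:
    challenge = swap_subject(example, name)
    print(" ".join(challenge["tokens"]), "->", challenge["relation"])
```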