A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
"You are grounded!": Latent Name Artifacts in Pre-trained Language Models
[article]
2020
arXiv
pre-print
Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. ...
As a silver lining, our experiments suggest that additional pre-training on different corpora may mitigate this bias. ...
In this work we focus on the representations of given names in pre-trained LMs (Table 1). ...
arXiv:2004.03012v2
fatcat:fw4gsqbffvbbxenarrdy6xtkyy
"You are grounded!": Latent Name Artifacts in Pre-trained Language Models
2020
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
unpublished
Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. ...
As a silver lining, our experiments suggest that additional pre-training on different corpora may mitigate this bias. ...
In this work we focus on the representations of given names in pre-trained LMs (Table 1). ...
doi:10.18653/v1/2020.emnlp-main.556
fatcat:sxzrrxpsjnax5frgrfisoeytrm
What do Bias Measures Measure?
[article]
2021
arXiv
pre-print
Natural Language Processing (NLP) models propagate social biases about protected attributes such as gender, race, and nationality. ...
To address this gap, this work presents a comprehensive survey of existing bias measures in NLP as a function of the associated NLP tasks, metrics, datasets, and social biases and corresponding harms. ...
"you are grounded!": Latent name artifacts in pre-trained language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6850-6861. ...
arXiv:2108.03362v1
fatcat:migucaqmgne3hiyirl755o4dpu
Exposing Shallow Heuristics of Relation Extraction Models with Challenge Data
2020
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
unpublished
We identify failure modes of SOTA relation extraction (RE) models trained on TACRED, which we attribute to limitations in the data annotation process. ...
The process of collecting and annotating training data may introduce distribution artifacts which may limit the ability of models to learn correct generalization behavior. ...
Vered Shwartz, Rachel Rudinger, and Oyvind Tafjord. 2020. "You are grounded!": Latent name artifacts in pre-trained language models. arXiv preprint arXiv:2004.03012. ...
doi:10.18653/v1/2020.emnlp-main.302
fatcat:d4b6eedmzjd4zizhd7qwizbjue