A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020.
The file type is application/pdf.
Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions
2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Modern deep learning models for NLP are notoriously opaque. This has motivated the development of methods for interpreting such models, e.g., via gradient-based saliency maps or the visualization of attention weights. Such approaches aim to provide explanations for a particular model prediction by highlighting important words in the corresponding input text. While this might be useful for tasks where decisions are explicitly influenced by individual tokens in the input, we suspect that such …
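The abstract describes token-level saliency methods, which the title pairs with influence functions. As background only (this is not text or code from the paper): in the standard formulation of Koh and Liang (2017), an influence function scores how much upweighting a training example z would change the loss on a test example z_test,

\mathcal{I}_{\mathrm{up,loss}}(z, z_{\mathrm{test}})
  = -\nabla_{\theta} L(z_{\mathrm{test}}, \hat{\theta})^{\top}
     H_{\hat{\theta}}^{-1}
     \nabla_{\theta} L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta}).

A gradient-based saliency map, by contrast, attributes a single prediction to the tokens of its input. The snippet below is a minimal gradient-times-input sketch of that idea, not an implementation from the paper; the checkpoint name and the example sentence are illustrative assumptions.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed checkpoint for illustration; any sequence-classification model works.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "The plot was thin, but the acting saved the film."  # illustrative input
enc = tokenizer(text, return_tensors="pt")

# Embed the tokens ourselves so gradients can be taken w.r.t. the embeddings.
embeds = model.get_input_embeddings()(enc["input_ids"]).detach()
embeds.requires_grad_(True)

logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
pred = logits.argmax(dim=-1).item()

# Gradient of the predicted-class logit w.r.t. the input embeddings.
logits[0, pred].backward()

# Gradient x input, summed over the embedding dimension, gives one score per token.
saliency = (embeds.grad * embeds).sum(dim=-1).squeeze(0)
for tok, score in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), saliency.tolist()):
    print(f"{tok:>12} {score:+.4f}")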
doi:10.18653/v1/2020.acl-main.492
fatcat:mjrpm4bfara7vnm7hueuv7j4qy