A causal framework for explaining the predictions of black-box sequence-to-sequence models

David Alvarez-Melis, Tommi Jaakkola
2017 Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing  
We interpret the predictions of any black-box structured input-structured output model around a specific input-output pair. Our method returns an "explanation" consisting of groups of input-output tokens that are causally related. These dependencies are inferred by querying the black-box model with perturbed inputs, generating a graph over tokens from the responses, and solving a partitioning problem to select the most relevant components. We focus the general approach on sequence-to-sequence problems, adopting a variational autoencoder to yield meaningful input perturbations. We test our method across several NLP sequence generation tasks.
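
The perturb, query, graph, and select pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a hypothetical word-by-word `toy_model` as the black box, swaps the variational autoencoder for single-token substitution from a hand-picked list, and swaps the partitioning step for a simple threshold followed by connected components. All function names and parameters here are illustrative.

```python
from collections import defaultdict


def toy_model(tokens):
    # Hypothetical stand-in for the black box: a word-by-word lookup.
    lexicon = {"the": "le", "cat": "chat", "sleeps": "dort"}
    return [lexicon.get(t, "<unk>") for t in tokens]


def dependency_weights(model, tokens, substitutes):
    """Estimate how strongly each output position depends on each input position.

    Assumption: perturbations are single-token substitutions, not VAE samples.
    weights[(i, j)] is the fraction of substitutions at input i that change
    output position j relative to the unperturbed output.
    """
    reference = model(tokens)
    weights = {}
    for i in range(len(tokens)):
        subs = [s for s in substitutes if s != tokens[i]]
        changed = [0] * len(reference)
        for sub in subs:
            output = model(tokens[:i] + [sub] + tokens[i + 1:])
            for j, ref_tok in enumerate(reference):
                if j >= len(output) or output[j] != ref_tok:
                    changed[j] += 1
        for j, count in enumerate(changed):
            weights[(i, j)] = count / len(subs) if subs else 0.0
    return weights


def group_tokens(weights, threshold=0.5):
    """Group input and output positions joined by strong edges.

    Assumption: thresholding plus connected components replaces the
    partitioning problem mentioned in the abstract.
    """
    adjacency = defaultdict(set)
    for (i, j), w in weights.items():
        if w >= threshold:
            adjacency[("in", i)].add(("out", j))
            adjacency[("out", j)].add(("in", i))
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


tokens = ["the", "cat", "sleeps"]
weights = dependency_weights(toy_model, tokens, ["<mask>", "a", "dog"])
groups = group_tokens(weights)
```

Because the toy model translates each word independently, each group pairs one input position with the output position it produces. A real sequence-to-sequence model would give graded weights and larger, overlapping groups, which is where a proper partitioning step becomes necessary.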
doi:10.18653/v1/d17-1042 dblp:conf/emnlp/Alvarez-MelisJ17 fatcat:j3yi5abponhpzn6shjnmdknf5a