[Re] Reproducibility study - Does enforcing diversity in hidden states of LSTM-Attention models improve transparency?

Pieter Bouwman, Yun Li, Rogier Van Der Weerd, Frank Verhoef
2021 Zenodo  
It has been shown [1] that the weights in attention mechanisms do not necessarily offer a faithful explanation of a model's predictions. In the paper Towards Transparent and Explainable Attention Models, Mohankumar et al. [2] propose two methods to enhance the faithfulness and plausibility of the explanations provided by an LSTM model combined with a basic attention mechanism.

Scope of Reproducibility - For this reproducibility study, we focus on the main claims made in the paper:
• The attention weights in standard LSTM attention models do not provide faithful and plausible explanations for their predictions, potentially because the conicity of the LSTM hidden vectors is high.
• Two methods can be applied to reduce conicity: Orthogonalization and Diversity Driven Training. When these methods are applied, the resulting attention weights offer more faithful and plausible explanations of the model's predictions, without sacrificing model performance.

Methodology - The paper includes a link to a repository with the code used to generate its results. We follow four investigative routes: (i) Replication: we rerun experiments on datasets from the paper in order to replicate the results, and add the results that are missing in the paper; (ii) Code review: we scrutinize the code to validate its correctness; (iii) Evaluation methodology: we extend the set of evaluation metrics used in the paper with the LIME method, in an attempt to resolve inconclusive results; (iv) Generalization to other architectures: we test whether the authors' claims apply to variations of the base model (more complex forms of attention and a BiLSTM encoder).

Results - We confirm that the Orthogonal and Diversity LSTM achieve accuracies similar to those of the Vanilla LSTM, while lowering conicity. However, we cannot reproduce the results of several of the experiments in the paper that underlie its claim of better transparency. In addition, a close inspection of the code base reveals some potentially problematic inconsistencies. Despite this, under certain conditions, we do confirm that the attention weights of the Orthogonal and Diversity LSTM offer more faithful and plausible explanations than those of the Vanilla LSTM.
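Conicity is the quantity on which the claims above hinge. Mohankumar et al. [2] define it as the mean "alignment to mean" of a set of vectors, i.e. the average cosine similarity between each vector and the mean of the set; high conicity means the hidden states lie in a narrow cone. The following is a minimal NumPy sketch of that definition, written for illustration here rather than taken from the authors' code:

    import numpy as np

    def conicity(vectors):
        # vectors: (m, d) array, e.g. the LSTM hidden states for one input.
        # Conicity = mean cosine similarity between each vector and the
        # mean vector of the set (the "alignment to mean" of [2]).
        mean_vec = vectors.mean(axis=0)
        dots = vectors @ mean_vec
        norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(mean_vec)
        return float((dots / norms).mean())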
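The two mitigation methods can be sketched in a similar way. As described in [2], Orthogonalization subtracts from each new hidden state its projection onto the mean of the preceding hidden states, while Diversity Driven Training adds the conicity of the hidden states to the task loss as a regularizer. The snippet below is a schematic illustration of those descriptions, not the authors' PyTorch implementation; lambda_div is a hypothetical weighting hyperparameter:

    def orthogonalized(h_t, prev_states):
        # Orthogonalization: remove the component of h_t along the mean of
        # the previous hidden states, pushing successive states apart.
        mean_prev = prev_states.mean(axis=0)
        proj = (h_t @ mean_prev) / (mean_prev @ mean_prev) * mean_prev
        return h_t - proj

    def diversity_loss(task_loss, hidden_states, lambda_div=0.1):
        # Diversity Driven Training: penalize high conicity of the hidden
        # states during training; reuses conicity() from the sketch above.
        return task_loss + lambda_div * conicity(hidden_states)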
doi:10.5281/zenodo.4835592