46 hits

Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods [article]

Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, Himabindu Lakkaraju
2020 arXiv   pre-print
In this paper, we demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable.  ...  Using extensive evaluation with multiple real-world datasets (including COMPAS), we demonstrate how extremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques  ...  ACKNOWLEDGEMENTS We would like to thank the anonymous reviewers for their feedback, and Scott Lundberg for insightful discussions.  ... 
arXiv:1911.02508v2 fatcat:ybh7s6qyvjhuje6zzwftci2dpu
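The attack summarized above exploits the fact that LIME and SHAP query the model on perturbed, often off-manifold inputs. A minimal sketch of the idea, assuming a wrapper that sends inputs judged "real" to a biased model and everything else to an innocuous one. The nearest-neighbour distance detector, the threshold, and all function names are hypothetical stand-ins chosen for illustration, not the authors' implementation.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], int]


def make_distance_detector(train: Sequence[Vector], threshold: float) -> Callable[[Vector], bool]:
    """Hypothetical out-of-distribution detector: a point counts as 'real'
    if it lies within `threshold` (Euclidean) of some training point."""
    def is_real(x: Vector) -> bool:
        return min(math.dist(x, t) for t in train) <= threshold
    return is_real


def make_scaffolded_classifier(biased: Classifier, innocuous: Classifier,
                               is_real: Callable[[Vector], bool]) -> Classifier:
    """Use the biased model on in-distribution inputs and the innocuous model
    on off-manifold inputs, such as perturbation-based explainer queries."""
    def classify(x: Vector) -> int:
        return biased(x) if is_real(x) else innocuous(x)
    return classify


# Toy usage: feature 0 plays the role of a sensitive attribute.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
biased = lambda x: int(x[0] > 0.5)     # decides on the sensitive feature
innocuous = lambda x: int(x[1] > 0.5)  # decides on an unrelated feature
clf = make_scaffolded_classifier(biased, innocuous, make_distance_detector(train, 0.1))

print(clf((1.0, 0.0)))  # real point, biased model answers: 1
print(clf((0.9, 0.2)))  # perturbed point, innocuous model answers: 0
```

An explainer that mostly sees perturbed points would attribute the decision to feature 1, even though every real input is classified by feature 0.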

On the Tractability of SHAP Explanations [article]

Guy Van den Broeck, Anton Lykov, Maximilian Schleich, Dan Suciu
2021 arXiv   pre-print
Despite a lot of recent interest from both academia and industry, it is not known whether SHAP explanations of common machine learning models can be computed efficiently.  ...  First, we consider fully-factorized data distributions, and show that the complexity of computing the SHAP explanation is the same as the complexity of computing the expected value of the model.  ...  The authors would like to thank YooJung Choi for valuable discussions on the proof of Theorem 5.  ... 
arXiv:2009.08634v2 fatcat:qrqyevl2dzhhhgvwj4uobvvh4y
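To make the quantity in this abstract concrete, here is a brute-force sketch of SHAP values under a fully-factorized (independent-feature) distribution, where each coalition value is an expected model output. It only illustrates the definition and is exponential in the number of features; it is not the paper's efficient algorithm. The small discrete marginals and function names are assumptions for illustration.

```python
from itertools import combinations, product
from math import factorial
from typing import Callable, List, Sequence, Set, Tuple

# Each marginal is a list of (value, probability) pairs; features are independent.
Marginal = Sequence[Tuple[float, float]]
Model = Callable[[List[float]], float]


def expected_value(f: Model, x: Sequence[float], fixed: Set[int],
                   marginals: Sequence[Marginal]) -> float:
    """E[f(Z)] where Z_j = x_j for j in `fixed` and Z_j ~ marginals[j] otherwise."""
    free = [j for j in range(len(x)) if j not in fixed]
    total = 0.0
    # Enumerate every joint assignment of the free features (fully factorized).
    for choice in product(*(marginals[j] for j in free)):
        z = list(x)
        prob = 1.0
        for j, (value, p) in zip(free, choice):
            z[j] = value
            prob *= p
        total += prob * f(z)
    return total


def shap_values(f: Model, x: Sequence[float],
                marginals: Sequence[Marginal]) -> List[float]:
    """Brute-force Shapley values with v(S) = E[f(X) | X_S = x_S]."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition size |S| = 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                gain = (expected_value(f, x, s | {i}, marginals)
                        - expected_value(f, x, s, marginals))
                phi[i] += weight * gain
    return phi


# Usage: for a linear model with independent features, phi_i = w_i * (x_i - E[X_i]).
model = lambda z: 2 * z[0] + 3 * z[1] - z[2]
marginals = [
    [(0.0, 0.5), (1.0, 0.5)],    # E[X_0] = 0.5
    [(0.0, 0.25), (2.0, 0.75)],  # E[X_1] = 1.5
    [(1.0, 1.0)],                # E[X_2] = 1.0
]
phi = shap_values(model, [1.0, 0.0, 1.0], marginals)
print(phi)  # approximately [1.0, -4.5, 0.0]
```

Every call to `expected_value` is an expectation of the model, which is the sub-problem whose complexity the abstract relates SHAP computation to.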

Feature Attributions and Counterfactual Explanations Can Be Manipulated [article]

Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju
2021 arXiv   pre-print
We demonstrate how adversaries can design biased models that manipulate model-agnostic feature attribution methods (e.g., LIME & SHAP) and counterfactual explanations that hill-climb during the counterfactual  ...  We evaluate the manipulations on real-world datasets, including COMPAS and Communities & Crime, and find explanations can be manipulated in practice.  ...  fooling LIME or SHAP into generating innocuous explanations.  ... 
arXiv:2106.12563v2 fatcat:6eidicjv2vaxdb6f6vftjscp64

Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations [article]

Jessica Dai, Sohini Upadhyay, Ulrich Aivodji, Stephen H. Bach, Himabindu Lakkaraju
2022 arXiv   pre-print
In addition, we also observe that certain post hoc explanation methods (e.g., Integrated Gradients, SHAP) are more likely to exhibit the aforementioned disparities.  ...  To this end, we first outline the key properties which constitute explanation quality and where disparities can be particularly problematic.  ...  [30] and Slack et al. [66] demonstrated that methods such as LIME and SHAP may result in explanations that are not only inconsistent and unstable, but also prone to adversarial attacks.  ... 
arXiv:2205.07277v1 fatcat:3logqufk2fdqxnj37jcaaviyv4

Opportunities and Challenges in Explainable Artificial Intelligence (XAI): A Survey [article]

Arun Das, Paul Rad
2020 arXiv   pre-print
We start by proposing a taxonomy and categorizing the XAI techniques based on their scope of explanations, methodology behind the algorithms, and explanation level or usage which helps build trustworthy  ...  After explaining each category of algorithms and approaches in detail, we then evaluate the explanation maps generated by eight XAI algorithms on image data, discuss the limitations of this approach, and  ...  Shapley sampling methods [64] are also post-hoc and model agnostic.  ... 
arXiv:2006.11371v2 fatcat:6eaz3rbaenflxchjdynmvwlc4i

Towards Explainable Evaluation Metrics for Natural Language Generation [article]

Christoph Leiter and Piyawat Lertvittayakumjorn and Marina Fomicheva and Wei Zhao and Yang Gao and Steffen Eger
2022 arXiv   pre-print
We hope that our work can help catalyze and guide future research on explainable evaluation metrics and, mediately, also contribute to better and more transparent text generation systems.  ...  We also provide a synthesizing overview of recent approaches for explainable machine translation metrics and discuss how they relate to those goals and properties.  ...  Methods for extracting explanations in this case are called post-hoc explanation methods.  ... 
arXiv:2203.11131v1 fatcat:lcfy3vs445btdd4am3sakroek4

Explainable Artificial Intelligence Approaches: A Survey [article]

Sheikh Rabiul Islam, William Eberle, Sheikh Khaled Ghafoor, Mohiuddin Ahmed
2021 arXiv   pre-print
Practitioners can use this work as a catalog to understand, compare, and correlate competitive advantages of popular XAI methods.  ...  While many popular Explainable Artificial Intelligence (XAI) methods or approaches are available to facilitate a human-friendly explanation of the decision, each has its own merits and demerits, with a  ...  ACKNOWLEDGMENTS Our sincere thanks to Christoph Molnar for his open Ebook on Interpretable Machine Learning and contribution to the open-source R package "iml".  ... 
arXiv:2101.09429v1 fatcat:emnotqoj3zhs3lemwz7kbi45um

Fooling Partial Dependence via Data Poisoning [article]

Hubert Baniecki, Wojciech Kretowicz, Przemyslaw Biecek
2021 arXiv   pre-print
Many methods have been developed to understand complex predictive models and high expectations are placed on post-hoc model explainability.  ...  It turns out that such explanations are neither robust nor trustworthy, and they can be fooled.  ...  Acknowledgments and Disclosure of Funding We would like to thank the anonymous reviewers for many insightful comments and suggestions.  ... 
arXiv:2105.12837v2 fatcat:c2ndcqmed5fe5djg2vi5fo7bcq
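Partial dependence is computed by averaging model predictions over a dataset, so changing the data changes the explanation even when the model stays fixed. A minimal sketch of that dependence, assuming a toy model with a feature interaction and a few hand-picked "poisoned" rows; the paper's actual poisoning procedure optimizes the perturbed data and is not reproduced here.

```python
from typing import Callable, List, Sequence

Model = Callable[[List[float]], float]


def partial_dependence(f: Model, data: Sequence[Sequence[float]],
                       feature: int, grid: Sequence[float]) -> List[float]:
    """PD_j(z) = mean over rows of f(row with feature j set to z)."""
    curve = []
    for z in grid:
        total = 0.0
        for row in data:
            x = list(row)
            x[feature] = z  # overwrite the feature of interest
            total += f(x)
        curve.append(total / len(data))
    return curve


# Toy model with an interaction: the effect of x0 depends on x1.
model = lambda x: x[0] * x[1]
clean = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
poisoned = clean + [[0.0, -5.0]]  # hypothetical injected row

grid = [0.0, 1.0, 2.0]
print(partial_dependence(model, clean, 0, grid))     # [0.0, 1.0, 2.0]
print(partial_dependence(model, poisoned, 0, grid))  # [0.0, -0.5, -1.0]
```

A single added row flips the apparent direction of the effect of feature 0, which is the kind of fragility the abstract describes.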

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective [article]

Satyapriya Krishna, Tessa Han, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, Himabindu Lakkaraju
2022 arXiv   pre-print
We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six state-of-the-art post hoc explanation methods, and eight different predictive models, to measure  ...  As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of if and when the explanations  ...  [49] demonstrated that methods such as LIME and SHAP may result in explanations that are not only inconsistent and unstable, but also prone to adversarial attacks and fairwashing [8] .  ... 
arXiv:2202.01602v3 fatcat:4xwkf6gxn5axtc5om4hvdli4na
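Measuring disagreement requires comparing two attribution vectors for the same prediction. One simple measure, assumed here for illustration, is top-k feature agreement: the fraction of the k features with largest absolute attribution that two explanations share. The function names and example attributions below are hypothetical, and the paper's full set of metrics may be defined differently.

```python
from typing import Sequence, Set


def top_k_features(attributions: Sequence[float], k: int) -> Set[int]:
    """Indices of the k features with the largest absolute attribution."""
    ranked = sorted(range(len(attributions)),
                    key=lambda j: abs(attributions[j]), reverse=True)
    return set(ranked[:k])


def feature_agreement(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Fraction of top-k features shared by explanations a and b (0 to 1)."""
    return len(top_k_features(a, k) & top_k_features(b, k)) / k


# Two hypothetical explanations of the same prediction (e.g., LIME vs. SHAP).
lime_attr = [0.9, -0.5, 0.1, 0.05]
shap_attr = [0.2, -0.7, 0.6, 0.0]
print(feature_agreement(lime_attr, shap_attr, 2))  # 0.5
```

Using absolute values treats a strongly negative attribution as equally important as a strongly positive one; a sign-aware variant would additionally compare signs of the shared features.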

Explainable Deep Learning in Healthcare: A Methodological Survey from an Attribution View [article]

Di Jin and Elena Sergeeva and Wei-Hung Weng and Geeticka Chauhan and Peter Szolovits
2021 arXiv   pre-print
Moreover, we discuss how these methods, originally developed for solving general-domain problems, have been adapted and applied to healthcare problems and how they can help physicians better understand  ...  Besides the methods' details, we also include a discussion of advantages and disadvantages of these methods and which scenarios each of them is suitable for, so that interested readers can know how to  ...  They show how networks trained on medical imaging datasets can be used to fool ImageNet-based classifiers.  ... 
arXiv:2112.02625v1 fatcat:omcm44vj2ffthcpna27typyvau

Explainable Deep Learning: A Field Guide for the Uninitiated [article]

Gabrielle Ras, Ning Xie, Marcel van Gerven, Derek Doran
2021 arXiv   pre-print
) places explainability in the context of other related deep learning research areas, and iv) finally elaborates on user-oriented explanation designing and potential future directions on explainable deep  ...  The development of methods and studies enabling the explanation of a DNN's decisions has thus blossomed into an active, broad area of research.  ...  We would also like to thank Erdi Çallı and Pim Haselager for the helpful discussions and general support.  ... 
arXiv:2004.14545v2 fatcat:4qvtfw6unbfgpkqmeosq737ghq

Explainable Deep Learning: A Field Guide for the Uninitiated

Gabrielle Ras, Ning Xie, Marcel Van Gerven, Derek Doran
2022 The Journal of Artificial Intelligence Research  
We hope the guide is seen as a starting point for those embarking on this research field.  ...  The development of methods and studies enabling the explanation of a DNN's decisions has thus blossomed into an active and broad area of research.  ...  Adversarial attack methods are about generating adversarial examples that can fool a DNN.  ... 
doi:10.1613/jair.1.13200 fatcat:qylru2n7tbepljxi72qah62bzy

Sentence-Based Model Agnostic NLP Interpretability [article]

Yves Rychener, Xavier Renard, Djamé Seddah, Pascal Frossard, Marcin Detyniecki
2020 arXiv   pre-print
Today, interpretability of black-box Natural Language Processing (NLP) models based on surrogates, like LIME or SHAP, uses word-based sampling to build the explanations.  ...  sampling, eventually leading to unfounded explanations.  ...  Compared to other word-based black-box post-hoc NLP interpretability methods like LIME (Ribeiro et al., 2016) and SHAP (Lundberg and Lee, 2017), we have a much smaller search space (Section 2.2).  ... 
arXiv:2012.13189v2 fatcat:p3auhaugare7blxmcqtbdblmbi
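The contrast drawn here is between perturbing individual words and perturbing whole sentences: a text with m sentences and w words has 2^m rather than 2^w on/off masks. A minimal sketch of sentence-level perturbation sampling for a surrogate explainer, assuming a naive regex sentence splitter and uniform random masks; both choices and all names are illustrative assumptions, not the authors' method.

```python
import random
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sample_sentence_perturbations(text: str, n_samples: int,
                                  seed: int = 0) -> List[Tuple[List[bool], str]]:
    """Return (mask, perturbed_text) pairs; each mask keeps or drops a sentence."""
    sentences = split_sentences(text)
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in sentences]
        kept = " ".join(s for s, keep in zip(sentences, mask) if keep)
        samples.append((mask, kept))
    return samples


review = "The plot was dull. The acting was superb! Would I watch it again?"
samples = sample_sentence_perturbations(review, 5)
for mask, perturbed in samples:
    print(mask, repr(perturbed))
```

A surrogate such as a weighted linear model would then be fit on the masks against the black-box outputs for each perturbed text, yielding one coefficient per sentence.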

Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey [article]

Vanessa Buhrmester, David Münch, Michael Arens
2019 arXiv   pre-print
We work out the drawbacks and gaps and summarize further research ideas.  ...  Through their increased distribution, decision-making algorithms can contribute to promoting prejudice and unfairness, which is not easy to notice due to a lack of transparency.  ...  They showed, for instance, how easily one could fool object detectors with small changes in the input image or create adversarial examples to make them collapse, see [4] , [5] .  ... 
arXiv:1911.12116v1 fatcat:qgeg6rz6qzgrfikhsgah77yz2a

Do Gradient-based Explanations Tell Anything About Adversarial Robustness to Android Malware? [article]

Marco Melis, Michele Scalas, Ambra Demontis, Davide Maiorca, Battista Biggio, Giorgio Giacinto, Fabio Roli
2021 arXiv   pre-print
Our experiments conducted on two different datasets and five classification algorithms for Android malware detection show that a strong connection exists between the uniformity of explanations and adversarial  ...  In this work, we investigate whether gradient-based attribution methods, used to explain classifiers' decisions by identifying the most relevant features, can be used to help identify and select more robust  ...  Data availability The datasets generated during and/or analysed during the current study are available in the Androzoo repository, and upon request at  ... 
arXiv:2005.01452v2 fatcat:hgbpr63czfcuzmi23u6jex5huq
Showing results 1 to 15 of 46