An Assessment of Robustness for Adversarial Attacks and Physical Distortions on Image Classification using Explainable AI

K. T. Yasas Mahima, Mohamed Ayoob, Guhanathan Poravi
2021 SGAI Conferences  
Introducing defence mechanisms to overcome the vulnerability of deep learning models to adversarial attacks is a highly active research area. However, recent research highlights that defending only against man-made adversarial attacks is not sufficient, because deep learning models are also vulnerable to perturbations outside the scope of the training set, and the physical world itself acts as a generator of adversarial samples. Given this caveat, general defence approaches are needed that cover both man-made and physical-world adversarial samples. Before such defences can be designed, a clear explanation is required of how a model's decision-making process behaves at inference time under various adversarial perturbations. However, deep learning models act as black boxes in the inference phase, where the decision-making is not interpretable. As a result, research on model interpretability and explainability has been carried out in the domain collectively known as Explainable AI. Using a set of Explainable AI techniques, this study investigates the robustness of deep learning networks; that is, the decision-making process in neural networks and the pixel attributions important for the predictions, captured when a deep learning model receives adversarial inputs at inference time. These inputs are perturbed by adversarial attacks or by physical-world adversaries and are fed to a deep learning network trained on the CIFAR-10 dataset. The study reveals that, when the inference receives adversarial samples, the pixel attributions the network relies on for its prediction spread across the whole image. However, when the network is re-trained using adversarial training or data transformation-based augmentation, it is able to capture pixel attributions within the particular object or to reduce the capture of negative pixel attributions. Based on these findings, the paper outlines potential research directions towards a general adversarial defence method.
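
As a concrete illustration of the kind of inspection described above, the sketch below crafts an FGSM-perturbed CIFAR-10 image and compares Integrated Gradients attributions for the clean and adversarial inputs. It is a minimal example, not the authors' pipeline: the model (an untrained ResNet-18 standing in for the trained CIFAR-10 network), the attack (single-step FGSM with epsilon = 8/255), and the attribution method (Integrated Gradients from the Captum library) are all assumptions made for illustration, since the abstract does not name the exact attacks or Explainable AI techniques used.

```python
# Illustrative sketch only: FGSM perturbation + pixel attributions on CIFAR-10.
# Model, epsilon, and attribution method are assumptions, not the paper's setup.
import torch
import torch.nn.functional as F
import torchvision
import torchvision.transforms as T
from captum.attr import IntegratedGradients

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical classifier: an untrained ResNet-18 adapted to 10 classes.
# In the study, a network trained on CIFAR-10 would be loaded here instead.
model = torchvision.models.resnet18(num_classes=10).to(device).eval()

# One CIFAR-10 test image.
testset = torchvision.datasets.CIFAR10(root="./data", train=False,
                                       download=True, transform=T.ToTensor())
image, label = testset[0]
image = image.unsqueeze(0).to(device)
target = torch.tensor([label], device=device)

# FGSM: a single signed-gradient step of size epsilon (assumed value).
def fgsm(model, x, y, epsilon=8 / 255):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + epsilon * grad.sign()).clamp(0, 1).detach()

adv_image = fgsm(model, image, target)

# Pixel attributions for the clean and the adversarial input.
ig = IntegratedGradients(model)
clean_attr = ig.attribute(image, target=target)
adv_attr = ig.attribute(adv_image, target=target)

# Comparing the two attribution maps (e.g., visually or by their magnitudes)
# is the kind of analysis the study describes.
print("clean attribution magnitude:", clean_attr.abs().sum().item())
print("adversarial attribution magnitude:", adv_attr.abs().sum().item())
```

In the study's terms, comparing the two attribution maps is where one would observe the attributions spreading across the image under attack, and re-concentrating on the object after re-training with adversarial training or augmentation.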