Geometry of adversarial robustness of deep networks: methods and applications
Seyed Mohsen Moosavi Dezfooli
We are witnessing a rise in the popularity of artificial neural networks in many fields of science and technology. Deep neural networks in particular have shown impressive classification performance on a number of challenging benchmarks, generally in well-controlled settings. However, it is equally important that these classifiers satisfy robustness guarantees when they are deployed in uncontrolled (noise-prone) and possibly hostile environments. In other words, small perturbations applied to the samples should not cause a significant loss in the performance of the classifier. Unfortunately, deep neural network classifiers have been shown to be intriguingly vulnerable to perturbations, and it is relatively easy to design noise that changes the label estimated by the classifier. Studying this high-dimensional phenomenon is challenging: it requires new algorithmic tools, as well as theoretical and experimental analysis, to identify the key factors that drive the robustness properties of deep networks. This is exactly the focus of this PhD thesis.
First, we propose a computationally efficient yet accurate method to generate minimal perturbations that fool deep neural networks. It allows us to reliably quantify the robustness of classifiers and to compare different architectures. We further propose a systematic algorithm for computing universal (image-agnostic) and very small perturbation vectors that cause natural images to be misclassified with high probability. The vulnerability to universal perturbations is particularly important in security-critical applications of deep neural networks, and our algorithm shows that these systems are quite vulnerable to noise designed with only limited knowledge of the test samples or the classification architecture. Next, we study the geometry of the classifier's decision boundary in order to explain the adversarial vulnerability of deep networks. Specifically, we establish precise theoretical bounds on the robustness of classifiers in a novel semi-random noise regime that generalizes both the adversarial and the random perturbation regimes. We show in particular that the robustness of deep networks to universal perturbations is driven by a key property of the curvature of their decision boundaries. Our analysis therefore suggests ways to improve the robustness of these classifiers to adversarial perturbations.
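To make the notion of a "minimal perturbation" concrete, the simplest case can be worked out in closed form. This is a textbook illustration, not the method proposed in the thesis: it assumes an affine binary classifier f(x) = w·x + b whose label is sign(f(x)). The smallest L2 perturbation that reaches the decision boundary is the orthogonal projection r* = -f(x) w / ||w||², with norm |f(x)| / ||w||. The `overshoot` factor and the function name `minimal_perturbation_affine` are hypothetical choices made for this sketch. They push the point slightly past the boundary so that the label actually flips.

```python
import numpy as np

def minimal_perturbation_affine(x, w, b, overshoot=0.02):
    """Smallest L2 perturbation moving x onto the decision boundary of the
    affine binary classifier f(x) = w @ x + b, scaled by (1 + overshoot)
    so that the predicted label actually flips (illustrative sketch)."""
    f_x = w @ x + b
    # Orthogonal projection of x onto the hyperplane {z : w @ z + b = 0}
    r = -f_x * w / np.dot(w, w)
    return (1.0 + overshoot) * r

rng = np.random.default_rng(0)
w = rng.normal(size=10)  # toy classifier weights (assumed)
b = 0.5                  # toy bias (assumed)
x = rng.normal(size=10)  # toy input sample

r = minimal_perturbation_affine(x, w, b)
print("original label: ", np.sign(w @ x + b))
print("perturbed label:", np.sign(w @ (x + r) + b))
# Distance of x to the boundary, |f(x)| / ||w||, versus the perturbation norm
print("boundary distance:", abs(w @ x + b) / np.linalg.norm(w))
print("||r||:            ", np.linalg.norm(r))
```

For a deep network the decision boundary is not a hyperplane, so this projection is exact only locally. Iterative schemes that repeatedly linearize the classifier around the current point are one common way to extend the idea.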
Finally, we build on the geometric insights derived in this thesis to improve the robustness of state-of-the-art image classifiers. We leverage a fundamental property of the curvature of the decision boundary of deep networks, and propose a method to detect small adversarial perturbations in images and to recover the labels of perturbed images. To obtain inherently robust classifiers, we further propose an alternative to the common adversarial training strategy, in which we directly minimize the curvature of the classifier. This yields adversarial robustness on par with that of adversarial training. Our proposed regularizer is thus an important step towards designing robust image classification systems. In summary, this thesis demonstrates a new geometric approach to the problem of the adversarial vulnerability of deep networks, and provides novel quantitative and qualitative results that precisely describe the behavior of classifiers in adversarial settings. These results contribute to the understanding of the fundamental properties of state-of-the-art image classifiers, which will eventually bring important benefits to safety-critical applications such as self-driving cars, autonomous robots, and medical imaging.
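As a rough illustration of what "minimizing curvature" can mean numerically, one generic proxy compares the gradient of a loss at x with the gradient at a nearby point x + h z. The difference (∇ℓ(x + h z) − ∇ℓ(x)) / h approximates the Hessian-vector product H z, and its squared norm grows with the curvature along direction z. The sketch below is a hypothetical stand-in for the thesis's regularizer, not a reproduction of it. It uses a toy quadratic loss ℓ(x) = ½ xᵀA x + cᵀx, for which the finite difference is exact. In an actual network, the gradients would come from automatic differentiation, and the penalty would be added to the training objective.

```python
import numpy as np

def grad_quadratic(x, A, c):
    # Gradient of the toy loss l(x) = 0.5 * x^T A x + c^T x (A symmetric)
    return A @ x + c

def curvature_penalty(x, z, h, A, c):
    """Hypothetical finite-difference curvature proxy
    ||grad l(x + h z) - grad l(x)||^2. Dividing the gradient difference
    by h approximates the Hessian-vector product H z."""
    diff = grad_quadratic(x + h * z, A, c) - grad_quadratic(x, A, c)
    return float(diff @ diff), diff / h

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5))
A = M @ M.T              # symmetric positive semi-definite Hessian of the toy loss
c = rng.normal(size=5)
x = rng.normal(size=5)
z = rng.normal(size=5)
z /= np.linalg.norm(z)   # unit-norm probing direction
h = 1e-2                 # finite-difference step (assumed)

penalty, hvp = curvature_penalty(x, z, h, A, c)
penalty_flat, _ = curvature_penalty(x, z, h, 0.1 * A, c)
print(np.allclose(hvp, A @ z))  # finite difference recovers H z for a quadratic loss
print(penalty > penalty_flat)   # a flatter (less curved) loss gives a smaller penalty
```

Penalizing this quantity during training discourages large curvature along the probed directions. Whether such a proxy matches the regularizer used in the thesis is not established by this sketch.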