Generation of low distortion adversarial attacks via convex programming
Abstract
As deep neural networks (DNNs) achieve extraordinary performance in a wide range of tasks, testing their robustness under adversarial attacks becomes paramount. Adversarial attacks, also known as adversarial examples, are used to measure the robustness of DNNs and are generated by incorporating imperceptible perturbations into the input data with the intention of altering a DNN's classification. In prior work in this area, most of the proposed optimization based methods employ gradient descent to find adversarial examples. In this paper, we present an innovative method which generates adversarial examples via convex programming. Our experiment results demonstrate that we can generate adversarial examples with lower distortion and higher transferability than the C&W attack, which is the current state-of-the-art adversarial attack method for DNNs. We achieve 100% attack success rate on both the original undefended models and the adversarially-trained models. Our distortions of the L-inf attack are respectively 31% and 18% lower than the C&W attack for the best case and average case on the CIFAR-10 data set.