Poster: Characterizing adversarial subspaces by mutual information
Abstract
Deep learning is well known for its strong performance on image classification, object detection, and natural language processing. However, recent research has demonstrated that carefully crafted, visually indistinguishable images, called adversarial examples, can successfully fool neural networks. In this paper, we design a detector named MID, which calculates mutual information to characterize adversarial subspaces. We mount MID on the defense framework MagNet. Experimental results show that our method effectively defends against projected gradient descent (PGD), the basic iterative method (BIM), Carlini and Wagner's attack (C&W), and the elastic-net attack to deep neural networks (EAD, with both the elastic-net and L1 decision rules).
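To make the idea concrete, below is a minimal sketch of one plausible instantiation of a mutual-information-based detection score. The abstract does not specify how MID estimates mutual information, so this is an illustration rather than the authors' implementation: it assumes a simple histogram-based estimator and, in the spirit of a MagNet-style pipeline, scores an input by the estimated MI between the image and a reconstruction of it. The names `mutual_information`, `is_adversarial`, `reconstruct`, and the thresholding rule are all hypothetical.

```python
# Hypothetical sketch: histogram-based mutual-information scoring for
# adversarial-example detection. Not the authors' MID; an illustration only.
import numpy as np

def mutual_information(x: np.ndarray, y: np.ndarray, bins: int = 32) -> float:
    """Estimate I(X; Y) in nats from a 2-D joint histogram of two
    equally sized arrays (e.g. an image and its reconstruction)."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()             # empirical joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                          # skip zero cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def is_adversarial(image: np.ndarray, reconstruct, threshold: float) -> bool:
    """Flag the input if the MI between the image and its reconstruction
    (e.g. from a MagNet-style autoencoder) falls below a threshold."""
    return mutual_information(image, reconstruct(image)) < threshold
```

In such a scheme, the threshold would be calibrated on clean validation data (e.g. set so that a fixed small fraction of benign inputs is rejected), and inputs whose MI score deviates from the clean distribution would be flagged as adversarial.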