
Deep Scanner

A tale of adversarial attacks & out-of-distribution detection in the activation space

Overview

Most deep learning models are built under near-ideal conditions and rely on the assumption that test and production data are drawn from the same distribution as the training data. Real-world data rarely follows this pattern: test inputs can differ from the training data because of adversarial perturbations, new classes, generated content, noise, or other distribution shifts. Such shifts can cause unknown types (classes that never appear during training) to be classified as known with high confidence, and adversarial perturbations can cause a sample to be misclassified. In this project, we discuss group-based and individual subset scanning methods from the anomalous pattern detection literature and how they can be applied to the activations of off-the-shelf deep learning models.
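As a rough illustration of the individual (per-input) variant, the sketch below assumes a matrix of clean background activations and a single test activation vector: node-wise empirical p-values are computed for the test input, and a Berk-Jones scan statistic is maximized over significance thresholds to find the most anomalous subset of nodes. All names, thresholds, and data are illustrative rather than the project's actual API.

```python
import numpy as np

def empirical_pvalues(background, test):
    """One-sided empirical p-value per node (with add-one smoothing):
    how often clean background activations are at least as large as the test value."""
    # background: (n_samples, n_nodes), test: (n_nodes,)
    return (np.sum(background >= test, axis=0) + 1.0) / (background.shape[0] + 1.0)

def berk_jones(n_alpha, n, alpha):
    """Berk-Jones scan statistic: KL divergence between the observed and expected
    fraction of p-values at or below alpha, scored only when enriched."""
    obs = min(n_alpha / n, 1.0 - 1e-12)
    if obs <= alpha:
        return 0.0
    return n * (obs * np.log(obs / alpha)
                + (1.0 - obs) * np.log((1.0 - obs) / (1.0 - alpha)))

def scan_activations(background, test, alphas=(0.01, 0.05, 0.1, 0.25, 0.5)):
    """Search over significance thresholds for the subset of nodes whose
    p-values are most anomalously small; returns the best score, subset, and alpha."""
    pvals = empirical_pvalues(background, test)
    best = (0.0, np.array([], dtype=int), None)
    for alpha in alphas:
        subset = np.where(pvals <= alpha)[0]   # nodes significant at this threshold
        score = berk_jones(len(subset), len(pvals), alpha)
        if score > best[0]:
            best = (score, subset, alpha)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(500, 128))        # hypothetical clean activations
    perturbed = rng.normal(size=128)
    perturbed[:16] += 3.0                      # shift a small subset of nodes
    score, subset, alpha = scan_activations(clean, perturbed)
    print(f"score={score:.2f}, anomalous nodes={len(subset)}, alpha={alpha}")
```

Higher scores indicate inputs whose activations deviate from the clean background on a coherent subset of nodes; thresholding the score is one way such a scanner could flag adversarial or out-of-distribution samples.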

UpSet and Venn plots of activation overlap across intra-persona topics suggest that persona dimensions are more distinctly localized, while ethical statements are more polysemous.
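As a rough sketch of how such overlap could be quantified (assuming per-topic activation matrices and illustrative topic names, not the project's actual data), each topic's set of top-activated nodes can be compared pairwise:

```python
import numpy as np
from itertools import combinations

def top_nodes(activations, k=50):
    """Indices of the k nodes with the highest mean activation for a topic."""
    return set(np.argsort(activations.mean(axis=0))[-k:])

rng = np.random.default_rng(1)
# Hypothetical per-topic activation matrices of shape (n_prompts, n_nodes).
topics = {name: rng.normal(size=(200, 512))
          for name in ("persona_agreeableness", "persona_openness", "ethical_statements")}
node_sets = {name: top_nodes(acts) for name, acts in topics.items()}

# Pairwise intersections are what an UpSet or Venn plot would display.
for a, b in combinations(node_sets, 2):
    inter = node_sets[a] & node_sets[b]
    union = node_sets[a] | node_sets[b]
    print(f"{a} & {b}: {len(inter)} shared nodes, Jaccard={len(inter) / len(union):.2f}")
```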