Improving the efficiency of adversarial robustness defenses
There’s still an immense gap between the capabilities of current, narrow AI and the intellectual prowess of humans. There is one thing, though, that human and artificial intelligence have in common: With a little manipulation, both can be misled to draw wrong conclusions from data. But how easy is it to fool an AI model? Adversarial machine learning is the field of research that specifically poses and tries to address this question.
There’s no shortage of well-intentioned doomsayers in the field. They warn us of the need to protect our models from both internal and external manipulation. Researchers have shown how they can trick simple AI models into mistaking pandas for gibbons and stop signs for speed limit signs. But the threats don’t stop there. What if the same types of attacks could convince industrial-scale models to mistake malware for approved software, or to post malicious social media messages with the intent of, say, sending a particular company’s stock into a tailspin? The consequences of such attacks could be disastrous. We need to protect AI models from them.
But it’s not so easy. Year after year, papers are published promising protection against adversaries trying to manipulate machine learning models, yet many are later found to have evaluated their defenses improperly. And once attackers have figured out the defense strategy, they can often circumvent it with some cleverness.
The many flawed defense strategies proposed over the years have encouraged the study of certifiable defenses: defenses that provide a theoretical guarantee of performance against any attack within a specified threat model, no matter what strategy the adversary uses. This is ideal for practical use, as we are unlikely to know in advance all the different ways a model will be attacked. The simplest of these defenses is randomized smoothing.
First, you train a model to recognize noisy versions of the training data. Then, when you need to classify an input, you create multiple noisy copies of that input and classify all of those copies at once. Finally, you take a majority vote over the predictions to get the final classification. Unlike a traditional classifier, this smooth classifier is harder for the adversary to manipulate. The intuition is that although finding a single adversarial input might be easy, finding one that remains adversarial after randomized smoothing is hard. The robustness of a certified model is typically reported as its accuracy on a validation dataset at pre-specified perturbation radii.
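To make the procedure concrete, here is a minimal sketch of the prediction step, assuming a PyTorch image classifier; the function and parameter names are illustrative, and the full method also includes a statistical confidence test that lets the smooth classifier abstain, which is omitted here for brevity.

```python
import torch

def smoothed_predict(base_model, x, sigma=0.25, n_samples=100):
    """Classify a single input x by majority vote over Gaussian-noised copies.

    base_model: any classifier returning logits of shape (batch, num_classes)
    x: a single input tensor, e.g. shape (3, 32, 32)
    sigma: standard deviation of the Gaussian noise
    n_samples: number of noisy copies to vote over
    """
    base_model.eval()
    with torch.no_grad():
        # Create n noisy copies of the input and classify them all at once.
        noisy = x.unsqueeze(0).repeat(n_samples, 1, 1, 1)
        noisy = noisy + sigma * torch.randn_like(noisy)
        preds = base_model(noisy).argmax(dim=1)
        # Majority vote over the predicted classes.
        return torch.bincount(preds).argmax().item()
```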
When it was first proposed, the randomized smoothing framework trained the underlying model by adding Gaussian noise to the training inputs. This noising process is inexpensive since it’s just a simple addition to the training inputs, but it doesn’t produce the most robust model. The smooth classifier can only tolerate a small amount of noise before performance degrades. Since its inception, we have seen numerous modifications to the training pipeline, such as using adversarial training techniques, in order to improve model robustness. The issue is that these proposed modifications tend to be expensive. For example, we found that using adversarial training with randomized smoothing increased the training overhead by about five times in the best case compared to using Gaussian noise.
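As a rough illustration (not the paper's code), Gaussian noise augmentation amounts to a one-line change in an otherwise standard training step; the names below are illustrative:

```python
import torch
import torch.nn.functional as F

def gaussian_augmented_step(model, optimizer, x, y, sigma=0.25):
    """One ordinary training step, except the inputs are perturbed with
    Gaussian noise so the model learns to classify noisy copies."""
    noisy = x + sigma * torch.randn_like(x)  # the inexpensive "noising" step
    loss = F.cross_entropy(model(noisy), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```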
In a research scenario, where a model might only be trained a few times to highlight the proposed idea in a paper, these overheads aren’t a big deal. At worst, the authors just need to spend more time training the model and might pay some extra money to do so. The problem comes when we consider real-world deployments. It’s hard to motivate someone to guarantee the security of their model when the price they must pay is a fivefold increase in training time. This becomes even harder when you tell them that they need to guarantee security every time they deploy a new version of the model.
We realized that most well-known adversarial robustness techniques focus on how to generate a secure AI model, but not on how to maintain it. We expect that, due to new model architectures and data drift, models deployed in practical settings will eventually be updated. To our knowledge, prior work ignored this scenario, especially with regard to certifiable robustness. So to keep AI models secure, certified robustness training would need to be repeated for every new deployment.
Ideally, the expensive training process used to secure an AI model should be performed as few times as possible, after which existing robust models can be used to secure future model generations. In our NeurIPS 2022 paper, written jointly with collaborators from Stony Brook University, we provide a solution that works with all existing and future certified training methods. Knowledge transfer is a student-teacher training framework in which information learned by a teacher model is transferred to a student model. Traditionally, this framework has been used to improve model performance in non-adversarial scenarios. We found, though, that when combined with randomized smoothing, knowledge transfer also enables the transfer of certifiable robustness.
First, you train a certifiably robust model using an expensive certified training approach (such as SmoothMix). This is your teacher model. Then, when training a new, robust model, rather than redoing the expensive certified training, you can reuse the teacher model through a knowledge transfer framework. Specifically, our knowledge transfer framework seeks to minimize the distance between the student's and teacher's outputs on Gaussian-noisy training samples.
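The exact objective is given in the paper; as a sketch only, assuming the distance is a squared ℓ2 norm between the two models' outputs on a noised sample (the paper may use a different distance or add a standard classification term), the per-sample loss would look like:

```latex
\mathcal{L}_{\mathrm{transfer}}(x) \;=\;
\mathbb{E}_{\delta \sim \mathcal{N}(0,\,\sigma^{2} I)}
\left\| f_{\mathrm{student}}(x + \delta) \;-\; f_{\mathrm{teacher}}(x + \delta) \right\|_{2}^{2}
```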
Compared to the original randomized smoothing training pipeline, our loss objective adds only one extra step: querying the teacher's output on the Gaussian-noisy training sample. We found that using the robust teacher's outputs to guide student training was sufficient to generate an accurate and robust student model. Querying the teacher during training is fast, as it only involves a single forward pass through the teacher model. As such, our transfer learning framework is as fast as a non-robust training pipeline. Once a strong, robust model has been trained, our transfer learning framework allows for repeated generation of secure models without any additional training overhead.
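Below is a minimal sketch of what one training step of such a transfer framework could look like in PyTorch, assuming an MSE loss between logits; the names and the choice of distance are illustrative rather than the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def transfer_step(student, teacher, optimizer, x, sigma=0.25):
    """One training step of a sketched robustness-transfer loop.

    student: model being trained
    teacher: frozen, certifiably robust model (e.g. trained with SmoothMix)
    x: a batch of training inputs
    sigma: Gaussian noise level, matched to the teacher's smoothing noise
    """
    # Add the same kind of Gaussian noise used by randomized smoothing.
    noisy = x + sigma * torch.randn_like(x)

    # A single forward pass through the frozen teacher, no gradients needed.
    with torch.no_grad():
        teacher_out = teacher(noisy)

    # Push the student's outputs toward the teacher's on the noisy samples.
    student_out = student(noisy)
    loss = F.mse_loss(student_out, teacher_out)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the teacher is queried inside torch.no_grad(), the only cost beyond standard Gaussian-noise training is that single forward pass.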
In the paper, we demonstrate how our certified transfer learning approach mitigates the training overhead of SmoothMix, a state-of-the-art randomized smoothing defense, while maintaining the security of future model generations. Our transfer learning framework also remained effective across several generations, despite performing robust training only once. We also found that our framework can accelerate certified training even when no robust model is available yet: we can first train a smaller, fast-to-train model with the expensive certified training method and then apply our transfer learning framework.
Achieving high accuracy in adversarial scenarios is a very desirable goal, but we can’t forget about the other costs that go into training — especially time. For AI robustness, the research community has often focused on improving accuracy metrics at the cost of increasing training overheads. That’s fine for a paper — but for practical use, we need more. Our knowledge transfer framework is a response to a lack of effective and practical adversarial robustness defenses for industrial use cases. If we want to encourage model developers to create trustworthy AI models, we should focus on making each component, including adversarial robustness, easy to adopt.