Abstract

Over the past few years, considerable progress has been made in improving the robustness of models against adversarial attacks. Current defenses, however, are developed for a particular threat model (e.g., L2-bounded perturbations) and often generalize poorly to “unforeseen” threat models (those not used during training).

In this talk, I will present our recent results tackling this issue. We present adversarial attacks and defenses for the perceptual adversarial threat model: the set of all perturbations to natural images that mislead a classifier while remaining imperceptible to the human eye. The perceptual threat model is broad and encompasses L2, L_\infty, spatial, and many other existing adversarial threat models. However, determining whether an arbitrary perturbation is imperceptible is difficult without humans in the loop. To address this, we propose a “neural perceptual distance”, a neural-network approximation of the true perceptual distance between images. We then propose the “neural perceptual threat model”, which includes adversarial examples whose neural perceptual distance to natural images is bounded. Remarkably, we find that defending against the neural perceptual threat model generalizes well to many types of unseen Lp and non-Lp adversarial attacks.
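For a concrete picture, below is a minimal sketch of how membership in such a threat model might be checked, assuming an LPIPS-style learned perceptual metric (here via the lpips Python package); the specific metric, threshold, and inputs are illustrative assumptions, not necessarily the exact setup used in this work.

import torch
import lpips

# One possible "neural perceptual distance": the learned LPIPS metric.
# The distance used in the talk may differ; this is for illustration only.
perceptual_dist = lpips.LPIPS(net='alex')

def in_neural_perceptual_ball(x_natural, x_adv, eps=0.5):
    """Return True if x_adv lies within neural perceptual distance eps of x_natural.

    Inputs are (N, 3, H, W) tensors scaled to [-1, 1], the convention
    expected by the lpips package. The budget eps is a hypothetical value.
    """
    with torch.no_grad():
        d = perceptual_dist(x_natural, x_adv)  # distances, shape (N, 1, 1, 1)
    return bool((d <= eps).all())

# Toy usage: a small random perturbation of a random "natural" image.
x = torch.rand(1, 3, 64, 64) * 2 - 1
x_tilde = (x + 0.01 * torch.randn_like(x)).clamp(-1, 1)
print(in_neural_perceptual_ball(x, x_tilde))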

This is joint work with Cassidy Laidlaw and Sahil Singla.