Provable, Structured, and Efficient Methods for Robustness of Deep Networks to Adversarial Examples

Wong, Eric

doi:10.1184/R1/13607570.v1

ericwong_MachineLearning_2020.pdf (3.34 MB)

thesis

posted on 2021-01-25, 22:31 authored by Eric Wong

While deep networks have contributed to major leaps in raw performance across various applications, they are also known to be quite brittle to targeted data perturbations. By adding a small amount of adversarial noise to the data, it is possible to drastically change the output of a deep network. The existence of these so-called adversarial examples, perturbed data points which fool the model, poses a serious risk for safety- and security-centric applications where reliability and robustness are critical. In this dissertation, we present and analyze a number of approaches for mitigating the effect of adversarial examples, also known as adversarial defenses. These defenses can offer varying degrees and types of robustness, and in this dissertation we study defenses which differ in the strength of the robustness guarantee, the efficiency and simplicity of the defense, and the type of perturbation being defended against.

We start with the strongest type of guarantee, called provable adversarial defenses, showing that it is possible to compute duality-based certificates that guarantee no adversarial examples exist within an ℓ_p-bounded region. These certificates are trainable and can be minimized to learn networks which are provably robust to adversarial attacks. The approach is agnostic to the specific architecture and is applicable to arbitrary computational graphs, scaling to medium-sized convolutional networks with random projections.

We then switch gears to developing a deeper understanding of a more empirical defense known as adversarial training. Although adversarial training does not come with formal guarantees, it can learn networks more efficiently and with better empirical performance against attacks. We study the optimization process and reveal several intriguing properties of the robust learning problem, finding that a simple modification to one of the earliest adversarial attacks can be sufficient to learn networks robust to much stronger attacks, as well as finding that adversarial training as a general procedure is highly susceptible to overfitting. These discoveries have significant implications for both the efficiency of adversarial training and the state of the field: for example, virtually all recent algorithmic improvements in adversarial training can be matched by simply using early stopping.

The final component of this dissertation expands the realm of adversarial examples beyond ℓ_p-norm bounded perturbations, to enable more realistic threat models for applications beyond imperceptible noise. We define a threat model called the Wasserstein adversarial example, which captures semantically meaningful image transformations like translations and rotations previously uncaptured by existing threat models. We present an efficient algorithm for projecting onto Wasserstein balls, enabling both generation of and adversarial training against Wasserstein adversarial examples. Finally, we demonstrate how to generalize adversarial training to defend against multiple types of threats simultaneously, improving upon naive aggregations of adversarial attacks.
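To make the ℓ_p-bounded threat model mentioned in the abstract concrete, the following is a minimal sketch that is not taken from the dissertation. It assumes a toy linear classifier with made-up weights `w` and `b`, a hypothetical helper named `fgsm_linear`, and an ℓ_∞ radius `eps` chosen only for illustration. It applies a single sign-gradient step (in the style of the fast gradient sign method) and shows that a perturbation of at most `eps` per coordinate can flip the sign of the classification margin.

```python
import numpy as np

def fgsm_linear(x, y, w, eps):
    """One sign-gradient step against a linear classifier (illustrative only).

    x:   input vector
    y:   true label in {-1, +1}
    w:   hypothetical weight vector of the classifier sign(w.x + b)
    eps: radius of the l_inf ball the perturbation must stay inside
    """
    # For the logistic loss log(1 + exp(-y * (w.x + b))), the gradient with
    # respect to x is a positive multiple of -y * w, so its sign is sign(-y * w).
    grad_sign = np.sign(-y * w)
    # Moving eps along the gradient sign changes each coordinate by at most eps.
    return x + eps * grad_sign

# Made-up model and data point, assumptions for this sketch only.
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.3, -0.1, 0.2])
y = 1

x_adv = fgsm_linear(x, y, w, eps=0.2)

clean_margin = y * (w @ x + b)      # positive: x is classified correctly
adv_margin = y * (w @ x_adv + b)    # negative: x_adv is misclassified
print(clean_margin, adv_margin)
```

For a linear model the worst-case margin inside the ℓ_∞ ball has the closed form `clean_margin - eps * np.abs(w).sum()`, so this single step is already optimal here. For deep networks no such closed form exists, which is the gap that certificates and adversarial training address.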

History

Date

2020-05-15

Degree Type

Dissertation

Department

Machine Learning

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

J. Zico Kolter

Keywords

adversarial examples, provable defenses, adversarial training, deep networks

Licence

CC BY 4.0
