Towards Generalizable Robustness of Deep Learning Models
While Deep Neural Networks (DNNs) have advanced the state-of-the-art in machine perception by leaps and bounds, their sensitivity to subtle input perturbations that humans are invariant to has raised questions about their reliability in real-world settings. Perhaps the most pernicious and alarming of these perturbations are adversarial perturbations, which can drastically and arbitrarily change the outputs of DNNs, while remaining imperceptible (or barely perceptible) to humans. Algorithms for generating adversarial perturbations are known as adversarial attacks.
Existing methods of making DNNs robust to adversarial perturbations generally require training on adversarially perturbed or noisy data. While this approach successfully produces DNNs robust to the adversarial attacks used to generate perturbations during training, it does not generalize to other, unseen types of attacks. Consequently, to obtain models that exhibit more generalized robustness to a variety of adversarial attacks, one would need to ensure that all such attacks are sufficiently represented in the training data. This objective is highly inefficient at best, and impossible at worst, given that adversarial attacks are constantly evolving and the boundaries of human perception are not fully known.
Given the pitfalls of seeking robustness via training, in this thesis, we work towards models that are naturally more robust to a variety of adversarial attacks without having been trained on perturbed data. To this end, we seek to discover principles, or priors, for DNNs that endow them with enhanced robustness to adversarial perturbations. As these priors induce adversarial robustness without requiring training on perturbed data, we expect them to yield models robust to various perturbations and attack algorithms.
Concretely, we study two categories of robustness priors in this thesis: structural and biological. We define structural robustness priors as design elements of DNNs that are conducive to adversarial robustness. Biological priors, on the other hand, are mechanisms and constraints underlying the robustness of biological perception and cognition that are not usually represented in DNNs. Since adversarial perturbations are rooted in the differences between biological perception and DNNs, we expect that integrating biological priors into DNNs would better align their behavior with biological perception and, consequently, make them robust to adversarial perturbations, and perhaps to other types of noise that biological perception tolerates.
We approach the study of structural robustness priors from two directions: statistical and empirical. In the former, we take the view that, by virtue of being highly overparameterized, modern DNNs may encode spurious features, and we show that pruning away neurons that encode such spurious features improves robustness to adversarial attacks. In the empirical approach, we estimate the probability with which gradient descent, from a random initialization, arrives at a model that is both robust and accurate. Our experiments on simple problems, such as XOR and MNIST, reveal that certain design elements increase the odds of finding robust models while others decrease them.
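To make the pruning idea concrete, the sketch below removes the hidden neurons of a linear layer whose activations correlate most weakly with the labels on held-out data. The correlation-with-label criterion and the pruning fraction here are hypothetical illustrations, not the exact criterion studied in the thesis.

```python
import torch
import torch.nn as nn

def prune_spurious_neurons(layer: nn.Linear, acts: torch.Tensor,
                           labels: torch.Tensor, frac: float = 0.1) -> None:
    """Zero out the `frac` of `layer`'s output neurons whose activations are
    least predictive of the (binary) labels -- a hypothetical proxy for
    spurious features. `acts` has shape (num_examples, out_features)."""
    a = acts - acts.mean(dim=0, keepdim=True)
    y = (labels.float() - labels.float().mean()).unsqueeze(1)
    # Pearson-style correlation between each neuron's activation and the label.
    corr = (a * y).mean(dim=0) / (a.std(dim=0) * y.std() + 1e-8)
    k = int(frac * corr.numel())
    idx = corr.abs().argsort()[:k]                # weakest correlations first
    with torch.no_grad():
        layer.weight[idx] = 0.0                   # remove the neuron's computation
        layer.bias[idx] = 0.0
```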
In our study of biological priors, we consider sensory and cognitive priors. Sensory priors relate to constraints present in sensory organs that emphasize or de-emphasize certain aspects of the stimuli. In vision, one such prior is foveation, due to which only the region around the fixation point is sensed at maximum fidelity. We integrate foveation into DNNs and demonstrate that it significantly improves their robustness to adversarial attacks as well as non-adversarial perturbations. Similarly, examples of biological priors in audition are simultaneous frequency masking and lateral suppression, due to which the perceived level of a frequency is influenced by the levels of adjacent frequencies. We integrate these phenomena into speech recognition DNNs and observe that their robustness to adversarial attacks, as well as other corruptions, is greatly enhanced while their accuracy is minimally impacted.
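As a rough illustration of the foveation prior, the sketch below blends a sharp image with a blurred copy so that acuity falls off with distance from a fixation point before the image is passed to a classifier. The linear blending scheme and the blur strength are simplifying assumptions for illustration, not the exact transform used in the thesis.

```python
import torch
from torchvision.transforms.functional import gaussian_blur

def foveate(img: torch.Tensor, fix_y: int, fix_x: int,
            sigma: float = 4.0) -> torch.Tensor:
    """Crude foveation of a (C, H, W) image: sharp near the fixation point
    (fix_y, fix_x), increasingly blurred towards the periphery."""
    _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    dist = torch.sqrt((ys - fix_y) ** 2 + (xs - fix_x) ** 2)
    dist = (dist / dist.max()).unsqueeze(0)       # 0 at fixation, 1 at far corner
    k = int(4 * sigma) | 1                        # odd kernel size
    blurred = gaussian_blur(img, kernel_size=k, sigma=sigma)
    return (1 - dist) * img + dist * blurred
```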
Cognitive priors, on the other hand, relate to the computations performed in the brain. In this connection, we explore the role of inflexible inter-neuron correlations and show that constraining the inter-neuron correlations makes DNNs more robust to adversarial and non-adversarial perturbations. We also simulate feedback connections, which are ubiquitous in the brain, in DNNs and show that doing so improves adversarial robustness.
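One simple way such a constraint on inter-neuron correlations could be realized is as a regularizer that penalizes deviation of the batch correlation matrix of hidden activations from a fixed target matrix; the sketch below is an assumed formulation for illustration, not necessarily the one used in the thesis.

```python
import torch

def correlation_penalty(acts: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of the correlation matrix of hidden activations
    `acts` (batch, neurons) from a fixed `target` (neurons, neurons) matrix."""
    a = acts - acts.mean(dim=0, keepdim=True)
    a = a / (a.std(dim=0, keepdim=True) + 1e-8)
    corr = (a.T @ a) / acts.shape[0]              # empirical correlation estimate
    return ((corr - target) ** 2).mean()

# During training, the penalty would be added to the task loss, e.g.:
# loss = task_loss + lam * correlation_penalty(hidden_acts, fixed_target)
```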
To reliably evaluate the improvements we achieve and compare them with prior work, we need standardized robustness benchmarks. While such benchmarks have been developed for vision tasks, they do not exist for other modalities such as audio. To fill this gap, we develop a comprehensive robustness benchmark for speech models called Speech Robust Bench (SRB). SRB is composed of 114 challenging speech recognition scenarios covering the range of corruptions that automatic speech recognition (ASR) models may encounter when deployed in the wild.
History
Date
- 2025-06-01
Degree Type
- Dissertation
Thesis Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)