Carnegie Mellon University

On the Feature Alignment of Deep Vision Models: Explainability and Robustness Connected At Hip

Thesis, posted on 2023-09-07, 20:55, authored by Zifan Wang

Deep Neural Networks (DNNs) have recently demonstrated performance comparable to that of humans. However, it remains difficult to determine whether these models' behaviors, ethical values, and morality always align with human interests, an issue known as the (mis)alignment of intelligent systems. One basic requirement for a deep classifier to be considered aligned is that its output is always semantically equivalent to that of a human who possesses the necessary knowledge and tools to solve the problem at hand. Unfortunately, verifying output alignment between models and humans is often infeasible, as it would be impractical to test every sample from the distribution.

The lack of output alignment in DNNs is evidenced by their vulnerability to adversarial noise: perturbations that are unlikely to affect a human's response. This weakness originates from the fact that features important to the model may not be semantically meaningful from a human perspective, an issue we term feature (mis)alignment in vision tasks. Being (perceptually) aligned with humans on useful features is necessary to preserve output alignment. The goal of this thesis is therefore to evaluate and enhance the feature alignment of deep vision classifiers in order to promote output alignment.
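
To make the notion of adversarial noise concrete, here is a minimal sketch of a one-step gradient attack (FGSM, Goodfellow et al.) in PyTorch; the model, loss, and step size are illustrative assumptions, not the thesis's experimental setup:

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, y, eps=0.03):
        # One-step Fast Gradient Sign Method: perturb the input in the
        # direction that increases the classification loss, with the
        # perturbation bounded by eps in the l-infinity norm.
        # (eps=0.03 is an illustrative choice.)
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        return (x + eps * x.grad.sign()).detach()

A perturbation of this kind is typically imperceptible to a human yet can flip the model's prediction, which is exactly the output misalignment described above.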

To evaluate feature alignment, we introduce locality, a metric built on explainability tools that faithfully identify the features contributing to a model's output. The first contribution of this thesis shows that modern architectures, e.g., Vision Transformers (ViTs), the state-of-the-art classifiers on many tasks, are misaligned in their features. Our second contribution shows that improved adversarial robustness leads to improved locality: we find that a robust model has better locality than any non-robust model, and that a model's locality increases as it becomes more robust. Inspired by this finding, our third contribution improves robustness with a novel technique, TrH regularization, based on directly minimizing a PAC-Bayesian generalization bound for robustness. This technique achieves new state-of-the-art robustness for ViTs. However, because robustness is typically measured by running existing attacks, the resulting guarantee is only empirical and may fail against adaptive attacks. The final contribution of this thesis introduces GloRo Nets, which include a built-in formal robustness verification layer based on the global Lipschitz constant of the model. Unlike the probabilistic guarantee provided by Randomized Smoothing, GloRo Nets offer a deterministic guarantee and significantly improve the state-of-the-art provable robustness under ℓ2-norm-bounded threats.
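
To illustrate the kind of deterministic guarantee Lipschitz-based verification provides, here is a minimal sketch of an ℓ2 certification check; lip_const is assumed to be a sound upper bound on the Lipschitz constant of each logit, and the actual GloRo layer uses tighter per-class constants folded into an extra output logit:

    import torch

    def certify_l2(logits, lip_const, eps):
        # If every logit moves by at most lip_const * eps when the
        # input moves by at most eps in l2 norm, the predicted class
        # cannot change as long as the top-two logit margin exceeds
        # 2 * lip_const * eps.
        top2 = logits.topk(2, dim=-1).values
        margin = top2[..., 0] - top2[..., 1]
        return margin > 2.0 * lip_const * eps

Because the bound holds for every input in the ε-ball, this check either certifies a prediction outright or abstains; no sampling is involved, in contrast to the probabilistic certificates of Randomized Smoothing.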

Robustness is necessary for feature alignment but is likely not sufficient, as there are other, unspecified requirements whose violation would also result in misalignment. In conclusion, the thesis discusses the issue of under-specification in classification and its connection to alignment, together with potential remedies, as another step towards feature alignment in deep learning.

History

Date

2023-08-15

Degree Type

  • Dissertation

Department

  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Anupam Datta, Matt Fredrikson
