Understanding and Enhancing AI Model Robustness and Reliability Through Input Perturbation
AI models, including machine learning models and large language models (LLMs), have demonstrated remarkable capabilities across tasks such as image recognition, coding, and natural language processing (NLP). However, these models are vulnerable to input perturbations, small yet meaningful modifications to input data that can compromise their performance. Such perturbations fall into two broad categories: natural perturbations (e.g., noise or typos) and adversarial perturbations (intentionally crafted by malicious actors). Although both types barely alter the input's semantics, they can cause severe model misjudgments. At the same time, perturbations offer a valuable lens through which model reliability and robustness can be studied systematically. The central goal of this thesis is to investigate the effects of input perturbations, develop effective defenses against them, and leverage them as tools to improve the reliability of AI models across domains such as image recognition, natural language processing, and coding tasks. To this end, the research pursues two complementary directions: defending against perturbations to ensure robustness, and using them as a diagnostic tool to assess model reliability.
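To make the distinction concrete, the following minimal Python sketch (not from the thesis) contrasts a natural typo with an adversarial character swap; the toy keyword classifier is only a stand-in for a real model.

```python
import random

# Toy keyword classifier standing in for a real model (illustrative only).
def toy_sentiment(text: str) -> str:
    positive = {"great", "good", "excellent", "love"}
    return "positive" if any(t in positive for t in text.lower().split()) else "negative"

def natural_typo(text: str, seed: int = 0) -> str:
    """Natural perturbation: swap two adjacent characters somewhere in the text (a typo)."""
    random.seed(seed)
    chars = list(text)
    i = random.randrange(len(chars) - 1)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def adversarial_char_swap(text: str, target_word: str = "great") -> str:
    """Adversarial perturbation: deliberately corrupt the word the model relies on."""
    return text.replace(target_word, "gre4t")

original = "The new release is great and easy to use"
print(toy_sentiment(original))                         # positive
print(toy_sentiment(natural_typo(original)))           # may flip only if the typo happens to hit a keyword
print(toy_sentiment(adversarial_char_swap(original)))  # flips to negative by design
```

Both perturbed inputs remain perfectly readable to a human, yet the adversarial one is constructed specifically to break the model's decision rule.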
First, the thesis builds on recent advances in adversarial attacks and defenses to explore robustness failures in AI systems across domains. It examines how perturbations undermine certifiably robust neural networks and cascading classifiers, and how LLMs are affected by both natural and adversarial perturbations. The thesis also proposes a range of defense strategies, including a prompt-based method tailored to LLMs that offers a scalable and cost-effective way to mitigate adversarial inputs without requiring expensive retraining. These contributions aim to support the broader goal of building more robust and reliable AI systems.
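As an illustration of the general idea behind prompt-based defenses (not the thesis's exact prompt), the sketch below wraps a possibly perturbed input in a defensive instruction before querying the model; `call_llm` is a hypothetical stand-in for any chat-completion API.

```python
from typing import Callable

def defensive_prompt(user_input: str) -> str:
    # Instruct the model to normalize noisy or adversarially perturbed text before answering.
    return (
        "The following input may contain typos or adversarially perturbed characters. "
        "First restore the most plausible intended text, then answer based on the "
        "restored version.\n\n"
        f"Input: {user_input}"
    )

def robust_answer(user_input: str, call_llm: Callable[[str], str]) -> str:
    """Query the model through the defensive wrapper instead of with the raw input."""
    return call_llm(defensive_prompt(user_input))
```

Because the defense lives entirely in the prompt, it can be applied to any deployed LLM without retraining or access to model weights.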
Second, the thesis goes beyond defense to introduce perturbation-based methods for evaluating model reliability. It proposes a method for carefully selecting few-shot examples for LLMs in the context of code vulnerability detection, improving accuracy by choosing examples that are relevant and informative. In addition, it presents an approach that estimates the correctness of retrieval-augmented generation (RAG) outputs, without requiring ground truth, by analyzing model uncertainty under perturbed inputs. Together, these contributions offer new ways to enhance both the performance and trustworthiness of AI models in real-world scenarios.
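The following minimal sketch shows one way such a label-free estimate could work, assuming hypothetical `perturb` and `rag_answer` callables; it is not the thesis's exact estimator.

```python
from collections import Counter
from typing import Callable, List

def estimate_rag_confidence(
    question: str,
    perturb: Callable[[str, int], str],   # e.g., inject typos or paraphrase the question with seed i
    rag_answer: Callable[[str], str],     # hypothetical RAG pipeline: retrieve context, then generate
    n_perturbations: int = 8,
) -> float:
    """Answer the original and perturbed questions; agreement with the majority
    answer serves as a ground-truth-free proxy for output correctness."""
    answers: List[str] = [rag_answer(question)]
    for i in range(n_perturbations):
        answers.append(rag_answer(perturb(question, i)))
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / len(answers)  # high agreement suggests the answer is stable under perturbation
```

The intuition is that an output which stays consistent when the input is slightly perturbed is more likely to be grounded in the retrieved evidence than one that fluctuates.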
History
Date
- 2025-05-02
Degree Type
- Dissertation
Thesis Department
- Electrical and Computer Engineering
Degree Name
- Doctor of Philosophy (PhD)