Carnegie Mellon University
Browse

Understanding and Enhancing AI Model Robustness and Reliability Through Input Perturbation

Download (4.91 MB)
thesis
posted on 2025-05-29, 19:34 authored by Chi ZhangChi Zhang

AI models, including machine learning models and large language models (LLMs), have demonstrated remarkable capabilities across tasks such as image recognition, coding tasks, and natural language processing (NLP). However, these models are vulnerable to input perturbations—small yet meaningful modifications to input data—which can compromise their performance. Such perturbations fall into two broad categories: natural perturbations (e.g., noise or typos) and adversarial perturbations (intentionally crafted by malicious actors). Despite their minimal changes to the input’s semantics, both types can cause severe model misjudgments. At the same time, these perturbations offer a valuable lens through which we can systematically study and improve model reliability and robustness. The central goal of this thesis is to investigate the effects of input perturbations, develop effective defenses, and leverage them as tools to improve the reliability of AI models. These findings could enhance the robustness and reliability of AI models across domains such as image recognition, natural language processing, and various coding tasks. To this end, the research focuses on two complementary directions: defending against perturbations to ensure robustness, and using them as a diagnostic tool to assess model reliability.

First, the thesis builds on recent advances in adversarial attacks and defenses to explore robustness failures in AI systems across domains. It examines how perturbations undermine certifiably robust neural networks and cascading classifiers, and how LLMs are affected by both natural and adversarial perturba tions. The thesis also proposes a range of defense strategies, including a prompt-based method tailored to LLMs that offers a scalable and cost-effective way to mitigate adversarial inputs without requiring expen sive retraining. These contributions aim to support the broader goal of building more robust and reliable AI systems.

Second, this thesis goes beyond the defense to introduce perturbation-based methods for evaluating internal model reliability. It proposes a method for delicately selecting few-shot examples for LLMs in the context of code vulnerability detection, aiming to improve model accuracy by choosing examples that are relevant and informative. In addition, it presents an approach to estimate the correctness of retrieval augmented generation (RAG) outputs by analyzing model uncertainty under perturbed inputs—without requiring ground truth. Together, these contributions offer new ways to enhance both the performance and trustworthiness of AI models in real-world scenarios.

History

Date

2025-05-02

Degree Type

  • Dissertation

Thesis Department

  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Limin Jia Corina Pasareanu

Usage metrics

    Licence

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC