Carnegie Mellon University

Trustworthy Scientific Inference with Machine Learning

thesis
posted on 2025-06-03, 17:37 authored by Luca Masserano

The application of AI and machine learning to complex scientific problems is becoming increasingly widespread across various fields. A key challenge of scientific inference is to derive parameter constraints that are both valid, meaning they include the true parameter at a specified confidence level regardless of its (unknown) value, even in finite samples, and precise, meaning they are as small as possible given the data-generating process. However, standard machine learning approaches often fail to ensure that these properties hold, thereby limiting the reliability of downstream scientific conclusions. In this dissertation, we introduce several novel techniques to leverage regression, classification, and generative models to construct confidence sets with strong statistical guarantees. The methods we develop allow one to derive confidence sets that are simultaneously (1) valid across the entire parameter space and in finite samples, (2) robust to prior probability shifts, (3) as precise as possible when prior knowledge aligns with the target distribution, and (4) computationally efficient. By bridging modern machine learning with classical statistical tools, we provide a principled path towards integrating AI into scientific inference and discovery pipelines, enabling advancements in fields such as astronomy, high-energy physics, biology, and beyond.
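
For readers who prefer a formula, the validity requirement described in the abstract can be written as a standard frequentist coverage condition. The notation is an assumption made here for illustration and is not taken from the thesis: X denotes the observed data, C(X) the confidence set built from it, Θ the parameter space, and 1 − α the specified confidence level.

```latex
% Validity: coverage holds at every parameter value, not just on average
% over a prior, and at the actual (finite) sample size rather than only
% asymptotically.
\[
  \mathbb{P}_{\theta}\bigl(\theta \in C(X)\bigr) \;\ge\; 1 - \alpha
  \qquad \text{for all } \theta \in \Theta .
\]
```

Precision, in this reading, asks that C(X) be as small as possible among the sets that satisfy the condition above.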

History

Date

2025-05-01

Degree Type

  • Dissertation

Department

  • Statistics and Data Science

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Ann B. Lee, Barnabas Poczos
