Trustworthy Scientific Inference with Machine Learning
The application of AI and machine learning to complex scientific problems is becoming increasingly widespread across various fields. A key challenge of scientific inference is to derive parameter constraints that are both valid — meaning they include the true parameter regardless of its (unknown) value at a specified confidence level, even in finite samples — and precise — meaning they are as small as possible given the data-generating process. However, standard machine learning approaches often fail to ensure that these properties hold, thereby limiting the reliability of downstream scientific conclusions. In this dissertation, we introduce several novel techniques to leverage regression, classification, and generative models to construct confidence sets with strong statistical guarantees. The methods we develop allow one to derive confidence sets that are simultaneously (1) valid across the entire parameter space and in finite samples, (2) robust to prior probability shifts, (3) as precise as possible when prior knowledge aligns with the target distribution, and (4) computationally efficient. By bridging modern machine learning with classical statistical tools, we provide a principled path towards integrating AI into scientific inference and discovery pipelines, enabling advancements in fields such as astronomy, high-energy physics, biology, and beyond.
History
Date
2025-05-01
Degree Type
- Dissertation
Department
- Statistics and Data Science
Degree Name
- Doctor of Philosophy (PhD)