Calibrated Conditional Density Models and Predictive Inference via Local Diagnostics
Conditional densities f(y|x) are integral to uncertainty quantification when predicting a target y from covariates x, but they are challenging to estimate well. It is therefore difficult to ensure that (1 − α)-level prediction sets for y constructed from a conditional density model have correct conditional coverage; that is, that they contain the observed y with probability 1 − α at every location x in feature space. We investigate how, given an observed “ground truth” sample of (x, y), one can develop diagnostics that specify how an estimated conditional density deviates from the true density at any location x. This detailed insight into the quality of fit at each x leads naturally to a calibration method that corrects an initial conditional density estimate using a ground-truth sample. The method produces calibrated estimates of f(y|x) that are approximately accurate at all locations x, yielding calibrated prediction sets with accurate conditional coverage.
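For concreteness, the notions above can be written as follows. The notation is ours, not the abstract's, and the sketch assumes a continuous target and an estimated conditional CDF F̂(·|x) that is continuous and strictly increasing in y. It states conditional coverage of a prediction set C_α(x), the local PIT CDF r_γ(x) that a local diagnostic can examine, and a standard identity linking that diagnostic to a correction of the model:

```latex
\begin{align*}
&\text{Conditional coverage:} && \mathbb{P}\bigl(Y \in C_\alpha(X) \mid X = x\bigr) = 1 - \alpha \quad \text{for all } x,\\
&\text{Local PIT CDF:} && r_\gamma(x) := \mathbb{P}\bigl(\widehat{F}(Y \mid X) \le \gamma \mid X = x\bigr), \quad \gamma \in (0,1),\\
&\text{Correction:} && F(y \mid x) = \mathbb{P}\bigl(\widehat{F}(Y \mid x) \le \widehat{F}(y \mid x) \mid X = x\bigr) = r_{\widehat{F}(y \mid x)}(x).
\end{align*}
```

Under these assumptions the model is calibrated at x exactly when r_γ(x) = γ for every γ. The corrected CDF F(·|x) then gives, for example, the central interval [F^{-1}(α/2 | x), F^{-1}(1 − α/2 | x)], whose conditional coverage at x is 1 − α.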
In the first part of this thesis, we present practical procedures for identifying, localizing, and interpreting the nature of (statistically significant) discrepancies between an approximate and the true conditional density over the entire feature space. Our flexible framework is more discerning than previous diagnostics: it can distinguish an arbitrarily misspecified model from the true conditional density of an observed sample. We also provide “Amortized Local P-P plots” (ALP), interpretable graphical summaries of distributional differences at any location in feature space. In the second part of this thesis, we leverage this diagnostic framework to correct misspecified conditional density models, from which we construct calibrated prediction sets with the desired conditional coverage. Because our diagnostics specify where in feature space, and how, an estimated conditional CDF differs from the true one, we can use this information to correct the model directly toward the target conditional coverage. We explore an application to the real-world astrophysical problem of photometric redshift (“photo-z”) prediction, where conditional density models are difficult to estimate and conditional coverage is of practical significance. In the third part of this thesis, we explore an extension of this calibration method that hybridizes it with local conformal inference, allowing it to achieve finite-sample marginal and local validity at the expense of some precision.
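The following sketch illustrates the diagnose-then-correct idea on synthetic data. It is not the thesis's implementation. The function names (`local_pp`, `calibrated_cdf`), the Gaussian-kernel (Nadaraya-Watson) regression used to estimate the local PIT CDF r_γ(x) = P(F̂(Y|X) ≤ γ | X = x), the bandwidth, and the toy data-generating process are all our assumptions. The sketch estimates r_γ(x) at two locations, where departures from r_γ(x) = γ indicate local miscalibration. It then corrects the model's CDF with the identity F(y|x) = r_{F̂(y|x)}(x), which holds when F̂(·|x) is continuous and strictly increasing.

```python
import math

import numpy as np


def normal_cdf(z):
    """Standard normal CDF, applied elementwise via math.erf."""
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


def local_pp(x_eval, x, pit, gammas, bandwidth=0.3):
    """Hypothetical helper: estimate r_gamma(x0) = P(PIT <= gamma | X = x0)
    with a Gaussian-kernel (Nadaraya-Watson) regression of 1{PIT <= gamma}
    on x. This smoother is an assumed stand-in for any consistent regression.

    Returns an array of shape (len(x_eval), len(gammas)). A model that is
    calibrated at x0 gives r_gamma(x0) close to gamma for every gamma."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    # Kernel weights: rows are evaluation points, columns are sample points.
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / bandwidth) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    # Indicators: rows are sample points, columns are gamma levels.
    ind = (pit[:, None] <= np.asarray(gammas, dtype=float)[None, :]).astype(float)
    return w @ ind


def calibrated_cdf(y_grid, x0, x, pit, cdf_hat, bandwidth=0.3):
    """Hypothetical helper: corrected CDF at x0, F(y | x0) = r_{F_hat(y | x0)}(x0)."""
    y_grid = np.asarray(y_grid, dtype=float)
    gam = cdf_hat(y_grid, np.full_like(y_grid, x0))
    return local_pp([x0], x, pit, gam, bandwidth)[0]


# Synthetic "ground truth" sample (assumed): Y | X = x ~ N(x, (0.5 + |x|)^2).
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-2.0, 2.0, n)
y = x + (0.5 + np.abs(x)) * rng.standard_normal(n)


def model_cdf(y, x):
    """Assumed misspecified model: correct mean, constant unit spread."""
    return normal_cdf(y - x)


def true_cdf(y, x):
    """True conditional CDF of the synthetic data, used only for checking."""
    return normal_cdf((y - x) / (0.5 + np.abs(x)))


# PIT values of the observed sample under the model.
pit = model_cdf(y, x)
gammas = np.linspace(0.05, 0.95, 19)

# Local P-P curves at x = 0 and x = 1.8 (one row per location).
r = local_pp([0.0, 1.8], x, pit, gammas)

# Corrected CDF at x = 0 on a grid of y values.
y_grid = np.linspace(-2.0, 2.0, 9)
cal = calibrated_cdf(y_grid, 0.0, x, pit, model_cdf)
print(np.round(r[:, 0], 3), np.round(cal, 3))
```

In this toy setup the unit-spread model is too wide near x = 0, so r_0.05(0) falls below 0.05. It is too narrow near x = 1.8, so r_0.05(1.8) exceeds 0.05. Pooling PIT values over all x could partially mask these opposite errors, which is why the regression is done locally. Plotting each row of `r` against `gammas` gives a local P-P plot, and any consistent regression of the indicator on x could replace the kernel smoother.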
- Statistics and Data Science
- Doctor of Philosophy (PhD)