Comparing Forecasters and Abstaining Classifiers
This thesis concerns nonparametric statistical methods for comparing black-box predictors, namely sequential forecasters and abstaining classifiers.
In the first part of the thesis, we develop anytime-valid estimation and testing approaches for comparing probability forecasters on sequentially occurring events. Our main contribution is the development of confidence sequences (CSs) that estimate the time-varying average score difference between the forecasters. Unlike classical confidence intervals, CSs can be continuously monitored over time while retaining their coverage guarantees. The CSs also require no distributional assumptions on the dynamics of the outcomes or on the forecasting models. We additionally develop e-processes and p-processes, testing counterparts to CSs that are likewise anytime-valid, i.e., valid at arbitrary data-dependent stopping times.
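As an illustrative sketch of the guarantee (the notation below is ours, not necessarily the thesis's): writing $p_i$ and $q_i$ for the two forecasts, $y_i$ for the outcome, and $S$ for a scoring rule, a confidence sequence covers the running average score difference uniformly over time.

```latex
% Minimal sketch, assuming forecasts p_i, q_i, outcomes y_i, and a scoring rule S;
% the symbols Delta_t and C_t are illustrative placeholders.
% Time-varying average score difference between the two forecasters:
\Delta_t \;=\; \frac{1}{t} \sum_{i=1}^{t} \bigl[ S(p_i, y_i) - S(q_i, y_i) \bigr].
% A (1 - \alpha)-confidence sequence is a sequence of intervals (C_t)_{t \ge 1}
% with a uniform-in-time coverage guarantee, which is what makes
% continuous monitoring and data-dependent stopping valid:
\mathbb{P}\bigl( \exists\, t \ge 1 : \Delta_t \notin C_t \bigr) \;\le\; \alpha.
```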
In the second part of the thesis, we consider the problem of evaluating and comparing black-box abstaining classifiers. Abstaining classifiers have the option to withhold predictions on inputs that they are uncertain about, making them increasingly popular in safety-critical applications. We introduce a novel perspective on the evaluation problem by treating the abstentions of a classifier as missing data. Our approach is centered on the counterfactual score, which measures the expected performance of the classifier had it not been allowed to abstain. The missing data perspective clarifies the precise identifying conditions for the counterfactual score, namely independent evaluation data and stochastic abstentions, and paves the way for a nonparametrically efficient and doubly robust estimator of the score. The approach also extends straightforwardly to estimating the difference between two counterfactual scores under distinct abstention mechanisms.
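The following sketch illustrates the counterfactual score under the missing-data view; the notation ($R$, $s$, $\psi$, $\pi$, $\mu$) is illustrative rather than taken verbatim from the thesis.

```latex
% Minimal sketch, assuming: input X, abstention indicator R (R = 1 if the classifier
% predicts, R = 0 if it abstains), and a score s(X, Y) for its prediction.
% The counterfactual score is the expected score had abstention not been allowed:
\psi \;=\; \mathbb{E}\bigl[ s(X, Y) \bigr],
% identified under the stated conditions (independent evaluation data and
% stochastic abstentions, i.e., \pi(X) = \mathbb{P}(R = 1 \mid X) > 0) as
\psi \;=\; \mathbb{E}\bigl[ \, \mathbb{E}[\, s(X, Y) \mid X,\, R = 1 \,] \, \bigr].
% A doubly robust (AIPW-style) estimator combines estimates of pi(X) and of
% mu(X) = E[ s(X, Y) | X, R = 1 ], remaining consistent if either is correct:
\hat\psi \;=\; \frac{1}{n} \sum_{i=1}^{n}
  \Bigl[ \hat\mu(X_i) \;+\; \frac{R_i}{\hat\pi(X_i)} \bigl( s(X_i, Y_i) - \hat\mu(X_i) \bigr) \Bigr].
```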
History
Date
- 2023-06-15
Degree Type
- Dissertation
Department
- Statistics and Data Science
Degree Name
- Doctor of Philosophy (PhD)