<p>Machine learning classifiers typically provide scores for the different classes. These scores are supplementary to class predictions and may be crucial for downstream decision-making. However, can they be interpreted as probabilities? Scores produced by a calibrated classifier admit such a probabilistic interpretation, in the following informal sense. For binary classification with labels 0 and 1, a classifier is calibrated if, among the instances on which it predicts a score s (in [0,1]), the probability that the true label is 1 equals s.</p>
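<p>In symbols, writing f(X) for the predicted score and Y for the true label (notation introduced here for concreteness), this condition reads P(Y = 1 | f(X) = s) = s, for every score s that the classifier outputs.</p>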
<p>The primary goal of this thesis is to demonstrate that a miscalibrated classifier can be provably “post-hoc” calibrated using a small set of held-out datapoints, such as a validation dataset. Such calibration can be achieved in two different senses: (a) model calibration of a given classifier for a fixed data-generating distribution; and (b) forecast calibration of a sequence of probabilistic forecasts for an online data stream. These two views have been studied by two largely independent bodies of literature; we draw from and contribute to both. In particular, we derive the first calibration method that uses both model and forecast calibration techniques. </p>
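<p>As a concrete illustration of sense (a), the sketch below performs post-hoc calibration by histogram binning on a held-out validation set: raw scores are mapped to the empirical frequency of label 1 within their bin. The function names, the fixed-width bins, and the synthetic data are illustrative choices only, not the specific algorithms developed in the thesis.</p>
<pre><code>import numpy as np

def fit_histogram_binning(val_scores, val_labels, n_bins=10):
    """Fit a histogram-binning recalibrator on held-out (score, label) pairs."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each validation score to a bin; clip so that a score of exactly 1.0
    # falls in the last bin rather than outside the range.
    bin_ids = np.clip(np.digitize(val_scores, edges) - 1, 0, n_bins - 1)
    # The recalibrated score of a bin is the empirical frequency of label 1 in it
    # (falling back to the bin midpoint if the bin received no validation points).
    bin_means = np.array([
        val_labels[bin_ids == b].mean() if np.any(bin_ids == b)
        else (edges[b] + edges[b + 1]) / 2
        for b in range(n_bins)
    ])
    return edges, bin_means

def recalibrate(scores, edges, bin_means):
    """Map raw classifier scores to their bin's recalibrated score."""
    n_bins = len(bin_means)
    bin_ids = np.clip(np.digitize(scores, edges) - 1, 0, n_bins - 1)
    return bin_means[bin_ids]

# Toy usage: synthetic, deliberately miscalibrated scores on 2000 points,
# with the first 1000 serving as the held-out calibration set.
rng = np.random.default_rng(0)
true_prob = rng.uniform(size=2000)
labels = (true_prob > rng.uniform(size=2000)).astype(int)
raw_scores = np.clip(true_prob + 0.2 * np.sign(true_prob - 0.5), 0.0, 1.0)
edges, bin_means = fit_histogram_binning(raw_scores[:1000], labels[:1000])
calibrated = recalibrate(raw_scores[1000:], edges, bin_means)
</code></pre>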
<p>The algorithms we develop come with theoretical guarantees that hold under mild or no assumptions. A majority of our work is in the “distribution-free” setting, where we assume that the data is i.i.d. but make no parametric or smoothness assumptions on the data-generating distribution. We show that using discretized or binned scores is both necessary and sufficient for achieving distribution-free calibration (Chapters 3–5). The culminating work of this thesis goes beyond the distribution-free setting by dispensing altogether with the requirement that the data be generated from a distribution. We show that even when the data is “adversarial”, calibration can be provably achieved in a practically meaningful manner (Chapters 6 and 7).</p>
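<p>To make the binned notion of calibration concrete, the short check below (again illustrative, reusing the recalibrated scores from the previous sketch) compares each distinct predicted score with the empirical frequency of label 1 among the held-out points that received that score; a calibrated forecaster keeps these gaps small.</p>
<pre><code>import numpy as np

def binned_calibration_gaps(pred_scores, labels):
    """For each distinct predicted score s, report |empirical frequency of label 1 given s  -  s|."""
    gaps = {}
    for s in np.unique(pred_scores):
        mask = pred_scores == s
        gaps[float(s)] = abs(labels[mask].mean() - s)
    return gaps

# Continuing the earlier example, where `calibrated` holds the binned scores on test points:
# gaps = binned_calibration_gaps(calibrated, labels[1000:])
</code></pre>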