A Qualitative and Quantitative Analysis of the Bias Caused by Adaptivity in Multi-Armed Bandits

2020-05-21T22:20:17Z (GMT) by Jaehyeok Shin
In classical and non-adaptive data analysis, the target of interest is typically fixed in advance, and a fixed number of samples are collected in an i.i.d. manner to conduct statistical inference on the target. In many cases, however, data are collected and analyzed adaptively. For example, in the multi-armed bandit setting, data are collected sequentially and adaptively such that at every round, a sample is drawn from an arm which is selected based on the sampling history so far. The sampling procedure can be stopped based on a data-driven stopping rule. Furthermore, the adaptively collected data are often used to identify an interesting target and the same data are used to conduct statistical inference on this target. Even though this kind
of adaptive scheme is prevalent in data analysis, theoretical justifications for commonly used inference procedures are not yet sufficiently developed. This disparity challenges the validity of decisions we make based on adaptive data analyses. In order to close the gap, the thesis first focuses on the mean estimation problem for multi-armed bandits. We derive a qualitative characterization of the adaptive mean estimation procedure which determines the sign of bias
of the sample mean for each arm. We provide sharp bounds on both the bias and the risk which show that even though the sample mean is biased under adaptive schemes, the size of the risk is as small as one can achieve under the non-adaptive i.i.d. setting.