Carnegie Mellon University

Advancing Bandit Optimization for Generalized Structures: Algorithms and Theory

thesis
posted on 2025-09-17, 16:52 authored by Yusha Liu
<p dir="ltr">Sequential decision-making under uncertainty is a central part of machine learning, in which optimal algorithms make intelligent decisions based on data acquired in a sequential feedback loop. Optimization with bandit feedback, where only noisy feedback corresponding to the chosen decision (rather than all possible choices) is available, is a pivotal component of this paradigm. Bandit optimization problems motivate the design of algorithms that can handle partial feedback and sequential data that are not independent and identically distributed (i.i.d.), while achieving strong performance on the fly. This framework has shown significant impact in interactive applications including hyperparameter tuning, recommendation systems, adaptive control trials, and other scientific disciplines. However, the difficulty of analysis in bandits often limits the range of problem settings for which theoretical understanding and algorithms are developed, especially compared to the variety of problem structures present across these domains. This thesis identifies and addresses open problems in bandit optimization, leveraging general problem structures to design optimal algorithms. It presents results that broaden the bandit framework beyond traditional assumptions by developing novel theoretical guarantees and algorithms for more general settings, expanding the applicability of bandit algorithms and the relevance of theoretical results. </p><p dir="ltr">The first part of this thesis addresses inefficiencies of existing algorithms and limitations in the problem assumptions. First, we present results that unify bandit optimization across reward functions with different levels of smoothness, filling the gap between non-parametric Lipschitz continuous functions and parametric linear functions. 
We present a novel algorithmic framework with optimal statistical guarantees for functions with the general notion of Hölder smoothness, which encapsulates both Lipschitz and parametric functions as special cases. Second, we investigate sample complexity without knowledge of the reward function's smoothness. We develop matching upper and lower bounds on the sample complexity of algorithms that are adaptive to the smoothness of the reward function, and conclude with a comprehensive overview across multiple function spaces, including reproducing kernel Hilbert spaces and Hölder spaces. In the second part of this thesis, we take inspiration from classical results on M-estimators based on influence (score) functions and show that the same idea can be used to derive a unifying analytical framework that naturally combines M-estimators with optimism-in-the-face-of-uncertainty bandit algorithms in a "plug-and-play" fashion. </p><p dir="ltr">Together, this thesis advances optimization with bandit feedback by enabling novel algorithms and statistical guarantees for richer and more general problem settings.</p>
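<p dir="ltr">The optimism-in-the-face-of-uncertainty principle that the abstract refers to can be illustrated with the classic UCB1 algorithm on a finite-armed Bernoulli bandit. This is a minimal sketch for illustration only, not an algorithm from the thesis, whose contributions concern far more general problem structures; all names and parameters below are hypothetical.</p>

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1, a basic optimism-in-the-face-of-uncertainty bandit
    algorithm, on a Bernoulli bandit with the given arm means.
    Returns per-arm pull counts and the total reward collected."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of times each arm was pulled
    sums = [0.0] * k      # cumulative reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # play each arm once to initialize estimates
        else:
            # Optimistic index: empirical mean plus a confidence bonus
            # that shrinks as an arm accumulates observations.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Bandit feedback: only the chosen arm's noisy reward is observed.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return counts, total_reward
```

<p dir="ltr">Over a long enough horizon, the confidence bonus forces under-explored arms to be sampled while the empirical mean concentrates play on the best arm, so the better arm ends up pulled far more often.</p>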

History

Date

2025-08-01

Degree Type

  • Dissertation

Thesis Department

  • Machine Learning

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Aarti Singh