Bandit Algorithms for Recommender Systems
The multi-armed bandit has become the de facto model for exploration-exploitation trade-offs arising in recommender systems. Yet a wide gap remains between the bandit models studied in theory and the problems faced in practice. The purpose of this dissertation is to bridge this gap. In Chapter 1, we study the classic stochastic linear bandit problem under the restriction that each arm may be selected only a limited number of times. This simple constraint, which we call disposability, captures a common restriction in recommendation problems across a diverse array of applications, from personalized styling services to dating platforms. We show that the regret for this problem is characterized by a previously unstudied function of the reward distribution among optimal arms. Algorithmically, our upper bound relies on an optimism-based policy which, while computationally intractable, lends itself to approximation via a fast alternating heuristic initialized with a classic similarity score. Experiments show that our policy dominates a set of benchmarks that includes algorithms known to be optimal for the linear bandit without disposability, along with natural modifications of these algorithms for the disposable setting.
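To make the setting concrete, here is a minimal Python sketch pairing a standard LinUCB-style optimistic index with a hard per-arm selection cap. The cap `max_pulls`, the exploration weight `alpha`, and the synthetic reward model are illustrative assumptions for exposition; the dissertation's actual policy and its alternating-heuristic approximation are not reproduced here.

```python
# A minimal sketch of an optimism-based (LinUCB-style) policy under a
# disposability constraint: each arm may be pulled at most `max_pulls` times.
# The cap `max_pulls`, the bonus weight `alpha`, and the synthetic reward
# model are illustrative assumptions, not the dissertation's exact algorithm.
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 5, 50, 100      # feature dimension, number of arms, horizon
max_pulls = 3             # disposability: each arm usable at most this often
alpha = 1.0               # exploration weight on the UCB bonus

arms = rng.normal(size=(K, d))    # arm feature vectors
theta_star = rng.normal(size=d)   # unknown reward parameter

A = np.eye(d)                     # ridge-regularized Gram matrix
b = np.zeros(d)                   # running sum of feature * reward
pulls = np.zeros(K, dtype=int)    # pulls consumed per arm

total_reward = 0.0
for t in range(T):
    available = pulls < max_pulls     # arms with pulls remaining
    if not available.any():
        break                         # every arm is used up

    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b

    # Optimistic index: estimated reward plus an exploration bonus,
    # restricted to the arms that disposability still allows.
    bonus = np.sqrt(np.einsum("ij,jk,ik->i", arms, A_inv, arms))
    ucb = arms @ theta_hat + alpha * bonus
    ucb[~available] = -np.inf
    a = int(np.argmax(ucb))

    reward = arms[a] @ theta_star + rng.normal(scale=0.1)
    total_reward += reward
    pulls[a] += 1

    # Standard ridge-regression update for the linear reward model.
    A += np.outer(arms[a], arms[a])
    b += reward * arms[a]

print(f"total reward over {T} rounds: {total_reward:.2f}")
```

Note that disposability enters only by masking exhausted arms out of the argmax; the learning update itself is the standard ridge regression of the linear bandit.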
History
Date
- 2022-05-15
Degree Type
- Dissertation
Department
- Tepper School of Business
Degree Name
- Doctor of Philosophy (PhD)