Combinatorial Multi-armed Bandits in Competitive Environments

Zuo, Jinhang

doi:10.1184/R1/21441456.v1

thesis

posted on 2022-11-08, 20:39 authored by Jinhang Zuo

Multi-armed bandits (MAB) have attracted much attention as a means of capturing the exploration-exploitation tradeoff in sequential decision making. In the classical MAB problem, at each round, a player chooses one arm from a fixed arm set and receives a random reward drawn from an unknown distribution. In many real-world applications, however, the problem is combinatorial, involving multiple arms and possibly non-linear reward functions. Combinatorial multi-armed bandits (CMAB) have been extensively studied for these settings, and most previous works consider CMAB from a single player’s perspective: at each round, one player chooses a set of arms to play, observes the feedback from them, and receives a reward. Yet in applications such as online advertising, where advertisers place ads on websites to attract user clicks, multiple players (advertisers) may compete over the same set of arms (websites). This competition among players has been less studied and brings significant challenges to the design and analysis of bandit algorithms.

In this thesis, we introduce the competitive CMAB problem from two perspectives. We first consider competitive CMAB from the follower’s perspective, where a follower and a competitor play on the same set of arms. We assume the follower chooses his action after observing the competitor’s action, and we study how the follower can maximize his own reward given the competitor’s actions. We then introduce competitive CMAB from the multi-player perspective, where multiple players choose combinatorial actions on the same set of arms. In this setting, our objective is to design bandit algorithms that maximize the collective reward across all players. We provide general formulations of both settings and design bandit algorithms with theoretical guarantees for real-world applications, including social influence maximization, dynamic channel allocation, and general resource allocation.

History

Date

2022-09-19

Degree Type

Dissertation

Department

Electrical and Computer Engineering

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Carlee Joe-Wong

Keywords

Multi-armed bandits, Combinatorial multi-armed bandits, combinatorial nature, non-linear reward functions, Computer Engineering

Licence

CC BY 4.0
