Carnegie Mellon University
Browse

Asymptotically Optimal Quantile Pure Exploration for Infinite-Armed Bandits

Download (469.14 kB)
conference contribution
posted on 2025-04-25, 16:04 authored by Xiaoyue GongXiaoyue Gong, Mark Sellke

We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution. Our goal is to efficiently select a single high quality arm whose average reward is, with probability 1 - δ, within ε of being with the top η-fraction of arms. For fixed confidence, we give an algorithm with expected sample complexity (Equation presented)O (log(1/δηε)/ log(12/η)) which matches a known lower bound up to the log(1/η) factor. In particular the δ-dependence is optimal and closes a quadratic gap. For fixed budget, we show the asymptotically optimal sample complexity as δ → 0 is log(1/δ)(log log(1/δ))2/c. The value of c depends explicitly on the problem parameters (including the unknown arm distribution) through a certain Fisher information distance. Even the strictly super-linear dependence on log(1/δ) was not known and resolves a question of [GM20].


History

Publisher Statement

Presented at 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

Date

2023-01-01

Usage metrics

    Licence

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC