Bandit Methods and Selective Prediction in Deep Learning

Ye, Weicheng

doi:10.1184/R1/12515366.v1

weicheny Thesis.pdf (7.04 MB)

Bandit Methods and Selective Prediction in Deep Learning

thesis

posted on 2020-06-29, 21:56 authored by Weicheng YeWeicheng Ye

his thesis consists of three parts involving two areas in the field of machine learning: deep learning and reinforcement learning. Stochastic Lipschitz bandit algorithms are methods that govern exploration-exploitation trade-offs and have been used for a variety of important task domains, including zeroth-order optimization. In the first part, we present a framework for Lipschitz bandit methods that adaptively learns partitions of context- and arm-space. Due to this flexibility, the algorithm can efficiently optimize rewards and minimize regret, by focusing on the most relevant portions

of the space. Tuning hyperparameters is a crucial problem in deep learning that is required in every single model. Although the deep neural network with high complexity can fit the large-scale data very well, it uses plenty of hyperparameters. Our algorithms can achieve stateof-

the-art performance in challenging real-world tasks such as neural network hyperparameter tuning. Model-free Reinforcement Learning algorithms, such as Q-learning, require less space and are more expressive to use compared to model-based approaches. Upper Confidence Bound (UCB) exploration policy improves the exploration bonuses in the Q-learning framework. In the second part of this work, we study the regret bound of the Q-learning algorithm with UCB exploration in the scenario of compact state-action metric space. We develop a Q-learning

algorithm with UCB exploration policy that adaptively discretizes the continuous state-action space and iteratively updates Q-values. In addition, the algorithm can efficiently optimize rewards and minimize the cumulative regret. We provide a rigorous analysis of bounding the regret with concentration of measure analysis and combinatorial approaches. This is the first result of this kind to the best of our knowledge. Data gathered from real-world applications often suffer from corruption. The low-quality data will hinder the performance of the learning system in terms of the classification accuracy, model building time, and interpretability of the classifier. Selective prediction, also known as prediction with a reject option, reduces the error rate by abstaining from prediction under uncertainty while keeping coverage as high as possible. In the third part of this work, we propose sophisticated threshold learning algorithms integrated with selective prediction that

can estimate the intrinsic rejection rate of the dataset. Correspondingly, we provide a rigorous framework to generalize the estimation of data corruption rate. To leverage the advantage of multiple learning methods, we extend learning algorithms to a hierarchical two-stage system. Our methods have advantages of being noise-robust, intelligible, and flexible with any network

architecture. The empirical results show that our algorithms can accomplish state-of-the-art performance in real-world datasets in many classification and regression problems.

History

Date

2020-05-17

Degree Type

Dissertation

Department

Mathematical Sciences

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Shlomo Ta’asan

Usage metrics

Keywords

bandit methods selective prediction deep learning

Licence

In Copyright

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

Bandit Methods and Selective Prediction in Deep Learning

History

Date

Degree Type

Department

Degree Name

Advisor(s)

Usage metrics

Categories

Keywords

Licence

Exports