Adaptive Optimization Methods for Machine Learning
This thesis studies adaptive optimization algorithms for solving convex and non-convex optimization problems in machine learning. Specifically, we focus on the use of adaptive step sizes and adaptive sampling in gradient-based optimization methods.

In the first part of this thesis, we analyze the algorithmic stability of well-known adaptive optimization methods such as Adagrad and RMSProp, which exploit the adaptivity of the step sizes to speed up convergence. Algorithmic stability measures the sensitivity of a learning algorithm to changes in the training data, and it yields an upper bound on the algorithm's generalization error. There has been substantial prior research on the algorithmic stability of Stochastic Gradient Descent and Stochastic Gradient Descent with Momentum, but no comparable results for adaptive optimization methods; this work is an attempt to bridge that gap. We show that, under mild conditions on the hyper-parameters, adaptive optimization methods are uniformly stable. This work also offers practical guidance on choosing hyper-parameters for these methods.

In the second part of this thesis, we design a novel adaptive sampling Stochastic Variance Reduced Gradient (SVRG) method for solving convex and non-convex optimization problems in the distributed setting. Our new method reduces the dependence on the Lipschitz constants, which improves the convergence rate of SVRG. One novel aspect of our method compared to the previous state of the art is that it requires no prior knowledge of the Lipschitz constants of the local loss functions.
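To fix intuition for the adaptive step sizes discussed above, the following is an illustrative sketch (not taken from the thesis; the function name and constants are ours) of the Adagrad-style per-coordinate update, in which the effective step size for each coordinate shrinks as squared gradients accumulate:

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One Adagrad-style update: each coordinate's step size is scaled
    by the inverse square root of its accumulated squared gradients."""
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy example: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
accum = np.zeros_like(w)
for _ in range(200):
    w, accum = adagrad_step(w, w, accum)
# After many iterations, the iterate contracts toward the minimizer 0.
```

Coordinates with historically large gradients receive smaller steps, which is the adaptivity that both Adagrad and RMSProp exploit; RMSProp differs in replacing the running sum `accum` with an exponential moving average.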
- Mathematical Sciences
- Doctor of Philosophy (PhD)