Carnegie Mellon University

Variance Reduction for Stochastic Gradient Optimization

journal contribution
posted on 2013-12-01, 00:00, authored by Chong Wang, Xi Chen, Alexander Smola, Eric P. Xing

Stochastic gradient optimization is a class of widely used algorithms for training machine learning models. To optimize an objective, it uses noisy gradients computed from random data samples instead of the true gradient computed from the entire dataset. However, when the variance of the noisy gradient is large, the algorithm may spend much time bouncing around the optimum, leading to slower convergence and worse performance. In this paper, we develop a general approach that uses control variates for variance reduction in stochastic gradient optimization. Data statistics such as low-order moments (pre-computed or estimated online) are used to form the control variate. We demonstrate how to construct the control variate for two practical problems solved with stochastic gradient optimization. One is convex: MAP estimation for logistic regression. The other is non-convex: stochastic variational inference for latent Dirichlet allocation. On both problems, our approach shows faster convergence and better performance than the classical approach.
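To illustrate the control-variate idea from the abstract, below is a minimal Python sketch for the logistic regression case. It is not the paper's exact construction: it assumes a first-order Taylor expansion of the sigmoid around the mean data point as the control variate, uses a coefficient of 1 instead of the variance-optimal coefficient, and all function and variable names are illustrative.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sgd_with_control_variate(X, y, n_iters=5000, lr=0.05, reg=1e-2, seed=0):
        """SGD for logistic regression MAP estimation, with a control variate
        built from pre-computed low-order data moments (illustrative sketch)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)

        # Pre-computed data statistics (low-order moments).
        z_bar = X.mean(axis=0)                   # first moment E[x]
        xy_bar = (X * y[:, None]).mean(axis=0)   # cross moment E[y x]
        xx_bar = X.T @ X / n                     # second moment E[x x^T]

        for t in range(n_iters):
            i = rng.integers(n)
            x_i, y_i = X[i], y[i]

            # Noisy gradient of the per-sample negative log-posterior.
            g = (sigmoid(w @ x_i) - y_i) * x_i + reg * w

            # Control variate: replace sigmoid(w.x_i) by its first-order Taylor
            # expansion around w.z_bar, so its expectation over the data depends
            # only on the pre-computed moments above.
            s = sigmoid(w @ z_bar)
            ds = s * (1.0 - s)
            h = (s + ds * (w @ (x_i - z_bar))) * x_i - y_i * x_i
            Eh = s * z_bar + ds * (xx_bar @ w - (w @ z_bar) * z_bar) - xy_bar

            # Variance-reduced stochastic gradient, still unbiased because
            # E[h - Eh] = 0 (coefficient fixed at 1 here for simplicity).
            g_vr = g - (h - Eh)
            w -= lr / (1 + t) ** 0.5 * g_vr
        return w

    if __name__ == "__main__":
        # Synthetic data, purely for demonstration.
        rng = np.random.default_rng(1)
        X = rng.normal(size=(1000, 5))
        w_true = rng.normal(size=5)
        y = (rng.random(1000) < sigmoid(X @ w_true)).astype(float)
        print(sgd_with_control_variate(X, y))

Because the correction term h - Eh has zero mean, the update remains an unbiased estimate of the true gradient; variance is reduced to the extent that h correlates with the per-sample gradient g.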

History

Date

2013-12-01
