Learning with Staleness

Dai, Wei

doi:10.1184/R1/6720416.v1

Learning with Staleness.pdf (4.45 MB)

Learning with Staleness

thesis

posted on 2018-03-01, 00:00 authored by Wei DaiWei Dai

A fundamental assumption behind most machine learning (ML) algorithms and analyses is the sequential execution. That is, any update to the ML model can be immediately applied and the new model is always available for the next algorithmic step. This basic assumption, however, can be costly to realize, when the computation is carried out across multiple machines, linked by commodity networks that are usually 104 times slower than the memory speed due to fundamental hardware limitations. As a result, concurrent ML computation in the distributed settings often needs to handle delayed updates and perform learning in the presence of staleness. This thesis characterizes learning with staleness from three directions: (1) We extend the theoretical analyses of a number of classical ML algorithms, including stochastic gradient descent, proximal gradient descent on non-convex problems, and Frank-Wolfe algorithms, to explicitly incorporate staleness into their convergence characterizations. (2)We conduct simulation and large-scale distributed experiments to study the empirical effects of staleness on ML algorithms under indeterministic executions. Our results reveal that staleness is a key parameter governing the convergence speed for all considered ML algorithms, with varied ramifications. (3) We design staleness-minimizing parameter server systems by optimizing synchronization methods to effectively reduce the runtime staleness. The proposed optimization of a bounded consistency model utilizes the additional network bandwidths to communicate updates eagerly, relieving users of the burden to tune the staleness level. By minimizing staleness at the framework level, our system stabilizes diverging optimization paths and substantially accelerates convergence across ML algorithms without any modification to the ML programs.

History

Date

2018-03-01

Degree Type

Dissertation

Department

Machine Learning

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Eric P. Xing

Usage metrics

Keywords

Large Scale Machine Learning Distributed Optimization Method Distributed System Parameter Server

Licence

In Copyright

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

Learning with Staleness

History

Date

Degree Type

Department

Degree Name

Advisor(s)

Usage metrics

Categories

Keywords

Licence

Exports