Carnegie Mellon University

Petuum: A Framework for Iterative-Convergent Distributed ML

Journal contribution
Posted on 2013-12-30 by Wei Dai, Jinliang Wei, Jin Kyu Kim, Seunghak Lee, Junming Yin, Qirong Ho, Eric P. Xing

A major bottleneck to applying advanced ML programs at industrial scale is the migration of an academic implementation, often specialized for a small, well-controlled platform such as a desktop PC or small lab cluster, to a larger, less predictable platform such as a corporate cluster or the cloud. This poses enormous challenges: how does one train huge models with billions of parameters on massive data, especially when substantial expertise is required to handle many low-level systems issues? We propose a new architecture of systems components that systematically addresses these challenges, thus providing a general-purpose distributed platform for Big Machine Learning. Our architecture specifically exploits the fact that many ML programs are fundamentally loss-function minimization problems, and that their iterative-convergent nature presents many unique opportunities to minimize loss, such as dynamic variable scheduling and error-bounded consistency models for synchronization. Thus, we treat data, parameter, and variable blocks as computing units to be dynamically scheduled and updated in an error-bounded manner, with the goal of minimizing the loss function as quickly as possible.
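The error-bounded consistency idea can be made concrete with a small sketch. The Python snippet below is a hypothetical illustration, not Petuum's actual API: it simulates several workers running SGD on a toy quadratic loss under a bounded-staleness (stale-synchronous) parameter model, where each worker may compute on parameter values up to `staleness` clock ticks old before it must refresh from the server copy. All names (`ssp_sgd`, `staleness`, the quadratic loss) are illustrative assumptions.

    import numpy as np

    def ssp_sgd(num_workers=4, staleness=2, iters=200, lr=0.1, dim=5):
        # Toy loss: 0.5 * ||w - target||^2, minimized at w = target.
        rng = np.random.default_rng(0)
        target = rng.normal(size=dim)

        global_w = np.zeros(dim)                       # server copy of parameters
        cached = [global_w.copy() for _ in range(num_workers)]
        cache_clock = [0] * num_workers                # clock at which each cache was taken

        for t in range(iters):
            for k in range(num_workers):
                # Refresh a worker's cache only when using it would violate
                # the staleness bound; otherwise keep computing on stale values.
                if t - cache_clock[k] > staleness:
                    cached[k] = global_w.copy()
                    cache_clock[k] = t
                grad = cached[k] - target              # gradient of the toy loss at the (stale) read
                global_w -= (lr / num_workers) * grad  # error-bounded update applied at the server

        return np.linalg.norm(global_w - target)

    print("distance to optimum: %.4f" % ssp_sgd())

In the spirit of the abstract's error-bounded consistency models, workers here proceed without a barrier at every iteration, trading a bounded amount of parameter error for throughput; the iterative-convergent updates still drive the loss toward its minimum because the error in any read is limited by the staleness threshold.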

