Communication-Efficient Optimization Algorithms for Decentralized Machine Learning
Emerging applications in multi-agent environments, such as the internet-of-things, networked sensing, large-scale machine learning, and federated learning, have attracted increasing attention to decentralized optimization algorithms that are resource efficient in both computation and communication while being able to protect data privacy. This thesis considers the prototypical setting where agents work collaboratively to minimize the sum of local loss functions by communicating only with their neighbors over a predetermined network topology. We propose four decentralized optimization algorithms, with the intertwined goals of achieving communication efficiency, computation efficiency, and data privacy through carefully designed update procedures. For all algorithms, we provide theoretical convergence guarantees and perform extensive numerical experiments to support the analyses.

First, we propose a Newton-type algorithm called Network-DANE for decentralized problems with strongly convex objectives, which utilizes gradient tracking and extra mixing (i.e., multiple mixing rounds per iteration) to extend the celebrated server/client algorithm DANE to the decentralized setting. Our analysis shows that, similar to DANE, Network-DANE achieves linear convergence for quadratic and general smooth strongly convex objective functions, and can provably harness data similarity across agents to accelerate convergence, which highlights its communication efficiency. We further extend Network-DANE by allowing a nonsmooth penalty term for composite optimization problems, and by using stochastic variance-reduced local updates for computation efficiency.
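To make the recurring ingredients concrete, the sketch below illustrates gradient tracking combined with extra mixing on a toy decentralized least-squares problem. It is a minimal illustration of these two building blocks only, not of Network-DANE's Newton-type local solves; the ring topology, problem data, step size, and number of mixing rounds are illustrative assumptions.

```python
# Minimal sketch: decentralized gradient tracking with extra mixing
# (multiple gossip rounds per iteration) on assumed quadratic losses.
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, K_mix, eta = 8, 5, 3, 0.05

# Hypothetical local least-squares losses f_i(x) = ||A_i x - b_i||^2 / (2m)
A = [rng.standard_normal((20, dim)) for _ in range(n_agents)]
b = [rng.standard_normal(20) for _ in range(n_agents)]
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i]) / len(b[i])

# Doubly stochastic mixing matrix for an assumed ring topology
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25
mix = np.linalg.matrix_power(W, K_mix)   # extra mixing: K gossip rounds per iteration

X = np.zeros((n_agents, dim))            # local iterates, one row per agent
S = np.stack([grad(i, X[i]) for i in range(n_agents)])   # gradient trackers

for _ in range(1000):
    X_new = mix @ X - eta * S            # gossip, then descend along the tracked gradient
    S = mix @ S + np.stack([grad(i, X_new[i]) - grad(i, X[i]) for i in range(n_agents)])
    X = X_new                            # trackers follow the network-wide average gradient

x_star = np.linalg.solve(sum(Ai.T @ Ai for Ai in A),
                         sum(A[i].T @ b[i] for i in range(n_agents)))
print("consensus error:", np.linalg.norm(X - X.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(X.mean(axis=0) - x_star))
```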
Next, for more general decentralized nonconvex empirical risk minimization (ERM) problems, we propose DESTRESS, which matches the optimal incremental first-order oracle (IFO) complexity of centralized algorithms for finding first-order stationary points, while maintaining communication efficiency. In addition to gradient tracking and extra mixing, DESTRESS incorporates randomly activated stochastic recursive mini-batch gradient updates to avoid unnecessary computation, which allows it to improve upon prior decentralized algorithms over a wide range of parameter regimes.
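The local computation in such updates builds on stochastic recursive gradient estimation. The following single-agent sketch shows a SARAH-style recursive mini-batch gradient update of this flavor; it omits the gossip, gradient tracking, and random activation that DESTRESS layers on top, and the logistic-regression data, batch size, and step size are assumptions.

```python
# Minimal single-agent sketch of a SARAH-style stochastic recursive
# mini-batch gradient estimator on assumed logistic-regression data.
import numpy as np

rng = np.random.default_rng(1)
n_samples, dim, batch, eta = 500, 10, 20, 0.2
A = rng.standard_normal((n_samples, dim))
y = np.sign(rng.standard_normal(n_samples))

def grad(x, idx):
    """Mini-batch gradient of the average logistic loss over rows `idx`."""
    z = y[idx] * (A[idx] @ x)
    return -(A[idx] * (y[idx] / (1 + np.exp(z)))[:, None]).mean(axis=0)

x = np.zeros(dim)
for epoch in range(20):
    v = grad(x, np.arange(n_samples))        # periodic full-gradient refresh
    x_prev, x = x.copy(), x - eta * v
    for _ in range(n_samples // batch):
        idx = rng.choice(n_samples, batch, replace=False)
        # Recursive update: reuse the previous estimate and correct it
        # with the gradient difference on the same mini-batch.
        v = grad(x, idx) - grad(x_prev, idx) + v
        x_prev, x = x.copy(), x - eta * v

print("final gradient norm:", np.linalg.norm(grad(x, np.arange(n_samples))))
```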
Then, we consider communication compression to further improve communication efficiency for decentralized nonconvex optimization problems, which leads to the development of BEER. This algorithm also leverages stochastic gradient tracking, and in addition incorporates communication compression together with error feedback to reduce the per-round communication cost while controlling the compression error, which allows it to maintain communication efficiency even when the data distribution across agents is highly heterogeneous.
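The following sketch isolates the compression-with-error-feedback mechanism on a single communication link: only a top-k compressed message is transmitted each round, and the dropped mass is accumulated and re-injected later. It is a generic illustration under assumed parameters, not BEER itself, which combines this mechanism with gossip and stochastic gradient tracking.

```python
# Minimal sketch: top-k compression with error feedback on one link,
# driving gradient descent on an assumed quadratic f(x) = ||x - b||^2 / 2.
import numpy as np

rng = np.random.default_rng(2)
dim, k, eta = 50, 5, 0.5
b = rng.standard_normal(dim)                 # minimizer of the assumed quadratic

def top_k(v, k):
    """Keep the k largest-magnitude entries, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

x = np.zeros(dim)
e = np.zeros(dim)                            # error-feedback accumulator
for _ in range(300):
    g = x - b                                # exact local gradient
    p = g + e                                # add back previously dropped mass
    msg = top_k(p, k)                        # only k coordinates are communicated
    e = p - msg                              # remember what was not sent
    x = x - eta * msg                        # receiver applies the compressed step

print("distance to optimum:", np.linalg.norm(x - b))
```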
Finally, based on BEER, we propose PORTER for decentralized nonconvex optimization, which provably converges to first-order stationary points of the global objective while preserving each agent's privacy under the notion of differential privacy. PORTER utilizes stochastic gradient tracking and communication compression with error feedback as BEER does, and further leverages Gaussian perturbation with gradient clipping to preserve privacy for arbitrary objective functions.
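The privacy mechanism can be illustrated by the following sketch of a gradient privatization step: the local gradient is clipped to a fixed norm bound and then perturbed with Gaussian noise scaled to that bound. The clipping threshold and noise multiplier below are placeholders; calibrating the noise to a target (epsilon, delta) differential-privacy budget is omitted here.

```python
# Minimal sketch: clip a local gradient to a fixed norm bound, then add
# Gaussian noise scaled to that bound (illustrative parameters only).
import numpy as np

rng = np.random.default_rng(3)

def privatize(grad, clip=1.0, noise_multiplier=1.0):
    """Clip to norm `clip`, then perturb with Gaussian noise of standard
    deviation `noise_multiplier * clip`, so the noise tracks the sensitivity."""
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip / max(norm, 1e-12))
    return clipped + rng.normal(0.0, noise_multiplier * clip, size=grad.shape)

g = rng.standard_normal(20) * 5.0            # a hypothetical raw local gradient
print(np.linalg.norm(g), np.linalg.norm(privatize(g)))
```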
In summary, our work highlights 1) the effectiveness of gradient tracking in estimating global gradients, 2) the substantial efficiency gains enabled by extra mixing, communication compression, and error feedback, and 3) the ability to preserve privacy through gradient clipping and Gaussian perturbation. The key algorithm design ideas can also be applied, in a systematic manner, to design new resource-efficient decentralized optimization algorithms.
History
Date
- 2023-05-03
Degree Type
- Dissertation
Department
- Electrical and Computer Engineering
Degree Name
- Doctor of Philosophy (PhD)