Carnegie Mellon University

High-probability and Large Deviations Techniques for Design and Analysis of Large-scale and Distributed Learning Systems

thesis
posted on 2025-09-17, 20:54 authored by Aleksandar Armacki
<p dir="ltr">Whether it is Google’s federated framework for mobile devices or OpenAI’s large language models capturing mainstream attention, large-scale learning systems are ubiquitous. While the rate of progress and the performance of modern learning systems have been impressive, they are still hampered by many issues. For example, training large-scale learning models is costly in both time and resources, making guarantees with respect to an individual sample path imperative. In addition, many learning models lack classical smoothness and induce phenomena such as heavy-tailed gradient noise, necessitating the use of (stochastic) gradient methods with nonlinear mappings, such as sign, clipping, or normalization. Moreover, nonlinearly modified gradient methods are known to bring many benefits, such as stabilizing and accelerating training, reducing the size of transmitted messages, and enhancing security and privacy in distributed machine learning. Another issue stems from the fact that data is generated across varied sources, making it difficult to train a single model that caters to the needs of a wide range of users. Toward resolving these issues, the first part of this thesis establishes learning guarantees for a general framework of nonlinear stochastic gradient methods in the presence of heavy-tailed noise. The framework subsumes many popular nonlinearities, including sign, normalization, clipping, and quantization, and provides a broad range of guarantees, including large-deviation upper bounds and finite-time convergence, both in expectation and with high probability. The second part of the thesis studies the multi-model framework in distributed heterogeneous settings and designs algorithms that utilize the wealth of data while providing communication-efficient models personalized to individual users.</p>
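To illustrate the kind of method the abstract describes, the following is a minimal sketch of a nonlinear stochastic gradient iteration with a clipping nonlinearity under heavy-tailed noise. It is an illustrative toy on a quadratic objective, not the thesis's algorithm or its analysis; all names, step sizes, and the choice of Student-t noise are assumptions made here for demonstration.

```python
import numpy as np

def clipped_sgd(grad_fn, x0, step=0.05, clip=1.0, iters=500, seed=0):
    """Illustrative nonlinear SGD: a clipping map is applied to each
    stochastic gradient before the update, bounding the update norm
    even when the gradient noise is heavy-tailed."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # Heavy-tailed noise (Student-t, 3 degrees of freedom) corrupts the gradient.
        g = grad_fn(x) + rng.standard_t(df=3, size=x.shape)
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)  # clipping nonlinearity: keep ||g|| <= clip
        x = x - step * g
    return x

# Toy quadratic f(x) = ||x||^2 / 2, whose gradient is x and minimizer is 0.
x_final = clipped_sgd(lambda x: x, x0=np.ones(5))
print(np.linalg.norm(x_final))
```

The clipping step is what keeps each update bounded regardless of how large a heavy-tailed noise sample is; replacing it with a sign or normalization map gives other members of the nonlinear family the abstract mentions.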

History

Date

2025-08-04

Degree Type

  • Dissertation

Thesis Department

  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Soummya Kar
