Reason: Author Request
until file(s) become available
Theory and Algorithms for Communication- and Computation-Efficient Federated Learning
Machine learning (ML) has achieved great success in the last decade and brought revolutionary changes to our daily lives. However, in order to train a state-of-the-art ML model, the cloud servers need to collect a large amount of training data from its clients. This procedure may expose users’ private and sensitive information. Due to the increasing concerns and more and more strict regulations on data privacy, the current ML training paradigm has become infeasible in many scenarios. To tackle this challenge, federated learning (FL) has been proposed as an alternative learning paradigm, the key idea of which is to move data and computations from cloud to the edge. Specifically, in FL, the model training is performed by clients’ edge devices (such as smartphones, laptops) instead of the dedicated machines on the cloud; the collaboration among clients is achieved by periodically synchronizing the locally trained models instead of centrally aggregating the raw data. Combined with differential privacy or secure multi-party computation techniques, FL can provide strong privacy guarantees. These nice features of FL attract great interest from both academia and industry.
While FL depicts an appealing future, there is still a long way to achieve the ultimate goal. In particular, previous ML training algorithms and theoretical analyses were only designed for centralized training data and reliable computing systems. However, in FL, due to the decentralization of data and computation, we not only lose the control of training data (which is now locally stored on clients) but also need to deal with variabilities in the edge computing system. For example, the communication delay between server and edge computing devices may be significantly longer than that on the cloud; there will also be severe straggler effects, as it is nearly impossible to ensure all devices have the same computing capacity. As a consequence, existing training algorithms can become prohibitively slow and the existing theoretical guarantees will no longer hold. There is a critical need to develop a new set of theories and efficient algorithms that are particularly tailored for decentralized data and unreliable computing systems in FL. This is also the main objective of this dissertation.
In order to address the problem, in the first part of this dissertation, we develop a general analysis framework for several baseline FL training algorithms and show that there is a fundamental trade-off between the communication efficiency and error convergence. While many methods can reduce the communication costs during training, they may end up with a higher optimization error. Inspired by the theoretical insights, we further propose a series of algorithms to achieve a better trade-off, including SlowMo, Overlap-Local-SGD, AdaComm and Matcha. These algorithms have been validated through extensive experiments, can be easily combined together, and more importantly, pioneered the direction of jointly optimizing communication efficiency and algorithms’ error convergence. In the second part of this dissertation, we switch the focus to computation efficiency in FL. In particular, in order to mitigate the severe straggler effects, practitioners usually let clients perform different amounts of work. However, we show that this strategy will lead to an objective inconsistency problem: the algorithm will converge to the minimizer of a different objective function from the original one. To understand this phenomenon, we propose a new analysis framework which allows clients to have heterogeneous local training progress. This new theoretical analysis motivates us to come up with FedNova and its extensions to provably improve existing algorithms. At last, we also study the setting where clients have too limited computing power to complete one local update. These contributions help to lay the foundation and provide principled guidelines for designing practical FL algorithms.
DepartmentElectrical and Computer Engineering
- Doctor of Philosophy (PhD)