Carnegie Mellon University
Collaborative learning by leveraging siloed data

thesis
posted on 2023-10-20, 18:35 authored by Sebastian Caldas Rivera

Regulations can often limit stakeholders’ modeling capabilities by preventing data sharing. For example, in order to protect patient privacy, clinical centers may be unable to share their data and thus lack representative records to learn about a rare condition. To address this challenge, previous work in machine learning has shown that these stakeholders benefit from training models in a collaborative fashion, improving their predictive performance. However, as we start training these collaborative models in real-world settings, they need to provide utility along dimensions beyond predictive performance in order to be truly useful. In this thesis, we propose methods and algorithms to improve collaborative models that leverage siloed data along three dimensions. In the first part, we propose methods to reduce the communication footprint of models learned by mobile devices cooperating over edge networks, allowing for higher capacity models to be trained. Then, in the second part, we introduce an algorithm that provides explanations about predictions of models trained across clinical centers, thus improving their clinical utility. Finally, in the third part, we address the need to encode expert supervision into collaborative models trained using on-device data, expanding the class of problems we can tackle in these scenarios.

Date

2023-08-08

Degree Type

  • Dissertation

Department

  • Machine Learning

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Artur Dubrawski