Carnegie Mellon University

Towards Efficient and Scalable Representation Learning

thesis
Posted on 2023-08-25, 18:50, authored by Hai Thanh Pham

Extracting useful information from rapidly growing volumes of data to support informed decisions is becoming increasingly challenging. Even with recent advances in deep learning, the question of how to use such enormous amounts of data for a diverse set of tasks in an efficient and scalable manner has yet to be resolved.

To address the two main aspects of representation learning from data, namely efficiency and scalability, this thesis presents techniques for diverse tasks including sentiment analysis, handwriting recognition, and document intelligence, where data appear in different forms: multimodal data comprising text, audio, and video; noisy scanned handwriting images; and long documents with differing layouts. Because these tasks differ in the availability and potential issues of their data and in their objectives, there is no one-size-fits-all solution; each problem requires its own approach. In addition, for large-scale data, this thesis presents approximation techniques and analyses for estimating essential components, learning effective representations, and speeding up the learning process, including matrix trace approximation with a parallel non-adaptive method, spectrum approximation in Gaussian process training, and task-based mixture-of-experts models for large-scale multitask neural machine translation. Across these works, the thesis introduces novel approaches for tackling issues present in the data and the tasks, learning efficient representations, and approximating models for practical scalability in the real world.


Date

2023-05-08

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Barnabas Poczos, David P. Woodruff