Carnegie Mellon University

Improving Deep Generative Modeling with Practical Applications

thesis
posted on 2025-04-24, 20:55, authored by Zihang Dai

At the core of unsupervised learning, probabilistic generative models provide a systematic framework for understanding real-world data from various domains in a probabilistic manner. Among the many possible desiderata of generative models, density estimation, data generation, and representation learning are widely regarded as the three most desired properties, whose advancement not only bears important theoretical value but can also lead to breakthroughs in practical applications. In recent years, with the rapid development of deep neural networks and computational hardware, the field of deep generative models has witnessed dramatic advances in all three aspects, significantly outperforming traditional generative models.

Despite these successes, existing neural architectures and training objectives are still subject to certain fundamental drawbacks. With these challenges in mind, this thesis focuses on developing novel neural architectures and training objectives for generative modeling that are highly expressive, allow for efficient optimization, and can scale to large amounts of data.

Notably, to better exploit the optimization advantages of the Transformer in capturing long-term dependencies, we propose Transformer-XL, which integrates segment-level recurrence into self-attention without disrupting temporal coherence. Further, to combine the benefits of autoregressive and denoising auto-encoding based language pretraining, we propose XLNet, which relies on a permutation language modeling objective to maximize the expected log-likelihood of a sequence w.r.t. all possible permutations of the factorization order and hence capture bidirectional context. By further integrating ideas from Transformer-XL, XLNet consistently outperforms the previous best language pretraining method under the same training conditions and achieves state-of-the-art performance when scaled up. In addition, to further exploit the effectiveness of language pretraining, we propose a more efficient self-attention architecture, Funnel-Transformer, which compresses the hidden state sequence to a shorter length and hence reduces the computation cost. With sequence compression, Funnel-Transformer allows one to trade the sequential resolution of the hidden state sequence for a deeper or wider model, leading to substantial gains under the same amount of computation as measured in FLOPs.
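
For concreteness, the permutation language modeling objective described above can be written as the following sketch in LaTeX. The notation is illustrative and broadly follows the XLNet paper: $\mathcal{Z}_T$ denotes the set of all permutations of the index sequence $[1, \dots, T]$, $z_t$ and $\mathbf{z}_{<t}$ denote the $t$-th element and the first $t-1$ elements of a permutation $\mathbf{z}$, and $p_\theta$ is the model distribution.

% Permutation language modeling objective (sketch): maximize the expected
% log-likelihood of the sequence over all factorization orders.
\[
\max_{\theta} \;\; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
\left[ \sum_{t=1}^{T} \log p_{\theta}\!\left( x_{z_t} \,\middle|\, \mathbf{x}_{\mathbf{z}_{<t}} \right) \right]
\]

Because the expectation ranges over all factorization orders, each position is trained to predict its token from varying subsets of the other positions, which is how the objective captures bidirectional context while remaining autoregressive.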

History

Date

2020-08-20

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Yiming Yang
