CMU-CS-21-136.pdf (2.66 MB)
Towards More Efficient and Data-Driven Domain Adaptation
Thesis posted on 2022-02-23, 22:04, authored by Petar Stojanov
In recent years, with the rapid progress in neural network research, supervised machine learning approaches have become increasingly powerful at finding flexible functions to predict a target variable Y from input features X. However, most of these complex models require large amounts of data to train, and often work under the assumption that the data points are i.i.d. In reality, these assumptions are very likely to be violated. A simplified notion of this violation is when the training and test datasets come from different joint distributions (i.e., P_train(X, Y) ≠ P_test(X, Y)). In this setting, where the training and test datasets are also known as the source and target domains respectively, domain adaptation is required to obtain good performance. In particular, when only unlabeled features are observed in the target domain, the setting is referred to as unsupervised domain adaptation, and it is the main focus of this thesis. Domain adaptation is a broad subfield of machine learning concerned with designing algorithms that account for this distributional difference under specific assumptions, in order to improve prediction performance in the target domain. In this thesis we make use of the data-generating process to address several subproblems of unsupervised domain adaptation. First, we address the problem of unsupervised domain adaptation with multiple labeled source domains and an unlabeled target domain under the conditional-target shift setting, and we present an approach that captures the low-dimensional changes of the joint distribution across domains in order to perform prediction in the target domain. Second, we introduce an algorithm to reduce the dimensionality of the data when performing domain adaptation under the covariate shift setting.
Specifically, we exploit the properties of the covariate shift setting to reduce the dimensionality of the data while preserving the relevant predictive information about the target variable Y. We further investigate domain adaptation from the perspective of the data-generating process when addressing the problem using neural networks. Deep neural architectures are commonly used to extract domain-invariant representations from the observed features. However, without labels in the target domain, there is no guarantee that these representations will carry relevant predictive information for the target domain data. In this thesis, we investigate techniques to regularize this invariant representation so that it has non-trivial structure containing information that is relevant for predicting Y in the target domain. The first of these techniques is based on mutual information, and the second makes use of a novel criterion for the distortion of the marginal distribution when transforming it from the source to the target domain.
- Computer Science
- Doctor of Philosophy (PhD)