Learning and Using Causal Knowledge: A Further Step Towards a Higher-Level Intelligence
Causal knowledge is essential to many tasks in the empirical sciences and engineering. This thesis tries to answer two questions: (1) how one can properly acquire causal knowledge and, furthermore, (2) how we should use it. Regarding the former, a promising research line is to discover causal relationships from observational data under appropriate constraints or assumptions, known as causal discovery. It is more practical than finding causal relations from interventional data, since observational data are much easier to obtain. However, previous approaches to causal discovery usually rely on strong assumptions, which may not hold or be testable in complex environments; this is one of the most challenging problems in causal discovery. Accordingly, my goal in causal discovery is to develop approaches that apply to more general or complex environments with theoretical guarantees. Specifically, I have been concerned with causal discovery in the presence of nonlinearity, mixed data types, heterogeneity, selection bias, and hidden confounders. Regarding the latter question, it has been recognized that correlation-based machine learning (ML) techniques suffer from a lack of interpretability, adaptability, robustness, and generalizability, while the causal perspective offers potential solutions to these obstacles. I have been investigating how causal understanding facilitates solving ML problems, including classification, clustering, forecasting in nonstationary environments, transfer learning, reinforcement learning, and representation learning.
This thesis accordingly contains two main parts. Part I is devoted to the development of causal discovery approaches for more general environments. In particular, it will cover (1) generalized score functions for causal discovery, which avoid the multiple-testing issue in constraint-based methods and can handle nonlinear causal mechanisms, a wide class of data distributions, mixed continuous and discrete data, and multidimensional variables, (2) causal discovery from nonstationary/heterogeneous data in a fully nonparametric way, where we find that distribution shifts actually benefit causal discovery, (3) causal discovery from multiple data sets with overlapping variables, which allows different data sets to contain different sets of observed variables, (4) the generalized independent noise condition for estimating latent variable graphs, where we focus on identifying latent variables and their causal relationships for linear non-Gaussian models, and (5) latent hierarchical causal structure discovery with rank-deficiency constraints, which allows latent variables that follow a hierarchical graph structure to generate the measured variables and, moreover, allows multiple paths between every pair of variables.
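To give a flavor of the rank-deficiency idea mentioned in item (5), the following is a minimal sketch, not the thesis's actual algorithm: in a hypothetical linear model where a single latent variable generates four measured variables, the cross-covariance block between any two groups of measured variables has rank one, which can be detected from its singular values. All variable names and coefficients below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical linear model: one latent variable L generates four
# measured variables X1..X4 with independent additive noise.
L = rng.normal(size=n)
X = np.column_stack(
    [c * L + 0.5 * rng.normal(size=n) for c in (1.0, 0.8, -0.6, 1.2)]
)

# Cross-covariance block between {X1, X2} and {X3, X4}.
# If a single latent variable d-separates the two groups, this
# 2x2 block has rank 1, so its second singular value is (nearly) zero.
cross_cov = np.cov(X.T)[:2, 2:]
s = np.linalg.svd(cross_cov, compute_uv=False)
print(s[1] / s[0])  # close to 0: rank deficiency suggests a latent variable
```

In practice such rank constraints are tested statistically rather than by eyeballing singular values, and richer patterns of rank deficiency across variable groups reveal the hierarchical latent structure.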
Part II is devoted to how causal understanding helps solve ML problems. In particular, it will cover (1) discovering and using causal relationships for classification, (2) time-varying causal relation modeling and nonstationary time series forecasting, (3) specific and shared causal relationship modeling and clustering, which provides both personalized causal information and general causal information over the population, and (4) extracting a minimal sufficient state representation for reinforcement learning (RL) and transfer RL by leveraging structural constraints and the goal of maximizing the cumulative reward.
- Doctor of Philosophy (PhD)