Learning the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) from data is an important and classical problem in machine learning, with prominent applications in causal inference, fairness, interpretability, and biology. It is a challenging problem because the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. Existing approaches often rely on various local heuristics for enforcing the acyclicity constraint. By contrast, structure learning for undirected graphical models (e.g., Gaussian MRFs) is now recognized as a tractable optimization problem and has achieved great success in practical domains such as bioinformatics.

In this thesis, we take a first step towards bridging this gap between directed and undirected graphical models. We begin by introducing a fundamentally different strategy for Bayesian network structure learning: we formulate the problem as a purely continuous optimization program over real matrices that avoids the combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting problem can be efficiently solved by standard numerical algorithms, without imposing any structural assumptions on the graph such as bounded treewidth or in-degree.
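To make the idea of a smooth, exact acyclicity characterization concrete, the sketch below assumes the trace-of-matrix-exponential form h(W) = tr(exp(W ∘ W)) − d, which vanishes exactly when the weighted adjacency matrix W encodes a DAG; the function names are illustrative rather than a fixed API.

```python
# A minimal sketch, assuming the characterization h(W) = tr(exp(W ∘ W)) - d.
import numpy as np
from scipy.linalg import expm

def h(W: np.ndarray) -> float:
    """Smooth acyclicity measure: h(W) = 0 if and only if W is a DAG.

    tr(exp(W ∘ W)) accumulates weighted closed walks of every length,
    so it exceeds d (the number of nodes) exactly when a cycle exists.
    """
    d = W.shape[0]
    return np.trace(expm(W * W)) - d  # W * W is the elementwise square

def h_grad(W: np.ndarray) -> np.ndarray:
    """Gradient for first-order solvers: grad h(W) = exp(W ∘ W)^T ∘ 2W."""
    return expm(W * W).T * 2 * W

# A 2-cycle is penalized; an upper-triangular (acyclic) matrix is not.
W_cycle = np.array([[0.0, 0.8], [0.5, 0.0]])
W_dag = np.array([[0.0, 0.8], [0.0, 0.0]])
print(h(W_cycle) > 0)           # True
print(np.isclose(h(W_dag), 0))  # True
```

An equality constraint of this form can then be driven to zero by standard machinery, such as an augmented Lagrangian or penalty method.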
We then study the generalization of this continuous algorithm to learning nonparametric DAGs. We extend the algebraic characterization of acyclicity to nonparametric structural equation models (SEMs) by leveraging nonparametric sparsity based on partial derivatives, resulting in a continuous optimization problem that applies to a variety of nonparametric and semiparametric models, including GLMs, additive noise models, and index models as special cases.
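As one concrete illustration of the partial-derivative idea, the sketch below assumes each structural equation f_j is parameterized by an MLP, a common instantiation of the nonparametric SEM; the helper name and array shapes are assumptions for illustration only.

```python
# A minimal sketch, assuming each structural equation f_j is an MLP.
import numpy as np

def mlp_adjacency(first_layers: list) -> np.ndarray:
    """Reduce nonparametric sparsity to a weighted adjacency matrix.

    If every first-layer weight of f_j attached to input x_k is zero,
    the partial derivative of f_j with respect to x_k vanishes
    identically, i.e. there is no edge k -> j. The L2 norm of that
    weight group is therefore a smooth proxy for the edge:
        W[k, j] = || first-layer weights of f_j into x_k ||_2
    first_layers[j] is an (m_j, d) array for the MLP computing f_j.
    """
    d = len(first_layers)
    W = np.zeros((d, d))
    for j, A in enumerate(first_layers):
        W[:, j] = np.linalg.norm(A, axis=0)  # one norm per input x_k
    return W
```

Applying the acyclicity function from the previous sketch to this induced matrix turns nonparametric DAG learning into the same kind of continuous program.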
Lastly, we introduce a unified view of score-based and ICA-based methods through the proposed continuous optimization framework. In particular, we show that popular ICA-based methods, which exploit the non-Gaussianity of the independent noise distributions, can be handled by the continuous optimization framework; this formulation is conceptually clearer, makes it easier to incorporate prior knowledge, and has the potential to be generalized to models with hidden confounders and feedback loops.