Carnegie Mellon University

Non-Parametric Causal Discovery for Discrete and Continuous Data

thesis
posted on 2020-11-05, 20:32, authored by Octavio Mesner
Subject-matter experts typically think of their datasets as causes and effects between many variables, forming a large, complex causal system. Directed acyclic graphs (DAGs), also called Bayesian networks, provide a natural way to conceptualize these systems. In contrast, regression modeling can provide strong evidence for the local causal neighborhood of an outcome within the causal system, but providing structure for the larger system is challenging with regression. Despite their value as exploratory data analysis or in conjunction with regression models to refine causal understanding, methods for estimating the causal structure underlying a dataset, known as causal discovery, are rare in fields such as epidemiology, possibly because of the difficulty of handling data with both continuous and discrete random variables.
This thesis focuses on developing a causal discovery method for researchers whose data typically comprise both discrete and continuous variables. Its primary contribution is the development of an estimator for graph divergence, the Kullback-Leibler divergence between the full joint distribution and the Bayesian factorization indicated by a DAG. Graph divergence is a generalization of conditional mutual information: it quantifies the fit of a DAG to the data, with greater divergence indicating worse fit and a divergence of zero indicating a perfect characterization of the conditional independence relationships among the variables. The estimator's nearest-neighbor approach gives it the capability to handle mixed data. We show that the estimator is consistent and establish its convergence separately for the continuous and discrete cases under some assumptions.
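To make the definition of graph divergence concrete, the following is a minimal sketch for purely discrete data. It is not the nearest-neighbor estimator developed in the thesis: it assumes a simple plug-in approach in which empirical frequencies stand in for the true distributions, so that the divergence reduces to the sum of the conditional entropies of each variable given its parents minus the joint entropy. The function names `entropy` and `graph_divergence`, and the `parents` dictionary encoding of the DAG, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log


def entropy(rows, cols):
    """Plug-in (empirical) entropy, in nats, of the columns `cols` of `rows`."""
    if not cols:
        return 0.0  # the entropy of an empty set of variables is zero
    n = len(rows)
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    return -sum(k / n * log(k / n) for k in counts.values())


def graph_divergence(rows, parents):
    """Plug-in KL divergence between the empirical joint distribution and
    the factorization prod_i p(x_i | parents(x_i)) implied by a DAG.

    `parents` maps each column index to a tuple of parent column indices.
    Assumption: discrete data only; this is an illustrative stand-in, not
    the thesis's nearest-neighbor estimator.
    """
    all_cols = tuple(sorted(parents))
    total = -entropy(rows, all_cols)  # minus the joint entropy
    for child, pa in parents.items():
        pa = tuple(pa)
        # H(child | pa) = H(child, pa) - H(pa)
        total += entropy(rows, (child,) + pa) - entropy(rows, pa)
    return total


# Two perfectly dependent binary variables (column 0 equals column 1).
dependent = [(0, 0), (1, 1)]
# Two independent binary variables: every combination equally often.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]

empty_dag = {0: (), 1: ()}    # no edges: claims the variables are independent
chain_dag = {0: (), 1: (0,)}  # edge 0 -> 1: imposes no independence constraint

d_bad = graph_divergence(dependent, empty_dag)     # poor fit, equals log(2)
d_good = graph_divergence(dependent, chain_dag)    # perfect fit, zero
d_indep = graph_divergence(independent, empty_dag) # perfect fit, zero
```

In this toy case the empty DAG wrongly asserts independence, so its divergence equals the mutual information between the two columns, while a DAG that encodes no false independence has divergence zero. This mirrors the interpretation above: larger values indicate a worse fit of the DAG to the data.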
Last, we demonstrate a way to use graph divergence with a greedy Markov equivalence search algorithm in practice. Though this work is not complete, we estimate causal relationships between personal demographics, sexual risk behaviors, and HIV pre-exposure prophylaxis among men who have sex with men (MSM) using the American Men's Internet Survey data. This work may be able to inform public health initiatives and guidelines surrounding the sexual health of MSM.

History

Date

2020-08-05

Degree Type

  • Dissertation

Department

  • Engineering and Public Policy

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Cosma Shalizi
