Carnegie Mellon University

Selective inference approaches for augmenting genetic association studies with multi-omics metadata

thesis
posted on 2022-05-04, 18:55, authored by Ronald Yurko

To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet new selective inference methodologies could improve power by letting researchers explore test statistics alongside metadata to construct informative weights, while retaining the desired statistical guarantees. My thesis revolves around this theme, developing statistical and computational tools to address the challenges that arise especially in studying complex neuropsychiatric disorders. In chapter 2, we explore one such framework, adaptive p-value thresholding (AdaPT), in the context of testing individual single nucleotide polymorphisms (SNPs) for association with schizophrenia. We demonstrate a substantial increase in power by using flexible gradient boosted trees to account for covariates constructed from GWAS statistics of genetically correlated phenotypes, as well as measures capturing association with gene expression and coexpression subnetwork membership. In chapter 3, we address a popular approach for computing gene-level p-values that relies on an invalid approximation for the combination of two-sided test statistics. Our correction ensures error rate control and alleviates concerns about the validity of the null distribution, which selective inference procedures require. In chapter 4, we introduce an agglomerative algorithm, based on the dependence induced by linkage disequilibrium (LD), for testing the aggregation of SNPs into gene-based test statistics for autism spectrum disorder (ASD). The advantages of our approaches are twofold: increased power and increased interpretability, with the latter expediting our understanding of the etiology of human diseases, disorders, and other phenotypes. Finally, in chapter 5, we demonstrate in simulations that augmenting testing corrections with annotation information improves power in rare variant studies, and we explore the use of data blurring to investigate annotation structure, providing ways to address the challenges of multiplicity that persist in whole genome sequencing.
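The abstract's opening idea, that metadata can supply informative weights without giving up an error-rate guarantee, can be illustrated with a much simpler procedure than AdaPT: the weighted Benjamini-Hochberg procedure of Genovese, Roeder and Wasserman (2006). The Python sketch below is illustrative only and is not the method developed in the thesis; AdaPT instead learns covariate-dependent rejection thresholds adaptively from partially masked p-values. The function name `weighted_bh`, the p-values, and the weights are all hypothetical. The sketch assumes the weights are nonnegative, fixed before the p-values are examined, and rescaled to average 1. Under those conditions and with independent p-values, weighted BH controls the false discovery rate.

```python
import numpy as np


def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg step-up procedure (illustrative sketch).

    Weights are rescaled to average 1; each p-value p_i is divided by its
    weight w_i, and the ordinary BH rule is applied to q_i = p_i / w_i.
    Returns a boolean mask of rejected hypotheses.
    """
    p = np.asarray(pvalues, dtype=float)
    w = np.asarray(weights, dtype=float)
    if p.shape != w.shape:
        raise ValueError("pvalues and weights must have the same shape")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be nonnegative and not all zero")

    m = len(p)
    w = w * (m / w.sum())  # rescale so the weights average 1
    with np.errstate(divide="ignore", invalid="ignore"):
        q = np.where(w > 0, p / w, np.inf)  # zero weight: never rejected

    order = np.argsort(q)
    thresholds = alpha * np.arange(1, m + 1) / m  # BH line alpha * k / m
    below = q[order] <= thresholds

    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank under the BH line
        reject[order[: k + 1]] = True   # step-up: reject all ranks up to k
    return reject


# Hypothetical p-values for ten tests.
pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.042,
                  0.060, 0.074, 0.205, 0.212, 0.216])
uniform = np.ones_like(pvals)
# Hypothetical covariate-informed weights: upweight tests 3-5, downweight the rest.
informed = np.array([1, 1, 2, 2, 2, 0.4, 0.4, 0.4, 0.4, 0.4])

print(weighted_bh(pvals, uniform).sum())   # prints 2 (ordinary BH)
print(weighted_bh(pvals, informed).sum())  # prints 5
```

With uniform weights the function reduces to ordinary BH and rejects two hypotheses; upweighting the three tests flagged by the hypothetical covariate yields five rejections at the same nominal level. The guarantee does not carry over if the weights are tuned on the same p-values they are applied to.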

History

Date

2022-04-19

Degree Type

  • Dissertation

Department

  • Statistics and Data Science

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

  • Kathryn Roeder
  • Max G’Sell
