Selective inference approaches for augmenting genetic association studies with multi-omics metadata
To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new selective inference methodologies could improve power by letting analysts explore test statistics alongside metadata to construct informative weights while retaining desired statistical guarantees. My thesis pursues this theme by developing statistical and computational tools to address the challenges that arise especially in studying complex,
neuropsychiatric disorders. In chapter 2 we explore one such framework, adaptive p-value thresholding (AdaPT), in the context of testing individual single nucleotide polymorphisms (SNPs) for schizophrenia. We demonstrate a substantial increase in power using flexible gradient boosted trees to account for covariates constructed with GWAS statistics from genetically-correlated phenotypes, as well as measures capturing association with gene
expression and coexpression subnetwork membership. In chapter 3, we address a popular approach for computing gene-level p-values that is based on an invalid approximation for the combination of two-sided test statistics. Our correction ensures error rate control and
alleviates concerns about the null distribution assumptions that selective inference procedures require. In chapter 4, we introduce an agglomerative algorithm, based on the dependence induced from linkage disequilibrium (LD), to test the aggregation of SNPs into gene-based test statistics for autism spectrum disorder (ASD). The advantages of our approaches are twofold: increased power and increased interpretability, with the latter expediting our understanding of the etiology
of human diseases, disorders, and other phenotypes. Finally, in chapter 5, we demonstrate in simulations that augmenting testing corrections with annotation information improves power in rare variant studies, and we investigate the use of data blurring to explore
annotation structure, providing ways to address the challenges of multiplicity that persist in whole genome sequencing.
History
Date
2022-04-19
Degree Type
- Dissertation
Department
- Statistics and Data Science
Degree Name
- Doctor of Philosophy (PhD)