Decomposable Probabilistic Models for Networks in Biology
The complexity of a biological system arises from several distinct biological processes occurring simultaneously. Existing computational methods for disentangling this complexity, that is, for decomposing biological data into individual components, often have limited statistical power because they rely on a data analysis pipeline that examines each aspect separately. In this thesis, we consider three types of problems in biology in which multiple aspects of the biological system must be modeled simultaneously to obtain an accurate representation of the underlying dependencies. First, we consider modeling overall dependencies that decompose into dependencies across samples and dependencies across features. This structure is common in gene expression measurements: genes are correlated through the complex underlying gene regulatory network, and samples are correlated through cell types, pedigrees, or disease subtypes. We develop a scalable optimization technique for learning a Cartesian product of two graphs and apply it to learn gene regulatory networks from RNA-seq data of mice related through a pedigree. Second, we develop a doubly mixed-effects Gaussian process regression framework for multi-output learning that decomposes the variability in matrix-variate output data into fixed and random effects across samples and outputs. On several spatiotemporal datasets, including COVID-19 epidemiological data, we demonstrate that a meaningful decomposition can be learned only if all components are modeled simultaneously. Third, we develop a statistical method for learning gene regulatory networks perturbed by cis-acting and trans-acting expression quantitative trait loci (eQTLs) from allele-specific expression and phased genotype data. Our model can decompose the variability in allele-specific expression levels into a component arising from the cis-acting and trans-acting eQTLs and a component exerted by other genes in the gene regulatory network. We apply our approach to analyze data from GTEx and from LG×SM advanced intercross lines of mice with a known pedigree. Our decomposable probabilistic models and learning algorithms will provide reliable tools for uncovering complex dependencies and their underlying structures from biological data.
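The Cartesian-product structure in the first problem can be illustrated with a standard identity: the adjacency matrix of the Cartesian product of two graphs is the Kronecker sum of their adjacency matrices, A ⊕ B = A ⊗ I + I ⊗ B. The sketch below assumes, for illustration only, that the sample and feature dependencies are encoded as Gaussian precision matrices combined by a Kronecker sum. The helper `kronecker_sum` and all toy matrices are hypothetical and not taken from the thesis. The sketch only assembles the combined structure from known factors; it does not implement the scalable optimization technique used to learn the factors from data.

```python
import numpy as np


def kronecker_sum(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the Kronecker sum a ⊕ b = a ⊗ I_q + I_p ⊗ b (hypothetical helper).

    a is p x p (e.g. over samples) and b is q x q (e.g. over features);
    node (i, j) of the product is stored at row/column i * q + j.
    """
    p, q = a.shape[0], b.shape[0]
    return np.kron(a, np.eye(q)) + np.kron(np.eye(p), b)


# Hypothetical toy graphs: 2 samples joined by one edge, 3 features on a path.
a_samples = np.array([[0, 1], [1, 0]], dtype=float)
b_features = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

# The Kronecker sum of adjacency matrices is the adjacency matrix of the
# Cartesian product graph, here a 2 x 3 grid with 7 edges.
adj = kronecker_sum(a_samples, b_features)
print(int(adj.sum() // 2))  # 7

# Hypothetical positive-definite precision matrices with the same sparsity.
omega_s = np.array([[2.0, -0.5], [-0.5, 2.0]])
omega_f = np.array([[1.5, -0.4, 0.0], [-0.4, 1.5, -0.4], [0.0, -0.4, 1.5]])
omega = kronecker_sum(omega_s, omega_f)

# Off-diagonal nonzeros of omega follow the Cartesian product graph.
off_diag = (omega != 0) & ~np.eye(omega.shape[0], dtype=bool)
print(np.array_equal(off_diag, adj > 0))  # True

# Eigenvalues of a Kronecker sum are all pairwise sums of the factors'
# eigenvalues, so omega is positive definite whenever both factors are.
pairwise = np.add.outer(np.linalg.eigvalsh(omega_s), np.linalg.eigvalsh(omega_f))
print(np.allclose(np.sort(np.linalg.eigvalsh(omega)), np.sort(pairwise.ravel())))  # True
```

Because the combined p·q × p·q precision matrix is determined by a p × p and a q × q factor, a model of this form has far fewer parameters than an unstructured precision matrix over all sample-feature pairs.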
Funding
SHAPEIT+Salmon: haplotype phasing and RNA-seq quantification for allele-specific eQTL mapping
National Human Genome Research Institute
Statistical Approach to Uncovering Gene Networks Perturbed by Cis-acting and Trans-acting eQTLs with Active Learning
National Human Genome Research Institute
Computational framework for identifiable and phase-consistent allele-specific expression quantification
Directorate for Biological Sciences
History
Date
2024-04-26
Degree Type
- Dissertation
Thesis Department
- Computational Biology
Degree Name
- Doctor of Philosophy (PhD)