Carnegie Mellon University

Decomposable Probabilistic Models for Networks in Biology

thesis
posted on 2025-05-16, 20:52 authored by Jun Ho Yoon

The complexity in a biological system arises from several different underlying biological processes occurring simultaneously. Existing computational methods for disentangling such complexity and decomposing biological data into individual components often have limited statistical power, as they rely on a data analysis pipeline where each aspect is examined separately. In this thesis, we consider three types of problems in biology in which multiple aspects of the biological system should be modeled simultaneously to obtain an accurate representation of the underlying dependencies. First, we consider modeling overall dependencies that decompose into dependencies across samples and across features, as commonly observed in gene expression measurements, where genes are correlated due to the complex underlying gene regulatory network and samples are correlated due to cell types, pedigrees, or disease subtypes. We develop a scalable optimization technique for learning a Cartesian product of two graphs and apply it to RNA-seq data to learn gene regulatory networks in mice related through a pedigree. Second, we develop a doubly mixed-effects Gaussian process regression framework for multi-output learning that decomposes the variability in matrix-variate output data into fixed and random effects across samples and outputs. On several spatiotemporal datasets, including COVID-19 epidemiological data, we demonstrate that a meaningful decomposition can be learned only if we model all the components simultaneously. Third, we develop a statistical method for learning gene regulatory networks perturbed by cis-acting and trans-acting expression quantitative trait loci (eQTLs) from allele-specific expression and phased genotype data. Our model can decompose the variability in allele-specific expression levels into the component that arises from the cis-acting and trans-acting eQTLs and the component exerted by other genes in the gene regulatory networks. We use our approach to analyze data from GTEx and from LG×SM advanced intercross lines of mice with a known pedigree. Our decomposable probabilistic models and learning algorithms will provide reliable tools for uncovering complex dependencies and their underlying structures from biological data.

Funding

SHAPEIT+Salmon: haplotype phasing and RNA-seq quantification for allele-specific eQTL mapping

National Human Genome Research Institute

Statistical Approach to Uncovering Gene Networks Perturbed by Cis-acting and Trans-acting eQTLs with Active Learning

National Human Genome Research Institute

Computational framework for identifiable and phase-consistent allele-specific expression quantification

Directorate for Biological Sciences

History

Date

2024-04-26

Degree Type

  • Dissertation

Thesis Department

  • Computational Biology

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Seyoung Kim