Bone_cmu_0041E_10635.pdf (46.52 MB)

Machine Learning Tools for Smarter Automation and Diagnostics in the Development of Personalized Medicine from Size-Limited Datasets

posted on 10.06.2021, 20:59, authored by Jennifer Bone
With medical datasets becoming more readily available and standardized, machine learning (ML) has revolutionized healthcare through improved analysis of multivariable clinical data, discovery of causal relationships and hidden states, and generalization of predictive models to new, unseen patient data. Current ML architectures, such as deep learning and canonical neural networks, rely on large datasets to produce accurate models. However, variation in patient response, driven by population heterogeneity in genomic, environmental, and physiological factors and processes, suggests the need to tailor medical solutions to the unique features of individual patients. As healthcare becomes more patient-specific, models must balance an ever-increasing feature space (model complexity) against smaller numbers of patients. Applying ML to smarter diagnostics and automation for patient-specific solutions therefore requires leveraging biomedical datasets that are rich in information but limited in sample size. This work adapts ML techniques for feature importance and prediction from size-limited data in two contexts: automating 3D bioprinting for patient-specific implants and transplants, and early diagnosis of renal cancer progression for clinical decision support.

Additive manufacturing (AM) of biologically and physiologically active materials, such as hydrogels, cell-scaffold proteins, and cells, is a promising avenue toward patient-specific implants and organ transplants because it offers rapid fabrication and flexible design. However, the “plug-and-play” vision of bioprinted cell scaffolds and organs remains elusive because biological materials are highly variable. Heterogeneous materials respond differently to the same physical process settings, producing a complex feature space that is difficult to optimize. Hierarchical Machine Learning (HML) is therefore used to embed domain knowledge into a statistical inference framework, reducing the experimental data needed to model error bias in process design choices. HML-optimized predictors were shown to produce high-fidelity bioprinted constructs that deviate from expected dimensions by less than 10%. Furthermore, a supervised physical middle layer that connects predictors to print-quality response is shown to aid transfer learning to new print materials, suggesting a method for rapid optimization of parallel 3D bioprinting systems.
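The abstract does not specify the HML model itself, so the following is only a hedged illustration of the general idea of a supervised physical middle layer. It assumes a two-stage model: stage 1 (material-specific) maps process settings to a measurable physical quantity, here a hypothetical strand width; stage 2 (shared across materials) maps that quantity to print quality, expressed as fractional deviation from the expected dimension and compared against the 10% figure above. The feature `flow_ratio` (pressure divided by print speed), every function name, and all numbers are assumptions invented for illustration, and ordinary least squares stands in for the thesis's statistical inference framework.

```python
# Minimal two-stage ("hierarchical") sketch with a supervised physical middle layer.
# All names and numbers are hypothetical; OLS stands in for the thesis's
# statistical inference framework.


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def flow_ratio(pressure_kpa, speed_mm_s):
    """Assumed domain-knowledge feature: extrusion pressure per unit print speed."""
    return pressure_kpa / speed_mm_s


def fit_middle_layer(trials):
    """Stage 1 (material-specific): process settings -> measured strand width (mm).

    `trials` is a list of (pressure_kpa, speed_mm_s, measured_width_mm) tuples.
    """
    xs = [flow_ratio(p, v) for p, v, _ in trials]
    ys = [w for _, _, w in trials]
    return fit_line(xs, ys)


def choose_pressure(layer, target_width_mm, speed_mm_s):
    """Invert the fitted middle layer to pick the pressure for a target width."""
    a, b = layer
    return (target_width_mm - a) / b * speed_mm_s


def relative_deviation(measured, expected):
    """Stage 2 (shared across materials): fractional deviation from the design."""
    return abs(measured - expected) / expected


def simulated_width(p, v):
    """Synthetic stand-in for printing and measuring; not real material data."""
    return 0.2 + 0.03 * flow_ratio(p, v)


# Transfer to a new material: only stage 1 is refit, from three toy trials.
trials_new = [(60, 10, 0.39), (90, 10, 0.46), (120, 10, 0.57)]
layer = fit_middle_layer(trials_new)
pressure = choose_pressure(layer, target_width_mm=0.5, speed_mm_s=10)
deviation = relative_deviation(simulated_width(pressure, 10), 0.5)
print(f"pressure = {pressure:.1f} kPa, deviation = {deviation:.1%}, within 10%: {deviation < 0.10}")
```

The design point the sketch tries to capture is that the physical middle layer isolates what changes between materials (stage 1) from what does not (stage 2), so a new material needs only a handful of calibration prints rather than a full refit.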
Disease diagnosis can also benefit from small experimental or phase 1 clinical datasets. An innovative Markov model is developed to perform early classification of patient response to hydroxychloroquine/Aldesleukin (IL-2) treatment for progressive renal cancer. The model reduces the high-dimensional (10^15 to 10^25) feature space of T-cell receptor (TCR) and B-cell receptor (BCR) systems biology to an intermediate-dimensional space of 400 descriptors, revealing the causal features that predict the final state of 30 patients after 15 days of treatment with 95% classification accuracy. Through quantitative monitoring of amino acid motifs in the primary structure of TCRs and BCRs across 3 treatment time points, this work discusses a mechanistic understanding of how TCRs and BCRs are orchestrated toward patient recovery. These results suggest that this Markov model could be a powerful diagnostic tool for leveraging phase 1 clinical data toward early patient diagnosis, informing an early and individualized medical response.
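The abstract does not describe the internals of the Markov model. As a hedged illustration of how a Markov formulation can yield exactly 400 sequence descriptors, note that a first-order chain over the 20 standard amino acids has 20 × 20 = 400 transition probabilities; whether the thesis uses this construction is an assumption, not something the abstract states. The sketch below is the textbook per-class Markov-chain sequence classifier (the same idea used for CpG-island detection): fit one transition table per outcome class, then label a repertoire by its log-likelihood ratio. The function names and the toy CDR3-like strings are hypothetical and are not patient data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues -> 20 x 20 = 400 transitions


def fit_transitions(sequences, pseudocount=1.0):
    """Estimate P(next residue | current residue) with additive (Laplace) smoothing.

    Returns a dict keyed by the 400 two-letter residue pairs.
    """
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts[a + b] += 1
    probs = {}
    for a in AMINO_ACIDS:
        row_total = sum(counts[a + b] for b in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        for b in AMINO_ACIDS:
            probs[a + b] = (counts[a + b] + pseudocount) / row_total
    return probs


def log_likelihood(sequences, probs):
    """Sum of log transition probabilities over every sequence in a repertoire."""
    total = 0.0
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a + b in probs:  # skip non-standard residues
                total += math.log(probs[a + b])
    return total


def classify(repertoire, model_pos, model_neg):
    """Label a repertoire by the log-likelihood ratio of the two class chains."""
    score = log_likelihood(repertoire, model_pos) - log_likelihood(repertoire, model_neg)
    return ("responder" if score > 0 else "non-responder"), score


# Toy CDR3-like strings invented for illustration; not patient data.
responders = ["CASSLGQAYEQYF", "CASSLGGTEAFF", "CASSLGQGNTIYF"]
non_responders = ["CSARDRVGNTIYF", "CSVRDRTGELFF", "CSARDRSYEQYF"]
model_pos = fit_transitions(responders)
model_neg = fit_transitions(non_responders)
label, score = classify(["CASSLGQETQYF"], model_pos, model_neg)
print(len(model_pos), label, round(score, 2))
```

Tracking how such a 400-entry transition table shifts between treatment time points is one plausible way to monitor amino acid motifs over time, but that use is likewise an assumption made for illustration.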




Degree Type

Biomedical Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Newell Washburn, Phil LeDuc