Carnegie Mellon University
tfrisby_phd_scs_2023.pdf (27.29 MB)

Sequential Strategies for Automated Science and Protein Engineering

thesis
posted on 2023-05-17, 19:33 authored by Trevor Frisby

Many scientific processes depend on sequential decision making. Choosing which experiments to run next, how to alter an experimental design, or how to reconfigure experimental instrumentation affects not only the accuracy or quality of the experiment itself but also the efficiency with which optimal experimental conditions are identified. As the ability to automate experimental components becomes more widespread, practical algorithms that can guide these decisions are increasingly important. In this dissertation, we use machine learning to address sequential decision-making problems in two emerging biological domains: general laboratory experimentation via a Cloud Lab, and protein engineering. For the first setting, we introduce protocol, a first-of-its-kind deterministic algorithm that improves experimental protocols via asynchronous, parallel Bayesian optimization. For the second setting, we describe two methods for selecting protein engineering experiments. First, we show how to formulate Directed Evolution as a regularized Bayesian optimization problem in which the regularization term reflects evolutionary or structure-based constraints. Second, we demonstrate how to use a deep Transformer protein language model to select lead sequences from nanobody repertoires and to select beneficial single-site mutagenesis experiments that optimize targeted protein functions.
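To make the idea of "Directed Evolution as regularized Bayesian optimization" concrete, here is a minimal sketch. It assumes a zero-mean Gaussian-process surrogate over one-hot encoded sequences, an upper-confidence-bound (UCB) acquisition, and a hypothetical evolutionary regularizer. That regularizer is the negative log-probability of a sequence under a position-specific amino-acid profile, and it is subtracted from the acquisition with weight `lam`. These modeling choices, the parameters `beta`, `lam`, and `lengthscale`, and every function name are illustrative assumptions, not the dissertation's actual formulation.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a protein sequence as a flattened one-hot vector."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
    return x.ravel()


def rbf_kernel(A, B, lengthscale=2.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * sq / lengthscale**2)


def gp_posterior(X_train, y_train, X_test, noise=1e-3):
    """Zero-mean GP posterior mean and standard deviation at X_test."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # The RBF prior variance is 1; subtract the variance explained by the data.
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)


def profile_penalty(seq, profile):
    """Hypothetical evolutionary regularizer: -log p(seq) under a per-position profile."""
    return -sum(np.log(profile[i, AMINO_ACIDS.index(aa)]) for i, aa in enumerate(seq))


def regularized_ucb(candidates, X_train, y_train, profile, beta=2.0, lam=0.1):
    """UCB acquisition minus lam times the regularizer, one score per candidate."""
    X_test = np.array([one_hot(s) for s in candidates])
    mean, std = gp_posterior(X_train, y_train, X_test)
    penalty = np.array([profile_penalty(s, profile) for s in candidates])
    return mean + beta * std - lam * penalty


# Toy usage with made-up length-4 sequences and measurements (illustrative only).
train_seqs = ["ACDE", "ACDF", "GCDE"]
X_train = np.array([one_hot(s) for s in train_seqs])
y_train = np.array([0.2, 0.5, 0.1])
uniform_profile = np.full((4, len(AMINO_ACIDS)), 1.0 / len(AMINO_ACIDS))
candidates = ["ACDF", "ACDW", "KCDF"]

scores = regularized_ucb(candidates, X_train, y_train, uniform_profile, lam=1.0)
unregularized = regularized_ucb(candidates, X_train, y_train, uniform_profile, lam=0.0)
next_experiment = candidates[int(np.argmax(scores))]
```

Larger `lam` pushes the search toward sequences the assumed profile considers plausible, trading off against the surrogate's optimism. With a uniform profile, as in the toy usage, the penalty is the same for every candidate, so it shifts all scores equally and does not change the ranking.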

History

Date

2023-05-14

Degree Type

  • Dissertation

Department

  • Computer Science

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Christopher James Langmead
