Carnegie Mellon University
Machine Learning for Molecular Reactivity Predictions

thesis
posted on 2025-05-08, 18:36 authored by Zhen Liu

Computational methods are valuable for designing experiments and understanding experimental results, but they often face a trade-off between accuracy and efficiency. This dissertation developed chemical machine learning software for predicting molecular reactivity.

The thesis first details the development of Auto3D, an ML package for low-energy conformer generation and molecular property predictions. Auto3D includes the ANI and AIMNet families of machine learning interatomic potentials (MLIPs), which can reliably compute molecular energies with quantum mechanical (QM)-level fidelity in a fraction of the time required by QM methods. With these MLIPs, Auto3D can accurately and efficiently search for low-energy conformers and tautomers, perform single-point energy calculations, optimize geometries, and compute thermodynamic properties.

The power of Auto3D was demonstrated in applications for predicting reaction feasibility. We took both top-down and bottom-up approaches to reaction feasibility predictions. For the top-down approach, the goal was to directly predict amide coupling reaction yield based on the reaction information, such as reactants, products, and conditions. Auto3D was applied to obtain conformers for the molecules and derive reaction features. Using a stacking technique on these reaction descriptors, the yield prediction model achieved an R² of 0.457 ± 0.006. This was the best yield prediction accuracy on a large literature dataset of amide coupling reactions. Error analysis revealed that balancing model sensitivity and robustness is another significant challenge in training accurate yield prediction models, in addition to the commonly cited scarcity of low-yield reactions and bias in reported yields.

For the bottom-up approach, we first predicted molecular properties and then forecast their reactivity. Specifically, Auto3D and AIMNet2 were used to develop a workflow to automatically compute the ring strain energy (RSE) for cyclic molecules. The workflow agrees with the corresponding DFT method (ωB97M/Def2-TZVPP) with a mean absolute error (MAE) of 0.90 kcal/mol. The computed RSE demonstrated a strong correlation with reaction outcomes in copper-free click reactions and ring-opening metathesis polymerization reactions.
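The two accuracy measures quoted in this abstract, R² for yield prediction and MAE for ring strain energy, can be illustrated with a minimal sketch. It uses the standard textbook definitions and is not code from the thesis; the function names and the numbers in `reference` and `predicted` are made up purely for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values, not data from the thesis.
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(f"R^2 = {r_squared(reference, predicted):.3f}")
print(f"MAE = {mean_absolute_error(reference, predicted):.2f}")
```

R² measures the fraction of variance in the reference values that the predictions explain, while MAE is reported in the units of the quantity itself (kcal/mol for RSE).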

Overall, this dissertation demonstrates the potential of machine learning methods, particularly MLIPs, in the development of computational methods and in computer-assisted experiment design.

History

Date

2024-09-01

Degree Type

  • Dissertation

Department

  • Chemistry

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Olexandr Isayev
