Carnegie Mellon University

Towards Efficient and Robust Molecular Crystal Structure Prediction using Genarris, GAtor, and Machine Learning

posted on 2020-11-20, 21:06 authored by Timothy Rose
Molecular crystals are a versatile class of materials with applications ranging from pharmaceuticals to organic electronics. Because molecular crystals are bound by weak
dispersion interactions, they often crystallize in more than one solid form, a phenomenon known as polymorphism. Crystal structure prediction (CSP), or the prediction of a
molecule’s putative crystal structures solely from its chemical composition, is a coveted computational tool because it can predict previously unobserved polymorphs (which may
display vastly different physical properties) and serve as a complementary tool for experimental investigations. CSP is difficult in part because one needs to sample a large
configuration space for even the simplest molecules. Furthermore, the energy differences between polymorphs can be smaller than 1 kJ/mol, making reliable CSP an extremely
challenging task. In this thesis, I develop and apply methods that assess and enhance the efficiency and robustness of molecular CSP within our Python packages Genarris and GAtor and within a new package developed for fast machine learning (ML) applications of the Smooth Overlap of Atomic Environments (SOAP) descriptor. I begin by performing CSP of 3,4-cyclobutylfuran
with GAtor, demonstrating the robustness of the GAtor workflow by finding both its experimentally stable Z = 4 and metastable Z = 8 polymorphs. I then validate and
use a non-interacting fragment density, or Harris approximation (HA), for accelerated energy screening. Next, I analyze the ability of unsupervised machine learning to
dynamically cluster a population of crystal structures into niches of structural similarity with a cluster-based fitness function. I show that niching helped overcome initial pool biases, provided a good balance of exploration and exploitation on the potential energy surface, and facilitated generation of the experimental structure compared with control runs that did not use clustering. I then automate and massively parallelize the workflow in Genarris for a speedup of more than an order of magnitude, including a new parallel algorithm for affinity propagation (AP) clustering. I provide results for the new Genarris 2.0 package through case studies applied to structures generated in special Wyckoff positions. Finally, I describe my automated and parallelized ML package using the
SOAP kernel and show its viability in molecular crystal structure energy ranking for fast screening of thousands of potential structures. Included is a new Informational Sphere
Sampling (ISS) selection method I developed, which samples diversely by selecting only structures that lie outside hyperspheres centered on the already-selected structures, with autonomously and dynamically determined radii. Its performance is on par with existing methods, and it has potential for improved sampling over Monte Carlo. I also develop and include a method to calculate the kernel using a vectorized and parallelized approach that is more than an order of magnitude faster than existing code employing SOAP and more than two orders of magnitude faster than density functional theory.
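
The hypersphere-exclusion idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's ISS implementation: the function name `exclusion_sphere_sample`, the toy two-dimensional descriptor vectors, and in particular the single fixed `radius` are assumptions made for illustration, whereas the abstract describes radii that are determined autonomously and dynamically. The sketch only shows the selection rule of accepting a candidate when it lies outside every sphere centered on an already-selected structure.

```python
import math
import random


def exclusion_sphere_sample(points, radius, n_select, seed=0):
    """Greedily pick points lying outside spheres of `radius`
    centered on every point already selected.

    NOTE: a fixed `radius` is a simplifying assumption; the abstract
    describes radii that are determined dynamically.
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # visit candidates in a random order

    selected = []
    for idx in order:
        if len(selected) == n_select:
            break
        candidate = points[idx]
        # Accept only if the candidate is outside every existing sphere.
        if all(math.dist(candidate, points[j]) > radius for j in selected):
            selected.append(idx)
    return selected


# Toy descriptor vectors (hypothetical): two tight groups and one outlier.
descriptors = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0),
    (10.0, 0.0),
]
picked = exclusion_sphere_sample(descriptors, radius=1.0, n_select=3)
```

With these toy inputs the rule can accept at most one point from each tight group, so the selection spreads across the descriptor space instead of oversampling a dense region. In practice the points would be high-dimensional structure descriptors rather than 2D tuples.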




Degree Type

  • Dissertation


Department

  • Materials Science and Engineering

Degree Name

  • Doctor of Philosophy (PhD)


Advisor(s)

  • Noa Marom
