Implementing Data Driven Modeling and Design of Experiments in Green Hydrogen Production Catalyst Discovery and Chemical Engineering Applications
In chemical engineering, scientific research has grown increasingly complex. Machine learning (ML) and data science offer tools that can change how we conduct research by digitizing workflows, building data-driven optimization models, and making predictions that guide future experiments. Despite rapid recent advances in these two fields, some research setups lack the software infrastructure to leverage these tools. Bridging this gap requires domain insight into both experimental setups and data science, so that each can benefit from the other's progress.
This dissertation explores the intersection of data science, ML techniques, and sequential experimentation guided by Design of Experiments (DoE) principles. By integrating computational techniques with experimental data, we can accelerate trend identification, multi-dimensional system optimization, and the establishment of high-confidence decision boundaries, ultimately expediting experimental discoveries and conserving vital resources. The dissertation consists of six chapters, each contributing to the development of workflows to enhance experimental discovery.
First, we lay out the benefits and current limitations of incorporating modern machine learning and data science principles into fundamental scientific discovery. We discuss data sources, data manipulation, model selection, and the importance of domain knowledge. This is critical for developing frameworks that account for the limitations of the experimental setup and data collection while ensuring complete, machine-readable datasets for further analysis and modeling. Next, we apply these methods to a high-throughput experimental setup measuring light-driven H2 production from colloidal metallic heterogeneous catalysts. We take a data science approach to analyzing 96-well plate experiments, performing analytics on each experiment and on the entire dataset. Given these findings, we study in situ catalyst formation and optimize the resulting system for H2 production with DoE and subsequent analysis. We then extend these sampling methods toward identifying active multi-metallic catalysts containing Cu-Ru-Fe. The final section introduces novel sequential sampling methods for a different experimental goal: classification tasks. We aim to study different sequential sampling techniques to find divisions between desirable and undesirable regions with a high degree of certainty and few experimental samples.
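As a rough illustration of the idea behind sequential sampling for classification, the sketch below locates a boundary between an undesirable and a desirable region along a single variable by always measuring the midpoint of the current bracket. It is plain bisection, not one of the methods developed in the dissertation, and it assumes one variable with a single threshold. The names `locate_boundary`, `run_experiment`, and `toy_experiment`, and the threshold value 0.37, are hypothetical and chosen only for the example.

```python
def locate_boundary(run_experiment, low, high, budget=8):
    """Bracket a single class boundary using `budget` sequential samples.

    Assumption: run_experiment(x) returns False (undesirable) below the
    boundary and True (desirable) at or above it, with one boundary in
    the interval [low, high].
    """
    samples = []
    for _ in range(budget):
        x = (low + high) / 2.0       # most informative point: middle of the bracket
        label = run_experiment(x)    # one (simulated) experiment per iteration
        samples.append((x, label))
        if label:
            high = x                 # boundary lies at or below x
        else:
            low = x                  # boundary lies above x
    return low, high, samples


def toy_experiment(x):
    """Hypothetical stand-in for a real measurement; threshold is made up."""
    return x >= 0.37


low, high, samples = locate_boundary(toy_experiment, 0.0, 1.0, budget=8)
print(f"boundary bracketed in [{low}, {high}] after {len(samples)} samples")
```

Each sample halves the bracket, so eight experiments narrow a unit interval to a width of 1/256. Real experimental problems are noisy and multi-dimensional, which is why model-based sampling strategies are needed in place of this simple rule.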
- Chemical Engineering
- Doctor of Philosophy (PhD)