<p dir="ltr">My thesis focuses on generative models and their applications to discrete data. We propose novel algorithms that combine insights from state-of-the-art generative models with domain-specific knowledge of discrete data types. These algorithms aim to make the properties of generated samples more similar to those of the training data, improve the validity of generated data, and raise the overall quality of generated outputs. The first part of my thesis investigates converting geometric images into a discrete representation using a context-free grammar. We discuss effective and scalable techniques for identifying suitable representations in a large search space. The second part examines the behavior of Variational Autoencoders (VAEs) on high-dimensional data embedded in lower-dimensional manifolds, assessing their ability to recover both the manifold and the data density over it. Extending this study of VAEs to discrete data domains, particularly molecular data generation, we found that a method that improves VAEs' manifold recovery for continuous data also significantly improves discrete data generation. We study its benefits and limitations using the ChEMBL dataset and two smaller datasets of molecules active against specific protein targets. Lastly, to address the challenge of generating stable 3D molecules, the thesis incorporates a non-differentiable chemistry oracle, GFN2-xTB, into the denoising process to improve molecular geometry and stability. This approach is validated on datasets such as QM9 and GEOM, where it yields higher stability rates among the generated molecules.</p>
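<p dir="ltr">To make the idea of a grammar-based discrete representation more concrete, here is a minimal sketch in which a sequence of production-rule indices serves as a discrete code for a simple geometric scene. The grammar and its symbols (<code>Scene</code>, <code>Shape</code>, <code>Pos</code>, <code>Size</code>) and the helper functions <code>derive</code> and <code>to_shapes</code> are hypothetical illustrations. This is not the grammar or search procedure used in the thesis; it only shows how decoding a leftmost derivation turns a list of integers into a structured image description.</p>

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy grammar for simple geometric scenes (not the thesis grammar).
# Each nonterminal maps to a list of alternative right-hand sides.
GRAMMAR: Dict[str, List[List[str]]] = {
    "Scene": [["Shape"], ["Shape", "Scene"]],
    "Shape": [["circle", "Pos", "Size"], ["square", "Pos", "Size"]],
    "Pos": [["left"], ["center"], ["right"]],
    "Size": [["small"], ["large"]],
}


def derive(choices: Sequence[int], start: str = "Scene") -> List[str]:
    """Decode a sequence of production indices via a leftmost derivation."""
    sentential = [start]
    used = 0
    while True:
        # Locate the leftmost nonterminal that still needs expanding.
        idx = next((k for k, s in enumerate(sentential) if s in GRAMMAR), None)
        if idx is None:
            break  # only terminals remain: the derivation is complete
        if used >= len(choices):
            raise ValueError("derivation incomplete: ran out of choices")
        alternatives = GRAMMAR[sentential[idx]]
        choice = choices[used]
        if not 0 <= choice < len(alternatives):
            raise ValueError(f"invalid choice {choice} for {sentential[idx]}")
        # Replace the nonterminal with the chosen right-hand side.
        sentential[idx:idx + 1] = alternatives[choice]
        used += 1
    if used != len(choices):
        raise ValueError("unused choices remain after the derivation finished")
    return sentential


def to_shapes(terminals: List[str]) -> List[Tuple[str, str, str]]:
    """Group terminals into (kind, position, size) triples, one per shape."""
    return [
        (terminals[i], terminals[i + 1], terminals[i + 2])
        for i in range(0, len(terminals), 3)
    ]


if __name__ == "__main__":
    code = [1, 0, 0, 1, 0, 1, 2, 0]
    print(to_shapes(derive(code)))
    # [('circle', 'left', 'large'), ('square', 'right', 'small')]
```

<p dir="ltr">In this toy setting every complete code decodes to a well-formed scene by construction, which illustrates one general reason grammar-based representations can help with the validity of generated discrete data.</p>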
Funding
INTELLIGENT MODEL-BASED ADAPTATION FOR MOBILE ROBOTICS