Supporting information for Neural Network Embeddings based Similarity Search Method for Catalyst Systems
This repository contains code to prepare the datasets, train the GemNet model, build and search the Faiss index, and visualize the search results, all in the notebook `faiss-gemnet-qm9-mp.ipynb`. The notebook reproduces the examples in our manuscript for the QM9 and Materials Project datasets. We do not include the OC20 data here because of its large size (> 50 GB); the code to process the OC20 dataset is almost the same as the QM9 code in the notebook.
We include the intermediate data (GemNet checkpoints, LMDB files, Faiss indexes, and search results) for the QM9 and Materials Project datasets in the directory `example-data`. The GemNet checkpoint for the OC20 dataset is also in this directory. Training and evaluation of the Gaussian process regression model, using the molecules retrieved for the query benzene, are demonstrated in the `ben-gp-data` directory; the notebook `qm9-gp-gemnet-morgan-random-nrg.ipynb` there can be run on Colab.
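
For readers new to embedding-based similarity search, the build-and-search step can be pictured with the minimal sketch below. It is not the code from the notebook: it uses plain Python instead of Faiss, a brute-force scan over squared L2 distances is an assumption about the metric, and the toy 2-D vectors and the names `l2_squared` and `search` are hypothetical stand-ins for real GemNet embeddings and the Faiss index calls.

```python
def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(embeddings, query, k):
    """Return the k nearest entries as (distance, index) pairs, closest first.

    This exhaustive scan stands in for an exact (flat) index; a library such
    as Faiss does the same job far faster on large datasets.
    """
    scored = sorted(
        (l2_squared(vec, query), idx) for idx, vec in enumerate(embeddings)
    )
    return scored[:k]


# Toy "database" of embeddings (hypothetical values, one row per structure).
embeddings = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 2.0],
    [3.0, 3.0],
]

# Embedding of the query structure (also hypothetical).
query = [0.9, 0.1]

hits = search(embeddings, query, k=2)
for distance, idx in hits:
    print(f"structure {idx}: squared distance {distance:.3f}")
```

In the actual workflow, each row would be an embedding produced by the trained GemNet model, and the returned indices would be mapped back to the corresponding molecules or materials for visualization.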