

Following our AQCat25 dataset of 13.5 million data points on 47,000 spin-polarized catalytic systems, we’ve now trained a model family to predict energies for heterogeneous catalysts. The models, termed AQCat25-EV2, are trained on both Open Catalyst 2020 (OC20) and AQCat25, and extend the base EquiformerV2 architecture to accurately describe spin-polarized systems. This is achieved without loss of performance on OC20 systems via a conditioning strategy called Feature-wise Linear Modulation (FiLM), whose implementation for EquiformerV2 we detail in our manuscript. AQCat25-EV2 can, for the first time, confidently handle all industrially relevant elements for heterogeneous catalyst design.
About 80% of industrial chemical processes, and about the same fraction of manufactured goods, involve a catalyst at some step. More efficient catalysts mean more efficient reactions. A step-change in catalyst efficiency can in this way enable whole new processes: allowing conversion of oil into products that won’t be burned, for example; decarbonizing the atmosphere; or so vastly increasing the efficiency of nitrogen production as to end food insecurity. Yet such improvements are unavailable to industry at present, because the throughput of current laboratory catalyst-screening methods (on average, two to 30 catalysts per week) limits researchers to incremental improvements upon existing catalysts.
Using quantum chemistry methods such as Density Functional Theory (DFT), specialist researchers are able to perform some of these experiments virtually, increasing throughput by perhaps 10-100X. AI models trained on DFT results promise to increase that throughput by a further 20,000X, which would be enough to make step-change new catalyst designs viable. Such models, including EquiformerV2, are trained on pioneering datasets such as OC20, which make large amounts of high-quality data publicly available and cover many important catalyst cases. But many of the cases necessary to fully enable machine-learned catalyst design remain uncovered. One of those cases is spin polarization, which must be modeled to obtain accurate predictions for all industrially relevant elements. AQCat25 is the only large-scale public dataset that includes spin polarization data for a diverse materials space of heterogeneous catalysts.
We thus set out to train a new model leveraging data from both OC20 and AQCat25. Since models well-trained on the OC20 data already exist (e.g., EquiformerV2), we believed we could build upon this foundation without having to entirely repeat the work. However, we quickly encountered a technical challenge: naively fine-tuning EquiformerV2 on AQCat25 degrades its performance on the existing OC20 data. This is an example of the well-known, and somewhat melodramatically named, machine learning concept of catastrophic forgetting, in which showing new data to a model causes it to “forget” what it already knows (Figure 1). By adapting EquiformerV2 with a strategy called Feature-wise Linear Modulation (FiLM), detailed in our accompanying manuscript, our AQCat25-EV2 model, jointly trained on AQCat25 and 20M data points from OC20, is able to perform well on AQCat25 without losing generalizability to the broader OC20 dataset. FiLM helps the model differentiate data from AQCat25 and OC20 by conditioning it on spin and fidelity features. As we and other groups continue to release new data, we expect this strategy to be of increasing importance.
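To make the idea of FiLM conditioning concrete, here is a minimal NumPy sketch of the generic technique: a conditioning vector is mapped to a per-channel scale (gamma) and shift (beta), which then modulate the features. This is not our EquiformerV2 implementation, which operates on equivariant features and is described in the manuscript. The two-element condition encoding, the function name `film`, the layer sizes, and the identity-style initialization are all assumptions made for illustration.

```python
import numpy as np

def film(features, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Generic FiLM: scale and shift each feature channel using the condition."""
    gamma = condition @ w_gamma + b_gamma  # per-channel scale, shape (batch, channels)
    beta = condition @ w_beta + b_beta     # per-channel shift, shape (batch, channels)
    return gamma * features + beta

rng = np.random.default_rng(0)
n_channels = 4

# Hypothetical condition encoding: [spin_polarized, fidelity_flag].
cond_oc20 = np.array([[0.0, 0.0]])   # assumed encoding for OC20-style data
cond_aqcat = np.array([[1.0, 1.0]])  # assumed encoding for AQCat25-style data

# Illustrative initialization: with a zero condition, gamma = 1 and beta = 0,
# so the features pass through unchanged.
w_gamma = rng.normal(scale=0.1, size=(2, n_channels))
b_gamma = np.ones(n_channels)
w_beta = rng.normal(scale=0.1, size=(2, n_channels))
b_beta = np.zeros(n_channels)

h = rng.normal(size=(1, n_channels))  # stand-in for hidden features of one system
h_oc20 = film(h, cond_oc20, w_gamma, b_gamma, w_beta, b_beta)
h_aqcat = film(h, cond_aqcat, w_gamma, b_gamma, w_beta, b_beta)
```

In this toy setup the same hidden features are left untouched under the OC20-style condition and are rescaled and shifted under the AQCat25-style condition, which illustrates how a single set of shared weights can treat the two data sources differently.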

As shown in Figure 2, the overall generalizability for catalysis tasks, such as finding the global minimum adsorption energy across a diverse set of surfaces, is also improved. AQCat25-EV2 correctly identifies the minimum adsorption energy, to within 0.1 eV of the DFT value, for 69.9% of a diverse set of catalyst surfaces, compared to 48.2% for the pretrained EV2 model.
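As a rough illustration of how a success rate of this kind can be computed, the sketch below counts a surface as a success when the lowest model-predicted adsorption energy across its candidate placements lies within a tolerance of the DFT minimum. The exact evaluation protocol is the one in our manuscript; the function name `success_rate`, the data layout, and the energies below are made up for this example.

```python
def success_rate(predicted, dft_minimum, tol=0.1):
    """Fraction of surfaces whose predicted minimum energy is within tol (eV) of DFT."""
    hits = 0
    for surface, energies in predicted.items():
        # Lowest predicted adsorption energy over all candidate placements.
        predicted_min = min(energies)
        if abs(predicted_min - dft_minimum[surface]) <= tol:
            hits += 1
    return hits / len(predicted)

# Toy, invented energies in eV for three hypothetical surfaces.
predicted = {
    "surface_a": [-1.20, -0.95],
    "surface_b": [-0.40, -0.10],
    "surface_c": [0.30, 0.55],
}
dft_minimum = {"surface_a": -1.25, "surface_b": -0.70, "surface_c": 0.32}

rate = success_rate(predicted, dft_minimum)
```

Here two of the three toy surfaces fall within the 0.1 eV tolerance, so `rate` is 2/3.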
We’re releasing AQCat25-EV2 free for academic use on Hugging Face. As part of our overall AQCat catalyst modelling solution, we expect this model and future ones to begin unlocking massive improvements in catalyst discovery. If you’re interested in using the model commercially, or in partnering to discover new catalysts with SandboxAQ, please contact us directly.