

SandboxAQ researchers have published AQCat25 in the Nature Portfolio journal npj Computational Materials, moving foundation models for heterogeneous catalysis from conceptual promise to a practical engine for industrial catalyst discovery.
AQCat25 is a high‑fidelity, spin‑aware dataset and model family that enables treatment of magnetically complex, earth‑abundant catalysts at industrial scale. It is built as a complement to the Open Catalyst 2020 (OC20) family. Where OC20 prioritizes breadth and throughput, AQCat25 invests compute where higher electronic fidelity and explicit spin treatment change the answer.
This work will allow R&D teams to deploy AI models that capture the magnetic physics and complex chemistries their plants depend on, not just idealized, precious-metal systems, so they can explore more materials at higher fidelity with far lower risk of model failure.
AQCat25 is designed to help resolve a key structural bottleneck in real-world materials discovery.
While workflows driven by density functional theory (DFT) remain the scientific gold standard, they can push even elite teams into narrow, low‑throughput studies on simplified surfaces. Machine-learned interatomic potentials (MLIPs) trained on broad but lower‑fidelity datasets, on the other hand, often ignore magnetism, sacrifice electronic fidelity, and under‑represent the first‑row transition metals and complex chemistries that drive industrial catalysis.
AQCat25 was designed to bring realistic, magnetically complex, earth‑abundant catalysts into scope for ML foundation potentials, while keeping the broader OC20 catalyst universe in play.
Catalysis often hinges on a few numbers: adsorption energies and reaction barriers in the configurations that matter. AQCat25 therefore evaluates models not only on per‑frame errors but on their ability to recover global minimum adsorption energies across diverse adsorbate–surface pairs.
Using a dense benchmark of 50 relaxations per adsorbate–slab combination, the paper compares:
On this task, the jointly trained spin‑aware model:
For portfolio‑level decision making, this can translate into higher confidence that the candidates a team promotes actually correspond to physically meaningful minima, rather than artifacts of the potential energy surface or training distribution.
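The global-minimum metric described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the paper's evaluation code: the `Relaxation` record, the function names, the 0.1 eV tolerance, and the toy energies are all assumptions introduced here, and the paper's actual success criterion may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relaxation:
    """One relaxed configuration (hypothetical record layout)."""
    system: str       # hypothetical ID for an adsorbate-slab pair
    energy_ev: float  # adsorption energy of the relaxed structure, in eV


def global_minima(relaxations):
    """Return the lowest adsorption energy found for each system."""
    minima = {}
    for r in relaxations:
        if r.system not in minima or r.energy_ev < minima[r.system]:
            minima[r.system] = r.energy_ev
    return minima


def success_rate(predicted, reference, tol_ev=0.1):
    """Fraction of reference systems whose predicted global minimum lies
    within tol_ev of the reference minimum (tol_ev is an assumed threshold)."""
    pred_min = global_minima(predicted)
    ref_min = global_minima(reference)
    if not ref_min:
        raise ValueError("reference set is empty")
    hits = sum(
        1
        for system, e_ref in ref_min.items()
        if system in pred_min and abs(pred_min[system] - e_ref) <= tol_ev
    )
    return hits / len(ref_min)


# Made-up toy energies, for illustration only (not data from the paper).
reference = [
    Relaxation("CO/Fe", -1.50),
    Relaxation("CO/Fe", -1.20),
    Relaxation("OH/Ni", -0.80),
]
predicted = [
    Relaxation("CO/Fe", -1.45),
    Relaxation("CO/Fe", -1.10),
    Relaxation("OH/Ni", -0.50),
]

rate = success_rate(predicted, reference)
```

In this toy case the model recovers the minimum for one of the two systems. A metric of this kind rewards finding the right lowest-energy configuration per adsorbate–slab pair, which per-frame error averages can hide.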
AQCat25 is built as a complement to the Open Catalyst 2020 (OC20) family. Key design choices include:
The dataset deliberately spans:
The paper also shows why fine‑tuning an existing OC20 model on AQCat25 high‑fidelity data wasn’t the path forward for a production-scale environment. Fine-tuned pretrained potentials became more accurate on the new domain but “catastrophically forgot” the broader space of nonmagnetic catalysts from OC20.
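One simple way to put a number on this kind of forgetting is to compare a model's error on the original domain before and after fine-tuning. The sketch below is an assumption-laden illustration, not the paper's protocol: the function names are hypothetical and the energies are made-up toy values.

```python
def mean_abs_error(predicted, reference):
    """Mean absolute error between two equal-length lists of energies (eV)."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def forgetting(reference_old, pred_before, pred_after):
    """Increase in old-domain MAE after fine-tuning.

    A positive value means the model got worse on the original domain.
    """
    return mean_abs_error(pred_after, reference_old) - mean_abs_error(
        pred_before, reference_old
    )


# Made-up toy values on a hypothetical held-out split of the original domain.
reference_old = [-1.0, -0.5]
pred_before = [-0.75, -0.5]  # hypothetical pretrained model
pred_after = [0.0, 0.5]      # hypothetical model after fine-tuning

delta = forgetting(reference_old, pred_before, pred_after)
```

A large positive `delta` alongside an improved error on the new domain is the signature of the trade-off described above.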
With AQCat25 and its baseline models, scientific leaders and their teams can begin to:
Because AQCat25 is released alongside public models and code, R&D organizations can start their work today.
To dive into the technical details, read the full official publication and supplementary information for AQCat25, including dataset design, DFT settings, and model benchmarks.
If you are evaluating how spin‑aware, high‑fidelity MLIPs fit into your catalyst discovery or process‑development roadmap, the SandboxAQ team is actively collaborating with industrial and government partners. Learn more and express interest below.