AQCat25 Brings Spin-Aware Foundation Models to Real-World Catalysts

April 30, 2026

SandboxAQ researchers have published AQCat25 in the Nature Portfolio journal npj Computational Materials, moving foundation models for heterogeneous catalysis from conceptual promise to a practical engine for industrial catalyst discovery.

AQCat25 is a high‑fidelity, spin‑aware dataset and model family that enables treatment of magnetically complex, earth‑abundant catalysts at industrial scale. It is built as a complement to the Open Catalyst 2020 (OC20) family. Where OC20 prioritizes breadth and throughput, AQCat25 invests compute where higher electronic fidelity and explicit spin treatment change the answer.

This work allows R&D teams to deploy AI models that capture the magnetic physics and complex chemistries their plants depend on, not just idealized, precious‑metal systems, so they can explore more materials at higher fidelity with a lower risk of model failure.

AQCat25 Helps Solve the Current Tradeoff between Throughput and Quality in Material Discovery

AQCat25 is designed to help resolve a key structural bottleneck in real-world materials discovery. 

While workflows driven by density functional theory (DFT) remain the scientific gold standard, their cost can push even elite teams into narrow, low‑throughput studies on simplified surfaces. On the other hand, machine‑learned interatomic potentials (MLIPs) trained on broad but lower‑fidelity datasets often ignore magnetism, sacrifice electronic fidelity, and under‑represent the first‑row transition metals and complex chemistries that drive industrial catalysis.

AQCat25 was designed to bring realistic, magnetically complex, earth‑abundant catalysts into scope for ML foundation potentials, while keeping the broader OC20 catalyst universe in play.

AQCat25 Improves the Success Rate of Finding DFT Global Minima While Meeting Competitive Error Benchmarks

Catalysis often hinges on a few numbers: adsorption energies and reaction barriers in the configurations that matter. AQCat25 therefore evaluates models not only on per‑frame errors but on their ability to recover global minimum adsorption energies across diverse adsorbate–surface pairs.

Using a dense benchmark of 50 relaxations per adsorbate–slab combination, the paper compares:

  • The original OC20‑trained EquiformerV2 model.
  • A directly fine‑tuned variant on AQCat25.
  • A jointly trained AQCat25‑EV2 in+mid‑FiLM model that sees both AQCat25 and OC20 data.

On this task, the jointly trained spin‑aware model:

  • Improves the success rate of finding the DFT global minimum within a tight energy window.
  • Maintains competitive or better mean absolute errors compared with prior state‑of‑the‑art models designed for this benchmark, while operating on a richer, mixed‑fidelity training corpus.
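
The success-rate metric described above can be sketched in a few lines. This is a minimal illustration rather than the paper's evaluation code: the function name, the dictionary layout, the 0.1 eV window, and the example energies are all assumptions made here, since the article only says "a tight energy window".

```python
from typing import Dict, List


def global_min_success_rate(
    ml_energies: Dict[str, List[float]],
    dft_global_min: Dict[str, float],
    tol_ev: float = 0.1,  # assumed window; the article does not state a value
) -> float:
    """Fraction of systems whose lowest ML-relaxed adsorption energy
    lies within tol_ev of the DFT global minimum for that system."""
    hits = 0
    for system, energies in ml_energies.items():
        best = min(energies)  # lowest energy over all relaxations of this system
        if abs(best - dft_global_min[system]) <= tol_ev:
            hits += 1
    return hits / len(ml_energies)


# Made-up energies (eV) for two hypothetical adsorbate-slab pairs.
# The benchmark in the article uses 50 relaxations per pair; three are shown.
ml = {"CO@Fe": [-1.20, -1.52, -1.48], "H@Ni": [-0.30, -0.35, -0.33]}
dft = {"CO@Fe": -1.55, "H@Ni": -0.60}
rate = global_min_success_rate(ml, dft)
print(rate)
```

Reporting this rate next to the mean absolute error matters because a model can have a low average error and still miss the one configuration that sets the adsorption energy.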

For portfolio‑level decision making, this can translate into higher confidence that the candidates a team promotes actually correspond to physically meaningful minima, rather than artifacts of the potential energy surface or training distribution.

Inside AQCat25: High-Fidelity, Spin-Aware Data at Scale

Key design choices in AQCat25 include:

  • 13.5M DFT single‑point calculations across ~47,000 adsorbate–slab systems, generated at significantly higher fidelity than prior large‑scale heterogeneous catalysis datasets.
  • Spin polarization enabled for 12 elements (including Fe, Co, Ni and other magnetically important transition metals) so models see the physics that governs many industrially relevant active sites.
  • Expansion of the chemical space with a maximum of three elements per material, introducing six additional elements (Li, Mg, Ba, La, Ce, and F), mostly alkali, alkaline‑earth, and rare‑earth species, to better reflect real catalyst formulations.
  • A 500 eV plane‑wave cutoff and RPBE functional as the default stack, aligning with standard practice in heterogeneous catalysis rather than compromising solely for scale.
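
To make the design choices above concrete, the sketch below encodes them as a small settings object. It is a hypothetical illustration, not the AQCat25 generation pipeline: the class and function names are invented here, only the three spin-polarized elements named in the text are included, and the rule "enable spin if any listed element is present" is an assumption about how the choice is applied.

```python
from dataclasses import dataclass
from typing import Iterable

# Only Fe, Co and Ni are named in the article; the other nine
# spin-polarized elements are not listed there and are left out.
SPIN_ELEMENTS_NAMED = frozenset({"Fe", "Co", "Ni"})
MAX_ELEMENTS_PER_MATERIAL = 3


@dataclass(frozen=True)
class DFTSettings:
    functional: str = "RPBE"    # functional stated in the article
    cutoff_ev: float = 500.0    # plane-wave cutoff stated in the article
    spin_polarized: bool = False


def settings_for(material_elements: Iterable[str]) -> DFTSettings:
    """Return assumed DFT settings for a material given its elements."""
    elements = set(material_elements)
    if len(elements) > MAX_ELEMENTS_PER_MATERIAL:
        raise ValueError("material has more than three elements")
    # Assumption: spin is switched on when any listed element is present.
    spin = any(e in SPIN_ELEMENTS_NAMED for e in elements)
    return DFTSettings(spin_polarized=spin)


print(settings_for(["Fe", "Ce", "Li"]))
print(settings_for(["Pt"]))
```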

The dataset deliberately spans:

  • Relaxations from high‑force to converged structures.
  • High‑energy configurations via molecular dynamics, rattled geometries, and transition‑state–like systems.
  • In‑domain and three out‑of‑domain splits (new adsorbates, new materials, and both) to test generalization realistically, instead of over‑optimizing for a single distribution.
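
The four evaluation splits can be thought of as a two-way check on whether the adsorbate and the material were seen during training. The sketch below shows that logic under stated assumptions: the function name, the split labels, and the example species are illustrative and not taken from the AQCat25 release.

```python
from typing import Set


def assign_split(
    adsorbate: str,
    material: str,
    train_adsorbates: Set[str],
    train_materials: Set[str],
) -> str:
    """Label a system as in-domain or one of three out-of-domain cases."""
    seen_ads = adsorbate in train_adsorbates
    seen_mat = material in train_materials
    if seen_ads and seen_mat:
        return "in_domain"
    if seen_mat:           # material seen, adsorbate new
        return "ood_adsorbate"
    if seen_ads:           # adsorbate seen, material new
        return "ood_material"
    return "ood_both"      # neither seen in training


# Hypothetical training vocabulary for illustration only.
train_ads = {"CO", "H", "OH"}
train_mats = {"Fe2O3", "NiCo"}
print(assign_split("CO", "NiCo", train_ads, train_mats))
print(assign_split("NH3", "LaNi5", train_ads, train_mats))
```

Scoring each split separately shows whether an error increase comes from unfamiliar adsorbates, unfamiliar materials, or the combination of both.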

The paper also shows why fine‑tuning an existing OC20 model on AQCat25 high‑fidelity data wasn’t the right path forward for a production‑scale environment. Fine‑tuned pretrained potentials became more accurate on the new domain, but they “catastrophically forgot” the broader space of nonmagnetic catalysts from OC20.

AQCat25 Can Help Organizations Expand Their Design Space

With AQCat25 and its baseline models, scientific leaders and their teams can begin to:

  • Expand beyond precious‑metal systems: Evaluate earth‑abundant, spin‑polarized catalysts with models that explicitly treat magnetism, instead of assuming non‑magnetic behavior for convenience.
  • Run broader, higher‑risk design campaigns: Use spin‑aware MLIPs to pre‑screen large design spaces — including new bulk materials and adsorbates — before committing DFT or experimental budgets.
  • Reduce silent failure modes: Operate with models that have been stress‑tested on high‑force, high‑energy regimes and out‑of‑domain splits, rather than extrapolating from narrow training sets.

Because AQCat25 is released alongside public models and code, R&D organizations can start their work today.

To Learn More — and Explore Collaborations

To dive into the technical details, read the full official publication and supplementary information for AQCat25, including dataset design, DFT settings, and model benchmarks.

If you are evaluating how spin‑aware, high‑fidelity MLIPs fit into your catalyst discovery or process‑development roadmap, the SandboxAQ team is actively collaborating with industrial and government partners. Learn more and express interest below. 

Sign Up to Learn More
