Press Release

SandboxAQ Unveils SAIR (Structurally Augmented IC50 Repository), a Novel Open Dataset of Protein-Ligand Structures Labelled By Binding Affinities

Palo Alto, CA; June 18, 2025 – SandboxAQ today announced the launch of SAIR (Structurally Augmented IC50 Repository), the largest-ever detailed dataset of protein-ligand pairs with annotated experimental potency data. Marking a major milestone in computational drug discovery, SAIR provides an unprecedented resource for researchers to advance AI models in drug discovery, significantly enhancing the speed and accuracy of binding affinity predictions.

Leveraging NVIDIA DGX™ Cloud, a development platform for AI training and fine-tuning, and SandboxAQ’s advanced AI Large Quantitative Model (LQM) capabilities, SandboxAQ generated approximately 5.2 million synthetic 3D molecular structures across more than 1 million protein‑ligand systems. Further, with support from NVIDIA, the team achieved a 2x improvement in GPU utilization, increasing throughput across SandboxAQ’s scientific workloads. This collaboration with NVIDIA provided SandboxAQ the optimized computing infrastructure needed to develop the SAIR dataset.

SAIR uniquely integrates physics-based modeling with key LQM capabilities, enabling improved generalization, enhanced reliability, and greater applicability for diverse drug discovery tasks. By sharing the SAIR dataset, SandboxAQ demonstrates its unique expertise in quantitative AI for drug discovery and the unparalleled capabilities of its proprietary LQMs.

“By combining our expertise and AI LQM capabilities with NVIDIA’s accelerated computing, we created SAIR to achieve what was previously impossible: accurate, large-scale in silico predictions of protein-ligand binding affinities,” said Nadia Harhen, General Manager of AI Simulation at SandboxAQ.

“This achievement marks a pivotal moment in drug discovery, demonstrating our capacity to fundamentally transform the traditional trial-and-error process into a rapid, data-driven approach. By putting five-plus million affinity-labeled protein-ligand structures into the public domain, we’re handing every scientist the raw fuel to train breakthrough models overnight, setting a new pace for drug discovery. SAIR flips scarce experimental data into an opportunity, and this release is a glimpse of the range and depth baked into SandboxAQ’s LQM platform,” said Harhen.

The SAIR dataset offers researchers a comprehensive, high-quality resource to train advanced AI models that can accurately predict protein-ligand binding affinities. By leveraging SAIR, these models can deliver predictions at least 1,000 times faster than traditional physics-based methods. This enables drug developers to accelerate their path from discovery to market. Additional details about the dataset are in the bioRxiv preprint, SAIR (Structurally Augmented IC50 Repository): Enabling Deep Learning for Protein-Ligand Interactions with a Synthetic Structural Dataset.

SandboxAQ’s quantitative AI technology is already delivering superior outcomes through strategic partnerships with leading institutions and pharmaceutical innovators, including UCSF’s Institute of Neurodegenerative Diseases, Riboscience, the Michael J. Fox Foundation, and most recently, Stand Up To Cancer® (SU2C). SandboxAQ’s Large Quantitative Models consistently achieve superior hit rates compared to traditional methods, marking a transformative leap forward in accelerating therapeutic breakthroughs and improving patient outcomes. 

Researchers can access the SAIR dataset today on Google Cloud Platform or at https://www.sandboxaq.com/sair and are encouraged to contact SandboxAQ at SAIR@sandboxaq.com to put first‑of‑their‑kind models to work on their most challenging targets.

About SandboxAQ
SandboxAQ is a B2B company delivering solutions at the intersection of AI and quantum techniques. The company's Large Quantitative Models (LQMs) deliver critical advances in life sciences, financial services, navigation, and other sectors. The company emerged from Alphabet Inc. as an independent, growth-backed company funded by leading investors and strategic partners including funds and accounts advised by T. Rowe Price Associates, Inc., Alger, IQT, US Innovative Technology Fund, S32, Paladin Capital, BNP Paribas, Eric Schmidt, Breyer Capital, Ray Dalio, Marc Benioff, Thomas Tull, Yann LeCun, and others. For more information, visit http://www.sandboxaq.com.