Structure-Based Drug Design: AI, Simulation, and What Comes Next

Structure-based drug design (SBDD) is the practice of using the three-dimensional structure of a protein target to guide the development of molecules that bind to it. Rather than screening large compound libraries empirically and waiting to see what works, SBDD starts from a structural picture of the binding site and reasons from there about what kinds of molecules should fit. The approach has produced some of medicine’s most consequential drugs, including HIV protease inhibitors, imatinib, and vemurafenib, and generative AI is now expanding what the same rational design logic can do. AQBioSim, SandboxAQ’s drug discovery platform, applies physics-grounded AI across the full SBDD workflow, from generative molecule design through physics-based binding confirmation.

What structure-based drug design means

The core premise is geometric and chemical: every protein target has a binding site with a specific three-dimensional shape, and the chemistry of the amino acid residues lining that site determines what kinds of molecules can form stable, productive interactions within it. A molecule designed to be complementary to that site — matching its shape, making favorable contacts with key residues, and satisfying the physical requirements for stable binding — has a fundamentally better starting point than one identified by chance in a screen.

The lock-and-key analogy captures the basic idea but understates the complexity. Binding is a dynamic process: proteins are flexible, binding sites change conformation in response to different ligands, and water molecules within the site play structural roles that affect which interactions are stable. SBDD works with this complexity rather than ignoring it, using structural data from protein-ligand co-crystal structures to reveal not just where a molecule binds but exactly how — which atoms contact which residues, which interactions are strong, and where there is room to improve.

SBDD enters the drug discovery pipeline after a target has been identified and validated. From there it runs through lead identification, lead optimization, and candidate selection — the stages where structural information has the most direct influence on which molecules advance and which are deprioritized. The essential input is a protein structure: the higher its resolution and the more accurately it represents the biologically relevant conformation, the more useful the structural information becomes.

How protein structures are obtained

Three techniques have defined the structural biology that makes SBDD possible, and each has a different role in modern drug discovery programs.

X-ray crystallography has been the workhorse for decades. It produces atomic-resolution structures of protein-ligand complexes — precisely what is needed to understand how a bound molecule contacts its target and how to modify it. Its limitation is practical: proteins must be crystallized, and many therapeutically relevant targets, particularly membrane proteins and large complexes, resist crystallization. Structures take weeks to months and significant resources to produce.

Cryo-electron microscopy has changed what is structurally accessible. By flash-freezing proteins in near-native conditions and imaging them directly with an electron beam, cryo-EM can determine high-resolution structures of proteins that cannot be crystallized. It has been particularly valuable for membrane proteins — GPCRs, ion channels, transporters — which represent a large fraction of drug targets but were historically inaccessible to crystallography. Resolution has improved rapidly over the past decade to the point where cryo-EM structures can now inform SBDD at the same level of detail as crystallography for many targets.

AlphaFold changed the scale of what is possible in a different way. Where crystallography and cryo-EM provide experimental structures for individual proteins one at a time, AlphaFold predicts protein structures computationally from amino acid sequence. The AlphaFold Protein Structure Database, developed in collaboration between Google DeepMind and EMBL-EBI, now contains over 214 million predicted structures, covering essentially all catalogued proteins known to science, compared with approximately 300,000 experimental structures in the Protein Data Bank (Nucleic Acids Research, 2024). Before AlphaFold, SBDD was limited to the fraction of targets with solved experimental structures. That constraint is now largely removed, and the scope of what SBDD can address has expanded accordingly.

The SBDD workflow

Structure-based drug design is not a single computational step. It is a disciplined cycle of structural analysis, computational hypothesis generation, experimental validation, and structural re-analysis — each iteration informing the next.

The first task is binding site identification: determining where on the protein surface a drug molecule can productively bind, and characterizing the geometry and chemistry of that site well enough to guide design. For well-characterized targets with known ligands, binding sites are established. For novel targets, computational tools predict likely druggable pockets from the structure, using features like cavity depth, surface area, and the distribution of hydrophobic and polar residues.
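
One of the simplest pocket features, how enclosed a point is by surrounding protein atoms, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method of any particular pocket-detection tool: the function names, the 8 Å radius, and the coordinates are all hypothetical.

```python
import math

def buriedness(probe, atoms, radius=8.0):
    """Count protein atoms within `radius` angstroms of a probe point."""
    return sum(1 for atom in atoms if math.dist(probe, atom) <= radius)

def rank_probe_points(probes, atoms, radius=8.0):
    """Return probe points sorted from most to least buried."""
    return sorted(probes, key=lambda p: buriedness(p, atoms, radius), reverse=True)

# Toy coordinates (assumed): atoms surrounding the origin form a crude "pocket".
atoms = [(4.0, 0.0, 0.0), (-4.0, 0.0, 0.0), (0.0, 4.0, 0.0),
         (0.0, -4.0, 0.0), (0.0, 0.0, -4.0)]
probes = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
best = rank_probe_points(probes, atoms)[0]
```

A practical tool would combine a measure like this with the other features mentioned above, such as surface area and residue chemistry, rather than rely on atom counts alone.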

Once a binding site is defined, the program moves to hit identification — finding initial molecules that bind. Two strategies are in common use, often combined. Virtual screening uses the site geometry to computationally rank existing compound libraries by predicted binding, directing experimental resources toward the candidates most likely to show activity (covered in more detail in the companion article on molecular docking). De novo generation uses generative models to design novel molecules conditioned on the pocket structure, exploring chemical space not constrained to existing libraries.
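
The ranking step in virtual screening reduces to sorting a library by predicted score and keeping the top of the list. The sketch below assumes scores have already been computed and that more negative means stronger predicted binding, a common docking convention; the compound IDs and values are made up for illustration.

```python
def top_candidates(scores, k):
    """Return the k compound IDs with the most favorable predicted score.

    `scores` maps compound ID -> predicted binding score, where a more
    negative value means stronger predicted binding (assumed convention).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [compound_id for compound_id, _ in ranked[:k]]

# Hypothetical library scores.
library_scores = {"cmpd-A": -7.2, "cmpd-B": -9.1, "cmpd-C": -5.4, "cmpd-D": -8.3}
shortlist = top_candidates(library_scores, k=2)
```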

Lead optimization is where SBDD delivers its clearest value. Once an initial hit is confirmed, a co-crystal structure of the protein-ligand complex reveals exactly how the molecule binds — which contacts are made, which are weak, which parts of the pocket are unexplored. Medicinal chemists use this picture to propose modifications: extending a group to fill an unfilled region, replacing an atom to improve a key contact, removing a motif that causes a clash. Computational tools predict the effect of each modification on binding affinity, selectivity, and drug-like properties before any synthesis is done. This is the core efficiency of structure-guided optimization: the structural feedback loop makes each iteration more informed than the last, compressing the time required to move from hit to clinical candidate.

What SBDD has produced — the historical evidence

The track record of structure-based drug design is the strongest argument for its continued relevance. Several of the most clinically important drugs of the past three decades were developed through this approach.

HIV protease inhibitors — saquinavir, ritonavir, indinavir, nelfinavir, and later darunavir — are the canonical example. In the late 1980s and early 1990s, structural biologists determined the crystal structure of HIV-1 protease, the enzyme the virus requires to assemble infectious particles. Drug designers used that structure to develop molecules that fit the active site and block the enzyme. The resulting protease inhibitors, introduced clinically in the mid-1990s as part of combination antiretroviral therapy, transformed HIV/AIDS from a uniformly fatal disease into a manageable chronic condition. Peer-reviewed literature consistently describes the HIV protease inhibitor program as among the most successful examples of rational drug design in the history of medicine.

Imatinib, marketed as Gleevec, targets the Bcr-Abl kinase that drives chronic myeloid leukemia (CML). The drug was initially identified through screening and then refined using structural and chemical knowledge of the kinase’s ATP binding site. Structural understanding of how it locks the enzyme in an inactive conformation guided both optimization and the interpretation of its remarkable clinical results. Following its approval in 2001, imatinib produced response rates in CML that had not previously been achievable and helped establish targeted, structure-informed kinase inhibitor design as a productive strategy across oncology.

Vemurafenib targets the BRAF V600E mutant kinase in melanoma, exploiting structural knowledge of the mutant kinase’s conformation. It is also a product of fragment-based drug design, an SBDD approach that starts from small, fragment-sized molecules and builds up. That approach had produced eight FDA-approved drugs as of 2025, according to the annual fragment-to-lead review in the Journal of Medicinal Chemistry.

How AI is changing SBDD

AlphaFold’s structural coverage is the foundational AI development for SBDD — not because it changed the logic of rational design, but because it removed the structural bottleneck that had limited which targets could be pursued. A program that previously could not begin without years of structural biology work can now start from a predicted structure within days.

Generative AI adds a second layer of change: rather than selecting compounds from an existing library to screen against a pocket, generative models can design novel molecules from scratch, conditioned on the pocket’s three-dimensional geometry. This expands the accessible chemical space from the hundreds of millions of compounds in virtual libraries to effectively unlimited novel structures.

The practical limitation of first-generation generative SBDD models was that they optimized for a single objective — typically predicted binding affinity — without accounting for whether the resulting molecules could actually be synthesized, whether they were drug-like, or whether they had acceptable selectivity profiles. This produced molecules that looked good computationally but were not usable in the lab, creating a translation gap between generative design and practical drug discovery.

IDOLpro, developed by SandboxAQ and published in Chemical Science in 2025, directly addresses this gap. It combines a diffusion-based generative model with multi-objective optimization, optimizing binding affinity and synthetic accessibility simultaneously rather than treating them sequentially. On two standard benchmark datasets, IDOLpro produced molecules with 10–20% better binding affinity than the next best method, while generating more than double the proportion of synthesizable molecules compared to the leading baseline. On a test set of experimental protein-ligand complexes, it was the first method to produce molecules with better binding affinities than the experimentally validated ligands in the dataset. Because it is over 100 times faster and less expensive than exhaustive virtual screening, it makes de novo structure-based design practical as a routine step rather than a specialized exercise. The published manuscript provides full benchmark results and methodology.
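
To make the idea of weighing two objectives at once concrete, the sketch below filters a set of candidates down to its Pareto front: the molecules that no other candidate beats on both binding and synthesizability. This is a generic illustration of multi-objective trade-offs, not IDOLpro’s algorithm; the class, field names, score conventions, and values are all assumptions.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    affinity: float     # predicted binding score, lower is better (assumed)
    synth_score: float  # synthetic accessibility score, lower is easier (assumed)

def dominates(a, b):
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.affinity <= b.affinity and a.synth_score <= b.synth_score
    better = a.affinity < b.affinity or a.synth_score < b.synth_score
    return no_worse and better

def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical candidates.
pool = [
    Candidate("m1", -9.5, 6.0),  # strongest binder, hardest to make
    Candidate("m2", -8.0, 2.5),  # weaker binder, easiest to make
    Candidate("m3", -7.0, 4.0),  # worse than m2 on both objectives
    Candidate("m4", -9.0, 3.0),  # balanced
]
front = [c.name for c in pareto_front(pool)]
```

A single-objective model would simply pick m1. Looking at both objectives together keeps m2 and m4 in play and discards only m3, which is worse than m2 on both counts.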

Where physics-based simulation fits

Generative SBDD produces candidates — molecules designed to fit a pocket and satisfy drug-like criteria. Confirming and ranking those candidates with the accuracy that clinical development decisions require is a separate problem, and one that generative models are not designed to solve alone.

Free energy perturbation (FEP) provides thermodynamic binding affinity calculations at a level of precision that docking scoring functions and generative model scores can only approximate. For targets where the structural and chemical environment is well-characterized, FEP-based ranking gives programs the accuracy they need to make confident decisions about which candidates to advance. For novel targets, flexible binding sites, and first-in-class molecules, the physics-based simulation within SandboxAQ’s Large Quantitative Models can operate where statistical methods are least reliable, extending coverage into the chemical and structural spaces that both virtual screening and generative design leave hardest to evaluate.
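
For readers unfamiliar with FEP, the core arithmetic can be sketched with the textbook exponential-averaging (Zwanzig) estimator and the thermodynamic cycle that turns two such estimates into a relative binding free energy. This is a minimal sketch, not SandboxAQ’s implementation: production calculations typically use many intermediate states and more robust estimators, and the function names here are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann (gas) constant in kcal/(mol*K)

def zwanzig_free_energy(delta_u, temperature=298.15):
    """Estimate dG(A->B) = -kT ln < exp(-dU/kT) >_A.

    `delta_u` holds energy differences U_B - U_A (kcal/mol) evaluated on
    configurations sampled from state A.
    """
    kt = K_B * temperature
    exponents = [-du / kt for du in delta_u]
    shift = max(exponents)  # log-sum-exp shift for numerical stability
    mean_exp = sum(math.exp(e - shift) for e in exponents) / len(exponents)
    return -kt * (shift + math.log(mean_exp))

def relative_binding_free_energy(dg_complex, dg_solvent):
    """Thermodynamic cycle: ddG_bind(A->B) = dG_complex(A->B) - dG_solvent(A->B)."""
    return dg_complex - dg_solvent

def affinity_fold_change(ddg, temperature=298.15):
    """Ratio Kd(B)/Kd(A) implied by ddG_bind; below 1 means B binds more tightly."""
    return math.exp(ddg / (K_B * temperature))
```

At room temperature, a relative binding free energy of about -1.36 kcal/mol corresponds to roughly a tenfold improvement in affinity, which is why errors of even 1 kcal/mol matter when ranking close analogs.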

The practical workflow combines both: IDOLpro generates novel candidates optimized for binding and synthesizability; AQBioSim’s physics-based methods confirm and rank the most promising ones. Each stage does what it is best suited for, and the output of one informs the next.
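
The division of labor described above follows a familiar funnel pattern: a cheap score narrows the field, and an expensive calculation is spent only on the survivors. The sketch below shows that pattern with lookup tables standing in for the two scorers; the function name, molecule IDs, and scores are hypothetical, and real scorers would call a generative-model score and a physics-based calculation respectively.

```python
def two_stage_funnel(candidates, cheap_score, expensive_score, budget):
    """Keep the `budget` best candidates by a cheap score, then re-rank
    only those with an expensive score. Lower is better for both."""
    shortlist = sorted(candidates, key=cheap_score)[:budget]
    return sorted(shortlist, key=expensive_score)

# Stand-in scores (assumed). "mol-3" has no expensive score because it
# never survives the first stage.
cheap = {"mol-1": -8.0, "mol-2": -9.0, "mol-3": -6.0, "mol-4": -8.5}
costly = {"mol-1": -10.2, "mol-2": -7.9, "mol-4": -9.1}

ranked = two_stage_funnel(cheap, cheap.get, costly.__getitem__, budget=3)
```

Note that the final order differs from the first-stage order: the cheap score put mol-2 first, while the more expensive score promotes mol-1.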

FAQ

What is structure-based drug design?

Structure-based drug design is the use of three-dimensional protein structures to guide the development of drug candidates. By knowing the shape and chemistry of a protein’s binding site, researchers can rationally design molecules that fit and bind to it, rather than relying on empirical screening. SBDD has produced some of medicine’s most important drugs, including HIV protease inhibitors and imatinib, and is now being combined with generative AI to expand what is possible.

How is structure-based drug design different from ligand-based drug design?

Structure-based drug design works from the three-dimensional structure of the protein target, using it to design or select molecules that fit the binding site. Ligand-based drug design works from the properties of known active compounds, using structural similarity or quantitative structure-activity relationships to identify or design new candidates without requiring a protein structure. SBDD is generally more informative when a resolved or predicted protein structure is available; ligand-based approaches are used when structural information is absent but a set of known actives exists.
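
On the ligand-based side, structural similarity is commonly measured with the Tanimoto coefficient between molecular fingerprints: the number of shared features divided by the number of features present in either molecule. The sketch below represents fingerprints as plain sets of on-bit indices; the bit values are made up, and a real workflow would generate fingerprints with a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Hypothetical fingerprints for a known active and a query compound.
known_active = {1, 4, 7, 9, 12}
query = {1, 4, 7, 10}
similarity = tanimoto(known_active, query)
```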

What protein structures does structure-based drug design use?

SBDD uses experimentally determined structures from X-ray crystallography and cryo-electron microscopy, as well as computationally predicted structures from tools like AlphaFold. X-ray crystallography typically provides the highest resolution and is used extensively for protein-ligand co-crystal structures during lead optimization. Cryo-EM has expanded structural access to membrane proteins and large complexes. AlphaFold’s database of over 214 million predicted structures has made structural information available for essentially all catalogued proteins, removing a historical bottleneck for targets that had never been crystallized.

What is fragment-based drug design?

Fragment-based drug design is an SBDD approach that starts with small, low-molecular-weight compounds — fragments — that bind weakly to a target, then uses structural data to grow or link them into larger, potent drug candidates. Because fragments are small, they sample chemical space more efficiently than drug-sized molecules. FBDD has produced eight FDA-approved drugs as of 2025, including vemurafenib, venetoclax, and ribociclib.

How does AI change structure-based drug design?

AI has changed SBDD in two ways. AlphaFold’s protein structure predictions have removed the structural bottleneck that limited which targets SBDD could address, expanding coverage to over 214 million protein sequences. Generative AI models now allow de novo molecule design conditioned on pocket geometry — creating novel candidates beyond what existing compound libraries contain. The key challenge for first-generation generative SBDD models was balancing binding affinity with synthetic accessibility; multi-objective approaches like IDOLpro address this directly.

What is IDOLpro?

IDOLpro is SandboxAQ’s generative AI system for structure-based drug design, published in Chemical Science in 2025. It combines a diffusion-based generative model with multi-objective optimization to simultaneously design molecules with high binding affinity and strong synthetic accessibility — addressing the practical limitation of earlier generative SBDD models that optimized binding alone. On benchmark datasets, it produced molecules with 10–20% better binding affinity than the next best method and more than twice the proportion of synthesizable compounds. It is part of the AQBioSim drug discovery platform.
