Machine Learning in Drug Discovery

How machine learning accelerates drug discovery

Machine learning has moved from a research curiosity to a standard tool in pharmaceutical R&D. By July 2025, more than 29 AI-driven therapeutic programs had advanced to human clinical trials — a meaningful shift from five years prior when almost none had. Yet the technology’s adoption has also exposed a consistent limitation: ML models depend on historical data, and drug discovery’s most important problems are precisely the ones where historical data is thinnest. Understanding what ML does well, and where it breaks down, is the starting point for building a discovery program that actually performs in production. AQBioSim is SandboxAQ’s platform for applying quantitative AI — including physics-based simulation — across the full drug discovery lifecycle.

What machine learning does in drug discovery

ML earns its place in the discovery pipeline by doing certain tasks faster than experiments can, and sometimes doing them better. The clearest wins are in four areas:

Virtual screening and hit identification

Traditional high-throughput screening tests hundreds of thousands of compounds physically — an expensive, slow process. ML-powered virtual screening can rank tens of millions of compounds computationally, directing synthesis and assay resources toward the candidates most likely to show activity. Deep docking, active learning, and multi-task learning have substantially shortened hit identification timelines in documented programs.
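The core mechanic of virtual screening is simple: score every compound in a large library with a cheap predictive model, then spend the assay budget only on the top-ranked candidates. A minimal sketch, with a toy linear scorer standing in for a real fingerprint- or graph-based activity model (the features, weights, and library here are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained activity model; in practice this would
# be a fingerprint- or graph-based model scoring each compound.
def predicted_activity(features):
    weights = np.array([0.8, -0.3, 0.5])
    return features @ weights

# A "library" of 1,000,000 compounds, each reduced to a small feature vector.
library = rng.normal(size=(1_000_000, 3))

scores = predicted_activity(library)

# Direct synthesis and assay resources toward the top-ranked candidates only.
budget = 500
top_idx = np.argsort(scores)[::-1][:budget]
shortlist = library[top_idx]

print(shortlist.shape)  # (500, 3)
```

The point is the funnel shape: millions scored computationally, hundreds tested physically. Active learning extends this loop by retraining the scorer on each round of assay results.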

Lead optimization

Once a hit is identified, the next challenge is improving it: increasing potency, improving selectivity, reducing toxicity, and maintaining synthesizability, often simultaneously. ML models trained on structure-activity relationships can propose modifications, predict how properties will change, and help teams prioritize which analogs to make next. This compresses iteration cycles that otherwise take weeks into days.
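Because lead optimization is multi-objective, a common pattern is to rank proposed analogs by a weighted desirability score across predicted properties. A toy sketch; the property names, values, and weights are illustrative, not from any specific program:

```python
# Each analog carries model-predicted properties on a 0-1 scale
# (tox_risk: higher is worse, hence its negative weight below).
analogs = {
    "analog_A": {"potency": 0.9, "selectivity": 0.4, "tox_risk": 0.2, "synth": 0.8},
    "analog_B": {"potency": 0.7, "selectivity": 0.8, "tox_risk": 0.1, "synth": 0.9},
    "analog_C": {"potency": 0.95, "selectivity": 0.3, "tox_risk": 0.7, "synth": 0.5},
}

weights = {"potency": 0.4, "selectivity": 0.3, "tox_risk": -0.2, "synth": 0.1}

def desirability(props):
    return sum(weights[k] * props[k] for k in weights)

# Prioritize which analogs to synthesize next.
ranked = sorted(analogs, key=lambda name: desirability(analogs[name]), reverse=True)
print(ranked)  # ['analog_B', 'analog_A', 'analog_C']
```

Note that the most potent analog (C) ranks last once toxicity is weighed in; balancing objectives, not maximizing one, is the job.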

ADME/T prediction

Absorption, distribution, metabolism, excretion, and toxicity (ADME/T) failures are one of the most common reasons drug candidates fail late in development, after the most expensive work has already been done. ML models can flag likely ADME/T liabilities earlier — solubility, clearance, hERG liability, hepatotoxicity — allowing teams to address problems during lead optimization rather than after clinical entry.
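Operationally, early ADME/T triage often reduces to checking model predictions against liability thresholds so problems surface during lead optimization. A minimal sketch, with hard-coded predictions standing in for model output and illustrative (not validated) cutoffs:

```python
# Each threshold is (kind, limit): "min" flags values below the limit,
# "max" flags values above it. Cutoffs here are illustrative only.
THRESHOLDS = {
    "logS": ("min", -4.0),        # aqueous solubility, log mol/L
    "herg_prob": ("max", 0.5),    # predicted hERG inhibition probability
    "clearance": ("max", 30.0),   # predicted intrinsic clearance, mL/min/kg
}

def flag_liabilities(predictions):
    flags = []
    for prop, (kind, limit) in THRESHOLDS.items():
        value = predictions[prop]
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            flags.append(prop)
    return flags

candidate = {"logS": -5.1, "herg_prob": 0.3, "clearance": 45.0}
print(flag_liabilities(candidate))  # ['logS', 'clearance']
```

Flagged compounds get targeted experimental follow-up; clean ones proceed without consuming assay capacity.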

De novo molecule design

Generative AI models can propose entirely novel molecular structures with desired properties, rather than selecting from an existing library. This is among the more advanced applications, with significant variation in real-world performance depending on how well the generative model is constrained by practical requirements like synthesizability and stability.

Where machine learning falls short

A 2026 systematic review covering five years of published AI drug discovery programs noted that benchmarking results “often give a higher impression of real-world performance than is actually the case.” That gap between benchmark and production is not a failure of any specific algorithm — it reflects a structural tension between how ML works and what drug discovery requires.

Data sparsity

ML models learn from data. Drug discovery generates far less of it, far more expensively, than the domains where ML has had its clearest successes. A dataset of ten thousand experimentally validated binding affinities for a particular target is considered large in pharma. In machine learning terms, it is small — too small for many modern architectures to train robustly without overfitting. When data is sparse, models memorize the training set rather than learning generalizable rules.
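The memorization failure is easy to reproduce with any high-capacity model and a small dataset. A sketch using polynomial fits as a stand-in for flexible architectures (the data is synthetic; the principle is general):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy training points from a simple underlying trend -- "large" by
# pharma standards, tiny by ML standards.
x_train = np.linspace(0, 1, 10)
y_train = x_train + rng.normal(scale=0.1, size=10)
x_test = np.linspace(0.05, 0.95, 50)
y_test = x_test  # the true underlying relationship

# A degree-9 polynomial has enough capacity to memorize all ten points...
overfit = np.polyfit(x_train, y_train, deg=9)
# ...while a degree-1 fit can only capture the generalizable rule.
simple = np.polyfit(x_train, y_train, deg=1)

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

print("train MSE, deg 9:", mse(overfit, x_train, y_train))  # ~0: memorized
print("test  MSE, deg 9:", mse(overfit, x_test, y_test))
print("test  MSE, deg 1:", mse(simple, x_test, y_test))
```

Near-zero training error with worse test error is the signature of memorization, and it is the default outcome when model capacity outstrips data volume.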

Generalization to novel targets and chemical spaces

Publicly available training datasets are heavily biased toward well-studied compounds and targets. Models trained on these datasets perform well when tested on similar compounds, but their performance degrades significantly on novel targets or chemical spaces that are underrepresented in the literature. The problem is self-reinforcing: the targets most in need of new drugs — rare diseases, novel mechanisms, previously undruggable proteins — are exactly those with the least training data.

The “similar to what we’ve seen before” problem

Pattern matching on historical data is, by definition, a backwards-looking method. For first-in-class molecules — compounds with no close structural analogues in the training data — ML models have no strong signal to work from. Predictions in these regimes carry high uncertainty, but standard ML models often do not quantify that uncertainty reliably, making it difficult for teams to know when to trust the output and when to be skeptical.
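One common (though imperfect) way to surface this uncertainty is ensemble disagreement: train several models on resampled data and treat the spread of their predictions as a confidence signal. A minimal sketch with a 1-D stand-in for chemical space; the data and models are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

# Training data covers only a narrow region of "chemical space" (here, 1-D).
x_train = rng.uniform(0.0, 1.0, size=40)
y_train = 2.0 * x_train + rng.normal(scale=0.1, size=40)

# A bootstrap ensemble of simple linear fits; disagreement across members
# serves as a cheap, approximate uncertainty estimate.
ensemble = []
for _ in range(20):
    idx = rng.integers(0, len(x_train), size=len(x_train))
    ensemble.append(np.polyfit(x_train[idx], y_train[idx], deg=1))

def predict_with_uncertainty(x):
    preds = np.array([np.polyval(c, x) for c in ensemble])
    return preds.mean(), preds.std()

mean_in, std_in = predict_with_uncertainty(0.5)    # inside the training range
mean_out, std_out = predict_with_uncertainty(5.0)  # far outside it
print(f"in-domain std:     {std_in:.4f}")
print(f"out-of-domain std: {std_out:.4f}")
```

The spread grows with distance from the training data, which is exactly the warning signal a team needs for first-in-class chemistry, though ensembles can still be confidently wrong, which is why the reliability caveat above matters.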

What physics-based simulation adds

The limitations above are not arguments against using ML — they are arguments for knowing what it cannot do, and supplementing it with methods that address those specific gaps. Physics-based simulation is the primary complement.

Where ML learns correlations from historical data, physics-based simulation models molecular behavior from first principles: quantum mechanics, thermodynamics, and the laws of chemistry. It generates its own “training data” through simulation rather than requiring historical experimental records. That means it can operate reliably in exactly the regimes where ML struggles — novel targets, sparse data, first-in-class chemical spaces.

SandboxAQ’s Large Quantitative Models (LQMs) combine this physics-grounded foundation with AI to deliver predictions that are both fast enough for production workflows and accurate enough to inform real decisions. A specific example: AQBioSim’s AQFEP (absolute free energy perturbation) solution does not require a reference molecule — the standard requirement for most FEP methods — which means it can operate in chemical spaces where no closely related precedent exists. That capability is where purely statistical ML approaches cannot follow.

The practical result, validated in SandboxAQ’s work with UCSF, is a 30-fold improvement in hit rate and an expansion of the chemical exploration space from 250,000 molecules to 5.6 million — numbers that reflect what happens when physics-based selection replaces statistical ranking in sparse, challenging targets.

What a modern ML drug discovery workflow looks like

In production, the strongest programs are not purely ML or purely simulation — they combine methods at the stages where each performs best:

  • Target identification: ML models analyze multi-omics data, literature, and biological networks to surface and prioritize targets with therapeutic potential.
  • Hit identification: Virtual screening — ML-ranked or physics-scored — narrows a large compound library to a tractable set for experimental testing. Active learning can focus experimental resources on the most informative compounds.
  • Lead optimization: ML structure-activity models propose modifications; physics-based FEP or docking provides high-accuracy scoring for the most promising candidates. The two approaches check each other.
  • ADME/T and safety: ML models flag likely liabilities early; experimental confirmation concentrates on the flagged risk areas rather than broad screening.
  • Candidate selection: Simulation-informed ranking of clinical candidates, incorporating the full property profile rather than optimizing a single metric.
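The active-learning idea in the hit-identification step above can be sketched as a loop: fit an ensemble on the labeled compounds, assay the pool compounds the ensemble disagrees on most, and refit. Everything here (features, the "assay" oracle, the model) is a toy stand-in:

```python
import numpy as np

rng = np.random.default_rng(7)

pool = rng.normal(size=(2000, 4))        # unlabeled compound pool (toy features)
true_w = np.array([1.0, -2.0, 0.5, 0.3])  # hidden ground truth

def assay(x):  # stand-in for running an experiment
    return x @ true_w + rng.normal(scale=0.1, size=len(x))

labeled_idx = list(rng.choice(len(pool), size=10, replace=False))
for _ in range(5):
    X, y = pool[labeled_idx], assay(pool[labeled_idx])
    # Bootstrap ensemble of least-squares fits on the labeled set so far.
    members = []
    for _ in range(10):
        b = rng.integers(0, len(X), size=len(X))
        members.append(np.linalg.lstsq(X[b], y[b], rcond=None)[0])
    preds = np.array([pool @ w for w in members])
    uncertainty = preds.std(axis=0)
    uncertainty[labeled_idx] = -np.inf       # never re-pick labeled compounds
    batch = np.argsort(uncertainty)[::-1][:10]  # most informative next batch
    labeled_idx.extend(batch.tolist())

print(f"labeled {len(labeled_idx)} of {len(pool)} compounds")
```

Spending each round's budget on the most informative compounds, rather than a random sample, is what lets a program improve round over round instead of plateauing.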

The compounding advantage is fewer wasted cycles. Each stage feeds information back into the next, and a program that treats experimental results as training signals — rather than isolated data points — gets better over time rather than plateauing after the first round.

Evaluating ML drug discovery platforms

The most revealing questions are not about features — they are about failure modes. Start with data: how much does the platform need, and what happens when it gets less? A system that performs well on standard benchmarks but degrades on sparse, novel targets is not useful for first-in-class programs, which is often where the clinical need is greatest.

The next question is whether the platform incorporates physics-based methods or relies entirely on statistical models. This distinction matters most when the chemical space is unfamiliar — when there is no close precedent in the training data. A purely statistical platform has no principled way to handle that situation. One grounded in simulation does.

Uncertainty quantification is underrated in vendor conversations. A platform that cannot communicate when its predictions are unreliable is harder to work with than one that is occasionally wrong but honest about it. Teams need to know where the model is confident and where they should rely on experimental data instead.

Finally, ask about external validation: peer-reviewed publications, named partners, and reproducible results. Internal benchmarks on curated datasets are a starting point, not evidence of production performance. AQBioSim’s results with UCSF, Sanofi, and Riboscience are documented in published case studies and peer-reviewed manuscripts.

FAQ

What is machine learning in drug discovery?

Machine learning in drug discovery refers to the application of statistical AI models to accelerate and improve the drug development process. It is used across stages including virtual screening, hit identification, lead optimization, and ADME/T prediction. ML models learn from historical experimental data to make predictions about new compounds, helping teams prioritize which candidates to test and synthesize.

How does machine learning accelerate drug discovery?

By processing and ranking far more compounds computationally than can be tested physically, ML reduces the experimental burden at each stage of the pipeline. It also helps predict molecular properties — toxicity, solubility, binding affinity — before synthesis, allowing teams to eliminate poor candidates earlier and focus resources on the most promising series. By July 2025, more than 29 ML-assisted therapeutic programs had reached human clinical trials.

What are the limitations of machine learning in drug discovery?

ML models depend on historical training data, which is sparse in drug discovery relative to other fields. Performance degrades significantly on novel targets, rare diseases, and chemical spaces that are underrepresented in public databases. Models trained on well-studied compounds tend to overfit and fail to generalize to first-in-class discovery, where the most urgent unmet needs often exist. Uncertainty quantification is also a persistent challenge — standard ML models often do not reliably communicate when their predictions should not be trusted.

What is the difference between machine learning and physics-based simulation in drug discovery?

Machine learning learns patterns from historical experimental data. Physics-based simulation models molecular behavior from first principles — quantum mechanics and thermodynamics — without requiring historical precedent. The practical difference is that simulation-based methods can operate reliably in novel chemical spaces and on targets with limited data, while ML performance degrades in those same conditions. The strongest platforms combine both: ML for speed and pattern recognition across known space, physics-based simulation for accuracy and reliability in unexplored territory.

What is a Large Quantitative Model (LQM)?

Large Quantitative Models are AI systems trained on physics, chemistry, biology, and mathematics to simulate real-world systems — as opposed to large language models, which are trained on text. In drug discovery, LQMs use physics-based simulation to model molecular interactions and predict drug properties with scientific precision. SandboxAQ’s LQM platform powers AQBioSim, enabling predictions in data-sparse regimes where conventional ML approaches are unreliable.

How do I evaluate a machine learning drug discovery platform?

Key questions: Does it require large training datasets, or can it perform with sparse data? Has it been validated on novel targets, not just standard benchmarks? Does it incorporate physics-based methods alongside statistical models? Does it quantify prediction uncertainty? Can experimental results feed back to improve future predictions? And are its performance claims supported by peer-reviewed publications and named case studies?

Explore SandboxAQ’s drug discovery capabilities.