In Silico Drug Discovery

What It Is and How AI Is Changing It

In silico drug discovery is the use of computational methods — simulation, molecular modeling, machine learning, and statistical analysis — to identify and develop drug candidates before committing to physical experiments. The term comes from silicon, the material at the heart of computer chips, and sits alongside in vivo (in living organisms) and in vitro (in controlled lab conditions) as the third mode of modern experimental science. What has changed recently is not the concept but its standing: in April 2025, the FDA moved to phase out mandatory animal testing for many drug types, formally elevating computational evidence to a primary role in certain regulatory contexts. For pharma and biotech teams, that shift has real consequences for how programs are designed and documented. AQBioSim is SandboxAQ’s platform for applying physics-grounded quantitative AI across the in silico discovery pipeline.

What “in silico” means and where it fits

The progression from in vivo to in vitro to in silico is not a linear replacement — each mode of experimentation answers different questions, and the strongest programs use all three. What in silico methods add is the ability to work at a scale and speed that physical experiments cannot match. A virtual screening run can evaluate billions of compounds against a target in the time it would take a wet-lab program to test a few thousand. A molecular dynamics simulation can capture protein behavior over timescales and under conditions that are difficult or impossible to observe experimentally. A QSAR model can predict the toxicity of a proposed compound before it is ever synthesized.

In silico methods are broader than any single technique. The term covers computational chemistry, molecular modeling, statistical learning, network analysis, pharmacokinetic modeling, and clinical simulation — methods applied across the full drug development lifecycle, from target identification through post-approval lifecycle management. For a detailed look at the core computational methods and how they map to the discovery pipeline, see the companion article on computational drug discovery.

What in silico methods are used for

In early discovery, in silico tools identify and validate drug targets by analyzing genomic, proteomic, and structural data at a scale that manual review cannot. Once a target is selected, virtual screening narrows billions of candidate compounds to a manageable set for experimental testing. Druggability assessment — predicting whether a protein has a binding site that small molecules can usefully access — is also increasingly a computational problem, particularly as structural data from cryo-EM and AI-based protein structure prediction has expanded the number of tractable targets.
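As a toy illustration of how virtual screening narrows a library, the sketch below ranks compounds by Tanimoto similarity to a known active and keeps the top few. This is ligand-based similarity screening, only one of several screening techniques and not necessarily the one any particular platform uses. The fingerprints are hypothetical sets of feature indices invented for the example, and the names tanimoto and screen are illustrative. A real program would compute fingerprints with a cheminformatics toolkit and screen far larger libraries.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits in a structural fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: shared bits divided by total distinct bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(
    library: Dict[str, Fingerprint],
    reference: Fingerprint,
    top_k: int,
    min_similarity: float = 0.0,
) -> List[Tuple[str, float]]:
    """Rank library compounds by similarity to a reference and keep the top_k."""
    scored = [(name, tanimoto(fp, reference)) for name, fp in library.items()]
    hits = [(name, s) for name, s in scored if s >= min_similarity]
    # Sort by descending similarity, then by name so ties are deterministic.
    hits.sort(key=lambda item: (-item[1], item[0]))
    return hits[:top_k]


# Hypothetical fingerprints, invented for illustration only.
reference_active = frozenset({1, 2, 3, 4})
library = {
    "cmpd_A": frozenset({1, 2, 3, 4}),  # identical to the reference
    "cmpd_B": frozenset({1, 2, 3, 9}),  # 3 shared bits out of 5 distinct
    "cmpd_C": frozenset({7, 8, 9}),     # nothing in common with the reference
    "cmpd_D": frozenset({1, 2}),        # 2 shared bits out of 4 distinct
}

shortlist = screen(library, reference_active, top_k=2, min_similarity=0.3)
```

Here the shortlist keeps cmpd_A and cmpd_B and drops the rest. The same rank-and-cut pattern applies when the score comes from docking or a learned model rather than fingerprint similarity.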

Through lead optimization, in silico methods predict how structural modifications will affect potency, selectivity, metabolic stability, and safety liabilities. This is where the feedback loop between computational prediction and experimental measurement is tightest, and where iterative modeling delivers the most direct time savings.

Less commonly discussed but increasingly important is in silico modeling at the clinical stage. Model-Informed Drug Development (MIDD) uses quantitative computational models to inform trial design, dose selection, and safety assessment. The timelines at stake are long: phase 1 trials typically take 32 months, phase 2 about 39 months, and phase 3 around 40 months. MIDD approaches that compress dose-finding studies or reduce the number of patient cohorts required can therefore meaningfully shorten overall program timelines. Physiologically based pharmacokinetic modeling, population pharmacokinetics, and exposure-response analysis are all established tools in this space, accepted by regulators across therapeutic areas.
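To make the idea of model-based dose selection concrete, here is a minimal sketch using the simplest textbook pharmacokinetic model: a one-compartment model with an intravenous bolus dose and first-order elimination. It is far simpler than the physiologically based or population models used in practice, and every parameter value below is hypothetical. The function names are illustrative, not taken from any library.

```python
import math


def concentration(dose_mg: float, volume_l: float, clearance_l_per_h: float, t_h: float) -> float:
    """Plasma concentration (mg/L) at time t after an IV bolus, one-compartment model."""
    k_el = clearance_l_per_h / volume_l  # first-order elimination rate constant (1/h)
    return (dose_mg / volume_l) * math.exp(-k_el * t_h)


def auc_inf(dose_mg: float, clearance_l_per_h: float) -> float:
    """Total exposure (mg*h/L): in this model, AUC from 0 to infinity is dose / clearance."""
    return dose_mg / clearance_l_per_h


def dose_for_target_auc(target_auc: float, clearance_l_per_h: float) -> float:
    """Dose (mg) expected to reach a target exposure, by inverting AUC = dose / CL."""
    return target_auc * clearance_l_per_h


# Hypothetical parameters, chosen only to keep the arithmetic easy to follow.
V = 50.0   # volume of distribution, L
CL = 5.0   # clearance, L/h

half_life_h = math.log(2) * V / CL                         # elimination half-life, h
dose = dose_for_target_auc(target_auc=40.0, clearance_l_per_h=CL)
c_at_half_life = concentration(dose, V, CL, half_life_h)   # half of the initial level
```

With these assumed values the target exposure of 40 mg*h/L corresponds to a 200 mg dose, and the concentration one half-life after dosing is about 2 mg/L, half of the initial 4 mg/L. Real MIDD work layers variability between patients, covariates, and uncertainty on top of this kind of structural model.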

The regulatory dimension — why this matters now

For most of the history of computational drug discovery, in silico evidence was supplementary — useful for informing decisions but not sufficient on its own for regulatory submissions. That has been changing steadily, and 2024 and 2025 marked two significant steps forward.

In November 2024, the International Council for Harmonisation endorsed the M15 guideline — “General Principles for Model-Informed Drug Development” — a formal international framework describing how computational evidence should be structured, validated, and submitted to regulatory bodies. The guidance promotes harmonized assessment of MIDD approaches across the FDA, EMA, and other member agencies. It is not a minor procedural update. It establishes computational modeling as a standard component of drug development dossiers, with defined expectations for what constitutes adequate documentation and validation.

In April 2025, the FDA announced its decision to phase out mandatory animal testing for many drug types, explicitly citing in silico and other alternative methods as mature enough to serve as primary evidence for certain pharmacological and safety assessments. The FDA had already accepted in silico data as primary evidence in select cases, including virtual bioequivalence studies and model-based dose selection, but the animal testing announcement represents a broader institutional endorsement of computational methods as foundational rather than supplementary.

The practical implication for drug developers is that computational evidence is not just a way to design better experiments — it is increasingly a form of regulatory currency. Programs that generate rigorous, well-documented in silico evidence are better positioned to move faster through regulatory review, reduce the number of required in vivo studies, and support label expansions and post-approval decisions without repeating expensive clinical work.

Where in silico methods are strongest — and where they still need support

The case for in silico methods is strong, and the regulatory momentum behind them is real. That said, the field has also been honest about the remaining gaps — and understanding them is as important as understanding the strengths.

In silico methods deliver their clearest value in speed and scale: screening compound libraries that no physical program could evaluate, flagging liabilities before synthesis, exploring structural modifications faster than the lab can test them. These are genuine advantages that translate into measurable time and cost savings in well-characterized therapeutic areas.

The more difficult cases involve generalization. Most in silico methods — including the AI-driven ones — learn from historical data: measured binding affinities, known protein structures, previously characterized ADMET profiles. When a program moves into novel chemical space or pursues a target with limited prior data, the same models that work reliably on familiar ground become less predictive. Isomorphic Labs, describing the state of the field after AlphaFold 3, put it directly: understanding biomolecular structures alone is not sufficient for real-world drug discovery programs in silico. Accurate prediction requires generalizing to novel, unseen systems — and that remains the frontier.

Physics-based simulation addresses this differently from statistical learning. Rather than extracting patterns from historical data, it computes molecular behavior from first principles: quantum mechanics, thermodynamics, and the laws of chemistry. It can generate reliable predictions in novel chemical spaces without requiring prior experimental records for those systems. SandboxAQ’s Large Quantitative Models combine this physics-grounded foundation with AI to operate at production speed — bringing first-principles accuracy to programs where data-driven in silico methods reach their limits. For more on the specific limitations of statistical machine learning in this context, see the article on machine learning in drug discovery.

What a rigorous in silico program looks like

The FDA’s fit-for-purpose approach to MIDD, reflected in the ICH M15 guideline, offers a useful structure that extends beyond regulatory submissions to in silico programs generally. The core questions are: what decision is this model supporting, and what level of evidence is required to support that decision confidently?

Defining the question and context of use before building or selecting models is not bureaucratic overhead — it determines what validation is meaningful. A model used to rank compounds within a known chemical series needs different validation than one used to predict properties in an unexplored chemical space or to support a regulatory submission. Most in silico failures trace back to a mismatch between the model’s actual capabilities and the decision it was asked to support.

Validation quality is the second dimension. Internal benchmarks on curated datasets are a starting point, not a conclusion. The most useful evidence is external validation: performance on compounds and targets that were not part of model development, preferably confirmed against experimental results from independent labs. This is the standard regulatory agencies apply, and it is the right standard for internal decision-making as well.
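One common way to approximate external validation inside a single dataset is to hold out an entire chemical series, so that no test compound has a close relative in the training data. The sketch below shows that split with a deliberately trivial baseline that predicts the training mean. The compound names, series labels, and potency values are all hypothetical, and the baseline stands in for whatever model a program actually uses.

```python
from statistics import mean
from typing import List, NamedTuple, Sequence, Set, Tuple


class Compound(NamedTuple):
    name: str
    series: str       # chemical series (or scaffold) the compound belongs to
    measured: float   # experimental value, e.g. a pIC50


def split_by_series(
    compounds: Sequence[Compound], holdout: Set[str]
) -> Tuple[List[Compound], List[Compound]]:
    """Hold out whole series so the test set is external to model development."""
    train = [c for c in compounds if c.series not in holdout]
    test = [c for c in compounds if c.series in holdout]
    return train, test


def mean_absolute_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Average absolute difference between predictions and measurements."""
    return mean(abs(p - m) for p, m in zip(predicted, measured))


# Hypothetical data, invented for illustration only.
data = [
    Compound("c1", "series_1", 6.0),
    Compound("c2", "series_1", 6.2),
    Compound("c3", "series_1", 6.4),
    Compound("c4", "series_2", 7.0),
    Compound("c5", "series_2", 7.2),
    Compound("c6", "series_3", 5.0),
    Compound("c7", "series_3", 5.4),
]

train, test = split_by_series(data, holdout={"series_3"})

# Placeholder "model": predict the mean of the training measurements for everything.
baseline = mean(c.measured for c in train)
external_mae = mean_absolute_error([baseline] * len(test), [c.measured for c in test])
```

Comparing the error on a held-out series with the error from a random split is a quick check on how much a benchmark number depends on near-duplicates of the training compounds. It does not replace confirmation against independent experimental results.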

The integration of experimental feedback is what separates a functional in silico program from a one-time analysis. Models that incorporate experimental results to update predictions over time get more useful as a program progresses. Programs that treat computational and experimental work as parallel rather than iterative miss most of the compounding benefit. AQBioSim’s milestone-based approach is built around this principle — sharing program risk and integrating computational and experimental results as the work advances.

FAQ

What does “in silico” mean?

“In silico” derives from silicon — the material used in computer chips — and describes experimentation or analysis performed computationally rather than in a living organism (in vivo) or a laboratory setting (in vitro). It is the third mode of modern experimental science, used across biology, chemistry, pharmacology, and clinical research.

What is in silico drug discovery?

In silico drug discovery is the application of computational methods — molecular modeling, simulation, machine learning, statistical analysis, and pharmacokinetic modeling — to identify, evaluate, and optimize drug candidates before or alongside physical experiments. It spans the full pipeline from target identification through clinical modeling and post-approval lifecycle management.

How is in silico different from in vitro and in vivo?

In vivo experiments are conducted in living organisms — animals or humans. In vitro experiments are conducted in controlled laboratory conditions, typically in cells or isolated biological systems. In silico experiments are conducted computationally, using models and simulations rather than physical biological systems. The three approaches are complementary: in silico methods inform which in vitro experiments to run; in vitro results inform which candidates advance to in vivo testing; and in silico clinical modeling can reduce the number of in vivo human studies required.

What is Model-Informed Drug Development (MIDD)?

MIDD is the use of quantitative computational models to support drug development decisions and regulatory submissions. Applications include dose selection, clinical trial design, safety assessment, and virtual bioequivalence studies. The FDA has operated a formal MIDD program for years, and the ICH M15 guideline endorsed in November 2024 establishes an international framework for how MIDD evidence should be structured and assessed across regulatory agencies.

Is in silico data accepted by regulators?

Yes, and increasingly so. The FDA has accepted in silico and computational evidence in hundreds of drug applications, using it to support dosing recommendations, trial design, safety assessments, and bioequivalence determinations. In April 2025, the FDA moved to phase out mandatory animal testing for many drug types, further elevating the role of computational evidence. The ICH M15 guideline provides the formal international framework for how this evidence should be prepared and submitted.

What are the limitations of in silico methods?

Most in silico methods rely on historical experimental data and perform best in well-characterized therapeutic areas. Performance degrades in novel chemical spaces, on targets with limited prior data, and for first-in-class molecules with no close structural precedent in the training data. Physics-based simulation — which computes molecular behavior from first principles rather than learning from historical records — addresses this limitation directly. SandboxAQ’s Large Quantitative Models are designed for exactly these harder cases, where data-driven in silico approaches are least reliable.
