Affinity Benchmark Validation

How LatticeZero evaluates absolute and relative binding predictions across held-out and independent panels.

A plain-English summary of what we tested, what passed, and which results are production, which are product candidates, and which are beta, with the caveats stated up front.

Structures → Scoring workflow → Held-out panels → Metrics

Validation overview

We validate affinity predictions the way a careful lab would: on held-out and independent panels, with metrics reported in full — including where the model is weaker.

ΔG

Absolute affinity

Predicting a single binding free energy per complex.

Δrank

Relative affinity

Ranking how a change to a molecule shifts binding within a series.

ext

Independent panel

Fresh, externally sourced public complexes the model was not tuned or validated against.

live

Production status

The live production path is deployed and reproducible in-browser; candidate and beta modes are labeled and are not the default.

Production

The shipped default. Live and reproducible in-browser.

Candidate

Evaluated as evidence. Not the production default; not promoted.

Beta

Under evaluation on broader panels. Separate from the promoted product mode.

Absolute affinity validation

A single validated production scorer is the live default for absolute affinity — the path used for in-app scoring and for full-feature scoring of user-prepared complexes.

Product candidate — held-out blind panel

Candidate

Strong agreement with experiment on a held-out blind panel. Reported as evidence only.

Metric	Value
Rank agreement (Spearman ρ)	0.9004
Mean absolute error	0.85 kcal/mol
Cases off by ≥ 2 kcal/mol	2

Status: candidate only. Shown as evidence; it is not the production default and not a shipped promise.

Independent 20-complex challenge — outside the loop

Production path

Twenty public complexes with measured affinities, none of which were part of any prior LatticeZero validation set, scored through the production full-feature path with no model adjustment of any kind.

Independent challenge panel	n	Spearman ρ	MAE (kcal/mol)	RMSE (kcal/mol)
Combined panel	20	0.77	2.11	2.86
Combined, excluding one documented outlier	19	0.80	1.74	2.05
Held-out public-database subset	15	0.75–0.79	1.6–2.1	—
Fully independent external subset	5	0.90	2.11	2.35

Caveat — stated plainly

One carbohydrate-rich case exposed a known chemistry limitation and is reported both included and excluded; we do not hide it. On this independent panel the model ranks affinities well (ρ = 0.77 with the outlier, 0.80 without) with a mean absolute error of about 2 kcal/mol (2.11 with the outlier, 1.74 without). This is a ranking and sanity result, not a universal calibrated-ΔG guarantee.
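
The "included and excluded" reporting pattern can be sketched in a few lines. This is a minimal illustration only: the record layout, the case names, the `documented_outlier` flag, and every number below are hypothetical and are not LatticeZero data or code.

```python
from statistics import fmean

# Hypothetical records: (case_id, predicted dG, measured dG, documented_outlier).
# All values are made up for illustration, in kcal/mol.
panel = [
    ("case_a", -8.0, -8.5, False),
    ("case_b", -7.0, -6.0, False),
    ("case_c", -9.0, -9.5, False),
    ("case_d", -5.0, -11.0, True),  # stands in for a documented hard case
]


def mae(records):
    """Mean absolute error over (id, predicted, measured, flag) records."""
    return fmean(abs(pred - meas) for _, pred, meas, _ in records)


def dual_report(records):
    """Report the metric on the full panel and with flagged cases removed."""
    kept = [r for r in records if not r[3]]
    return {
        "all": {"n": len(records), "mae": mae(records)},
        "excluding_documented_outliers": {"n": len(kept), "mae": mae(kept)},
    }


report = dual_report(panel)
```

Publishing both rows side by side, as the table above does, lets a reader see how much a single case moves the average instead of having to trust the exclusion.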

Full-feature uploads

Customers can submit their own quality-controlled, prepared complexes and receive a score through the same production full-feature scoring path used in these benchmarks. The workflow is proprietary; what matters publicly is that the production path — not a candidate or beta mode — produces these scores.

Relative affinity validation

Relative affinity asks a different question: within a molecular series, does the model rank the effect of each change correctly?

Promoted product mode

Production

The shipped production default for relative ranking, evaluated on blind lead-optimization targets.

Metric	Value
Rank agreement (Spearman ρ)	0.8144

Beta broad-holdout mode

Beta

A separate beta mode under evaluation on broader holdout panels. Not the promoted product mode.

Panel	Spearman ρ
Lead-optimization holdout — holdout_27 (exp_schrodinger_holdout)	0.8091
Broad holdout — global_69	0.7334

Panel note

holdout_27 is not a PDBbind-relative panel — it is a separate lead-optimization holdout. The beta mode is a distinct evaluation track from the promoted product mode and should not be read as the shipped number.

How to read the metrics

Spearman ρ

Rank agreement, from −1 to 1 — how well predicted order matches the true order. Higher is better; ρ ≈ 0.8 means binders are ordered well. Weigh this most for screening and prioritization.
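
For readers who want to check a rank-agreement number themselves, here is a minimal sketch of Spearman ρ as the Pearson correlation of ranks, with tied values sharing their average rank. The function names and the sample values are illustrative assumptions, not LatticeZero code or data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(predicted, measured):
    """Spearman rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rp, rm = average_ranks(predicted), average_ranks(measured)
    mean_p, mean_m = sum(rp) / len(rp), sum(rm) / len(rm)
    cov = sum((a - mean_p) * (b - mean_m) for a, b in zip(rp, rm))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_m = sum((b - mean_m) ** 2 for b in rm)
    if var_p == 0 or var_m == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_p * var_m)


# Illustrative values only (kcal/mol), not LatticeZero data.
predicted_dg = [-9.1, -7.4, -8.2, -6.0, -10.3]
measured_dg = [-9.5, -7.9, -7.6, -6.2, -10.0]
rho = spearman_rho(predicted_dg, measured_dg)  # one adjacent pair is swapped
```

In this toy series a single swapped neighbouring pair among five compounds already lowers ρ from 1.0 to 0.9, which is worth remembering when reading ρ on very small panels.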

MAE

Mean absolute error in kcal/mol: the average size of the miss on predicted binding free energy. Lower is better. Context: at room temperature, about 1.4 kcal/mol corresponds to roughly a 10× error in the binding constant.
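
The 10× rule of thumb follows from the standard relation ΔG = RT ln K: an error of δ in ΔG multiplies the binding constant by exp(δ / RT). The sketch below works that arithmetic through; the function names and the 298.15 K default are assumptions chosen for illustration.

```python
from math import exp, log

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_error(dg_error_kcal, temperature_k=298.15):
    """Multiplicative error in the binding constant implied by a ΔG error."""
    return exp(abs(dg_error_kcal) / (R_KCAL * temperature_k))


def dg_error_for_fold(fold, temperature_k=298.15):
    """ΔG error (kcal/mol) that corresponds to a given fold error in K."""
    return R_KCAL * temperature_k * log(fold)


ten_fold_in_kcal = dg_error_for_fold(10.0)  # about 1.36 kcal/mol
fold_at_2_kcal = fold_error(2.0)            # roughly 30-fold
```

Read this way, a mean absolute error near 2 kcal/mol corresponds to roughly a 30-fold uncertainty in the binding constant, which is why such a result supports ranking more than calibrated absolute values.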

RMSE

Like MAE but penalizes large misses more, so it surfaces outliers.

Large-error counts

How many cases miss by ≥ 2 (or ≥ 4) kcal/mol — reported so a single hard case can't hide inside an average.
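
The three error metrics above can be computed together from paired predictions and measurements. This is a minimal sketch with made-up numbers; the function name `error_summary` and the default thresholds are illustrative assumptions, not LatticeZero code.

```python
from math import sqrt


def error_summary(predicted, measured, thresholds=(2.0, 4.0)):
    """MAE, RMSE, and counts of absolute errors at or above each threshold."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    abs_errors = [abs(p - m) for p, m in zip(predicted, measured)]
    n = len(abs_errors)
    return {
        "n": n,
        "mae": sum(abs_errors) / n,
        "rmse": sqrt(sum(e * e for e in abs_errors) / n),
        "large_errors": {t: sum(e >= t for e in abs_errors) for t in thresholds},
    }


# Illustrative values only (kcal/mol): four small misses and one large one.
predicted_dg = [-9.0, -7.5, -8.0, -6.0, -5.0]
measured_dg = [-9.5, -7.0, -8.5, -6.5, -9.5]
summary = error_summary(predicted_dg, measured_dg)
```

With these toy numbers the MAE is 1.3 kcal/mol while the RMSE is about 2.06 kcal/mol, and the large-error counts show that the whole gap comes from a single 4.5 kcal/mol miss.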

Candidate / null

A “candidate” result comes from a scorer that is not the production default; “null” / unpromoted means it has not been turned on as shipped behavior. Beta modes are evaluated separately from promoted product modes.

What we do not claim

The product candidate scorer is not the production default and is not promoted.
The independent 20-complex panel is ranking and sanity evidence, not a universal calibrated-ΔG guarantee; its mean absolute error is around 2 kcal/mol, and one carbohydrate-rich case is a known limitation.
The relative holdout_27 panel is not a PDBbind-relative panel.
Beta modes are not the same as promoted product modes and should not be quoted as shipped numbers.

Downloads

Absolute Affinity Validation Summary

Held-out blind panel + independent 20-complex challenge.

PDF

Relative Affinity Validation Summary

Promoted product mode + beta broad-holdout mode.

PDF

Validation summaries · June 2026 · Production scoring is live and reproducible in-browser on the benchmark pages.