Affinity Benchmark Validation

How LatticeZero evaluates absolute and relative binding predictions across held-out and independent panels.

A plain-English summary of what we tested, what passed, and which results are production, which are product candidates, and which are beta, with the caveats stated up front.

01 Structures
02 Scoring workflow
03 Held-out panels
04 Metrics

Validation overview

We validate affinity predictions the way a careful lab would: on held-out and independent panels, with metrics reported in full — including where the model is weaker.

ΔG

Absolute affinity

Predicting a single binding free energy per complex.

Δrank

Relative affinity

Ranking how a change to a molecule shifts binding within a series.

ext

Independent panel

Fresh, externally-sourced public complexes the model was not tuned or validated against.

live

Production status

The live production path is deployed and reproducible in-browser; candidate and beta modes are labeled and are not the default.

Production

The shipped default. Live and reproducible in-browser.

Candidate

Evaluated as evidence. Not the production default; not promoted.

Beta

Under evaluation on broader panels. Separate from the promoted product mode.

Absolute affinity validation

A single validated production scorer is the live default for absolute affinity — the path used for in-app scoring and for full-feature scoring of user-prepared complexes.

Product candidate — held-out blind panel

Candidate

Strong agreement with experiment on a held-out blind panel. Reported as evidence only.

| Metric | Value |
| --- | --- |
| Rank agreement (Spearman ρ) | 0.9004 |
| Mean absolute error | 0.85 kcal/mol |
| Cases off by ≥ 2 kcal/mol | 2 |

Status: candidate only. This result is shown as evidence; it is not the production default and not a shipped promise.

Independent 20-complex challenge — outside the loop

Production path

20 public complexes with measured affinities, sourced so they were not part of any prior LatticeZero validation set, scored through the production full-feature path with no model adjustment of any kind.

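| Independent challenge panel | n | Spearman ρ | MAE (kcal/mol) | RMSE (kcal/mol) |
| --- | --- | --- | --- | --- |
| Combined panel | 20 | 0.77 | 2.11 | 2.86 |
| Combined, excluding one documented outlier | 19 | 0.80 | 1.74 | 2.05 |
| Held-out public-database subset | 15 | 0.75–0.79 | 1.6–2.1 | |
| Fully-independent external subset | 5 | 0.90 | 2.11 | 2.35 |

Caveat, stated plainly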

One carbohydrate-rich case exposed a known chemistry limitation and is reported both included and excluded — we do not hide it. On this independent panel the model ranks affinities well (ρ ≈ 0.8) with a mean absolute error around 2 kcal/mol. This is a ranking and sanity result, not a universal calibrated-ΔG guarantee.
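
Reporting a panel both with and without a documented outlier can be sketched as follows. This is a minimal illustration, not LatticeZero's pipeline: the function name `report_with_and_without`, the case identifiers, and every number in `cases` are hypothetical values made up for the example.

```python
import math

def summarize(rows):
    """MAE and RMSE (kcal/mol) over rows of (case_id, predicted, measured)."""
    errors = [predicted - measured for _, predicted, measured in rows]
    n = len(errors)
    return {
        "n": n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }

def report_with_and_without(cases, flagged_ids):
    """Summarize the full panel and the panel minus the flagged cases.

    Both summaries are returned together so the flagged case stays visible
    instead of being silently dropped.
    """
    kept = [row for row in cases if row[0] not in flagged_ids]
    return {"all": summarize(cases), "excluding_flagged": summarize(kept)}

# Hypothetical panel: (case id, predicted dG, measured dG), all in kcal/mol.
cases = [
    ("case_1", -8.0, -8.5),
    ("case_2", -7.0, -6.0),
    ("case_3", -9.0, -9.5),
    ("sugar_rich", -4.0, -10.0),  # made-up outlier with a 6 kcal/mol miss
]

report = report_with_and_without(cases, flagged_ids={"sugar_rich"})
for label, stats in report.items():
    print(label, stats)
```

With one large miss in a small panel, the two summaries differ sharply, which is why both rows are worth showing side by side.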

Full-feature uploads

Customers can submit their own quality-controlled, prepared complexes and receive a score through the same production full-feature scoring path used in these benchmarks. The workflow is proprietary; what matters publicly is that the production path — not a candidate or beta mode — produces these scores.

Relative affinity validation

Relative affinity asks a different question: within a molecular series, does the model rank the effect of each change correctly?
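
To make the question concrete, the sketch below expresses each compound's binding as a shift relative to a reference compound in its series, then checks how often the predicted ordering of two compounds matches the measured ordering. Pairwise order agreement is used here as a simpler stand-in for illustration; it is not the Spearman ρ reported on this page. The compound names and all values are hypothetical.

```python
from itertools import combinations

def relative_shifts(series_dg, reference_id):
    """Binding shift of each compound relative to the reference (kcal/mol).

    A negative shift means the change is predicted (or measured) to bind
    more tightly than the reference compound.
    """
    ref = series_dg[reference_id]
    return {cid: dg - ref for cid, dg in series_dg.items() if cid != reference_id}

def pairwise_order_agreement(predicted, measured):
    """Fraction of compound pairs ordered the same way in both sets.

    Pairs that are tied in the measured data are skipped.
    """
    agree = total = 0
    for a, b in combinations(sorted(predicted), 2):
        d_measured = measured[a] - measured[b]
        d_predicted = predicted[a] - predicted[b]
        if d_measured == 0:
            continue
        total += 1
        if d_predicted * d_measured > 0:
            agree += 1
    return agree / total if total else float("nan")

# Hypothetical series: one parent compound and three modifications.
predicted_dg = {"parent": -8.0, "methyl": -8.6, "chloro": -9.4, "hydroxy": -7.5}
measured_dg = {"parent": -8.2, "methyl": -8.5, "chloro": -9.0, "hydroxy": -8.6}

predicted_shift = relative_shifts(predicted_dg, "parent")
measured_shift = relative_shifts(measured_dg, "parent")
agreement = pairwise_order_agreement(predicted_shift, measured_shift)
print(f"pairwise order agreement: {agreement:.2f}")
```

Because the reference value is subtracted from every compound in the series, a constant offset in the absolute predictions does not affect this kind of within-series comparison.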

Promoted product mode

Production

The shipped production default for relative ranking, evaluated on blind lead-optimization targets.

| Metric | Value |
| --- | --- |
| Rank agreement (Spearman ρ) | 0.8144 |

Beta broad-holdout mode

Beta

A separate beta mode under evaluation on broader holdout panels. Not the promoted product mode.

| Panel | Spearman ρ |
| --- | --- |
| Lead-optimization holdout: holdout_27 (exp_schrodinger_holdout) | 0.8091 |
| Broad holdout: global_69 | 0.7334 |

Panel note

holdout_27 is not a PDBbind-relative panel — it is a separate lead-optimization holdout. The beta mode is a distinct evaluation track from the promoted product mode and should not be read as the shipped number.

How to read the metrics

Spearman ρ
Rank agreement, from −1 to 1 — how well predicted order matches the true order. Higher is better; ρ ≈ 0.8 means binders are ordered well. Weigh this most for screening and prioritization.
MAE
Mean absolute error in kcal/mol — the average size of the miss on predicted binding free energy. Lower is better. Context: ~1.4 kcal/mol is roughly a 10× error in binding constant.
RMSE
Like MAE but penalizes large misses more, so it surfaces outliers.
Large-error counts
How many cases miss by ≥ 2 (or ≥ 4) kcal/mol — reported so a single hard case can't hide inside an average.
Candidate / null
A “candidate” result comes from a scorer that is not the production default; “null” / unpromoted means it has not been turned on as shipped behavior. Beta modes are evaluated separately from promoted product modes.
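
The metrics defined above can be computed with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the function names are illustrative, the example values are made up, and the temperature of 298.15 K used for the fold-error conversion is an assumption rather than a value stated on this page.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(predicted, measured):
    """Spearman rho: Pearson correlation of the two rank vectors."""
    rp, rm = average_ranks(predicted), average_ranks(measured)
    n = len(rp)
    mean_p, mean_m = sum(rp) / n, sum(rm) / n
    cov = sum((a - mean_p) * (b - mean_m) for a, b in zip(rp, rm))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_m = sum((b - mean_m) ** 2 for b in rm)
    return cov / math.sqrt(var_p * var_m)

def mae(predicted, measured):
    """Mean absolute error, in the units of the inputs (kcal/mol here)."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)

def rmse(predicted, measured):
    """Root-mean-square error; squaring weights large misses more heavily."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted))

def large_error_count(predicted, measured, threshold=2.0):
    """Number of cases whose absolute error is at least `threshold`."""
    return sum(1 for p, m in zip(predicted, measured) if abs(p - m) >= threshold)

def fold_error(delta_kcal, temperature_k=298.15):
    """Fold error in binding constant implied by a free-energy error.

    Uses exp(|error| / RT) with R in kcal/(mol*K); the temperature is an
    assumed room-temperature value.
    """
    gas_constant = 1.987204e-3
    return math.exp(abs(delta_kcal) / (gas_constant * temperature_k))

# Made-up predicted and measured binding free energies (kcal/mol).
predicted = [-9.1, -7.4, -8.2, -6.0, -10.5]
measured = [-9.8, -7.0, -8.9, -6.3, -8.1]

print("Spearman rho:", round(spearman_rho(predicted, measured), 3))
print("MAE:", round(mae(predicted, measured), 3))
print("RMSE:", round(rmse(predicted, measured), 3))
print("Misses >= 2 kcal/mol:", large_error_count(predicted, measured))
print("Fold error at 1.4 kcal/mol:", round(fold_error(1.4), 1))
```

In this toy set a single 2.4 kcal/mol miss raises RMSE well above MAE and shows up in the large-error count, which is the behavior the definitions describe.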

What we do not claim

Downloads

Validation summaries · June 2026 · Production scoring is live and reproducible in-browser on the benchmark pages.