Every score on Benchlist is a fresh, signed re-run of a public benchmark, with confidence intervals shown inline and contamination flagged honestly. Anyone can pay fifty cents to challenge any number, and the result lands as an independent receipt. Vendor blog posts shouldn't be the source of truth for which model to ship.
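Benchlist doesn't say how its inline confidence intervals are computed. As a minimal sketch, assuming a score is an accuracy over independently graded pass/fail items, a Wilson score interval is one common choice. The function name and the 850/1000 example are illustrative only.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval (95% for z=1.96) for accuracy correct/total.

    Assumption: items are graded pass/fail and independently. This is an
    illustration, not necessarily the method Benchlist uses.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    # Center is pulled toward 0.5; the half-width accounts for small samples.
    center = (p + z2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    # Clamp away floating-point drift just outside [0, 1].
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(850, 1000)
print(f"accuracy 85.0%, 95% CI [{low:.1%}, {high:.1%}]")
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for scores near 0% or 100%. That matters for saturated benchmarks.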
No other leaderboard service lets you do this: pick any signed run, queue an independent re-run on a fresh canonical sample, and get a second Ed25519 receipt from a different attestor in under five minutes. Disagreements are public. The protocol does the trust work, so we don't have to ask for yours.
Try a replay → /api/runs.json…
GSM8K is saturated, MMLU is everywhere in training corpora, and HumanEval predates most modern training cutoffs. Every leaderboard row carries a contamination tier. Read why GSM8K headlines are noise →
“Self-reported numbers are a race to the bottom. Pick a favorable subset, tune to the eval, publish a blog post. Benchlist puts every score behind a cryptographic proof anyone can re-check, on Ethereum, forever.”
Watch the complete lifecycle (queue, run, commit, prove, batch, settle on mainnet) in under five seconds. The SHA-256 commitment is real and computed in your browser.
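The page doesn't specify what the SHA-256 commitment covers. Here is a minimal sketch, assuming it hashes a canonical JSON encoding of the run record. A browser would likely use Web Crypto's `crypto.subtle.digest("SHA-256", ...)`, and Python's `hashlib` stands in here. The record's field names and the `commit` helper are hypothetical.

```python
import hashlib
import json


def commit(run: dict) -> str:
    """Return a hex SHA-256 commitment over a canonical JSON encoding of a run.

    Canonicalization (sorted keys, no whitespace, UTF-8) is an assumption made
    here so that the same record always hashes to the same digest.
    """
    canonical = json.dumps(run, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical run record; these field names are illustrative only.
run = {"benchmark": "gsm8k", "model": "example-model", "sample_seed": 42, "correct": 850, "total": 1000}
digest = commit(run)

# Reordering keys does not change the commitment...
reordered = dict(reversed(list(run.items())))
assert commit(reordered) == digest
# ...but changing any value does.
tampered = {**run, "correct": 851}
assert commit(tampered) != digest
print(digest)
```

Canonicalizing before hashing is what lets an independent re-run reproduce the same digest byte for byte. This is only a binding commitment. A hiding commitment would also mix in a random salt.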
A rolling seven-day digest of every attestation that landed on-chain. Unedited, unspun, computed live from the same JSON the registry serves.
Four reasons every benchmark claim needs a signed attestation.
Every other board runs on “trust me.” Benchlist puts a cryptographic signature on every score.
Shopping for AI? Get signed quotes. Selling AI? Wear the Certified seal.
Describe your use case. We match you to vendors whose signed scores on your must-have benchmarks are freshest. Free forever for buyers; vendors pay us per qualified intro.
Request a quote →
Quarterly re-attestation on a canonical suite. The Certified seal plus an embeddable badge. Priority matching on /quotes. Free dispute coverage. Buyers look for the seal.
Get certified →
From frontier LLMs to vector search, every listing comes with attested benchmark results.
The whole chain is open. You can replay any run bit-for-bit on your own hardware.
Each run is verifiable at /verify/<id>, and anyone can replay it for $0.50. Publishers who opt in also get an Aligned Layer ZK anchor on Ethereum L1.
Benchlist uses Aligned Layer, a proof aggregation network on Ethereum, so any claim on this site can be backed by a signed, on-chain attestation. Read the integration spec →