Benchlist Certified.

An annual seal for AI products that maintain fresh, cryptographically signed benchmark scores across a canonical suite. Scores are re-attested automatically every quarter and published on-chain. Display the seal on your model card, docs, and landing page; buyers look for it.

Subscription auto-renews annually. If you cancel, the seal is revoked from your listings at the end of the current period (no retroactive removal). The first upheld dispute per year comes with free dispute coverage: we cover the replay cost, not the dispute bond itself.

Quarterly re-attestation

We run the canonical suite every 90 days. Results are Ed25519-signed and anchored on Ethereum. No manual work.
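
As a rough illustration of the 90-day cadence, the sketch below lists the dates on which re-attestation runs would fall. It assumes the clock starts at enrollment and that the first re-attestation lands 90 days later; neither detail is stated here, and the function name and example date are hypothetical.

```python
from datetime import date, timedelta

# From the text above: the canonical suite is run every 90 days.
REATTEST_INTERVAL_DAYS = 90


def reattestation_dates(enrolled: date, runs: int = 4) -> list[date]:
    """Return the dates of the next `runs` re-attestations.

    Assumption (not confirmed by the text): the schedule is anchored
    to the enrollment date, with the first run 90 days after it.
    """
    return [
        enrolled + timedelta(days=REATTEST_INTERVAL_DAYS * i)
        for i in range(1, runs + 1)
    ]


# Hypothetical enrollment date, for illustration only.
schedule = reattestation_dates(date(2025, 1, 1))
for run_date in schedule:
    print(run_date.isoformat())
```

Four 90-day intervals add up to 360 days, so under this assumption the fourth run falls a few days before the annual renewal date rather than on it.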

Certified seal

SVG + PNG badge tied to your freshest signed run. Embed it on your site, model card, README, or OG image. Auto-updates.

Priority matching

Certified vendors are matched first on /quotes briefs that overlap their suite.

Dispute coverage

First two valid disputes per year on a Certified run are resolved free (we cover attestor replay cost).

Five benchmarks, picked for your category.

Each category's canonical suite is fixed at enrollment and renewed annually with vendor consent. Example picks:

  • Coding agents: SWE-bench Verified · HumanEval · MBPP · LiveCodeBench · GPQA
  • Long-context LLMs: LongMemEval · NIAH · RULER · InfiniteBench · MMLU-Pro
  • General frontier LLMs: MMLU-Pro · GSM8K · MATH-500 · GPQA · SimpleQA
  • Memory providers: LongMemEval · NIAH · episodic recall · consolidation depth · retrieval p50/p99
  • Voice / speech: LibriSpeech WER · VoxPopuli · CommonVoice · multilingual-BLEU · latency

Questions? Email us · quarterly re-attestation included · cancel anytime

Upheld dispute during the year?

If an upheld dispute shows your certified score was wrong, we flag the run, rerun it under your witness, and, if the issue is systematic, refund the certification fee pro-rated for the remaining quarters. Honesty first.
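
To make the pro-rated refund concrete, here is a minimal sketch of the arithmetic, assuming the annual fee is split evenly across four quarters and only whole remaining quarters count. The fee amount, the function name, and the treatment of a partly elapsed quarter are all assumptions for illustration, not stated terms.

```python
from decimal import Decimal, ROUND_HALF_UP

QUARTERS_PER_YEAR = 4


def prorated_refund(annual_fee: Decimal, quarters_remaining: int) -> Decimal:
    """Refund the share of the annual fee covering the remaining quarters.

    Assumption: the fee is spread evenly over four quarters and only
    whole quarters are refunded.
    """
    if not 0 <= quarters_remaining <= QUARTERS_PER_YEAR:
        raise ValueError("quarters_remaining must be between 0 and 4")
    refund = annual_fee * quarters_remaining / QUARTERS_PER_YEAR
    # Round to cents.
    return refund.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical fee of 1000.00 with three quarters left in the year.
example_refund = prorated_refund(Decimal("1000.00"), 3)
print(example_refund)
```

With a hypothetical fee of 1000.00 and three quarters remaining, the refund is three quarters of the fee.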