Free anonymous probe

Run a benchmark. Get a signed receipt.

No signup. No card. Pick a benchmark, pick a model, hit run. We'll execute 3 canonical samples on the public inference pool, sign the result with our attestor, and post it to a permanent /verify/<id> URL. The receipt has the same shape as a paid n=50 attestation, just with fewer samples. For higher sample counts, bring your own provider key (n=50, $5 per test).

Or use the API directly

One-liner — same probe, from your terminal

curl -sS -X POST https://benchlist.ai/api/v1/probe \
  -H "Content-Type: application/json" \
  -d '{"benchmark":"gsm8k","model":"anthropic/claude-haiku-4-5","n":3}'
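
If you'd rather call the probe from a script, the same request can be sketched in Python with only the standard library. The URL, headers, and JSON fields are copied from the curl command above; the helper names `build_probe_request` and `run_probe` are illustrative, and the assumption that the endpoint answers with a JSON body is ours, so check /openapi.json for the actual response schema.

```python
import json
import urllib.request

PROBE_URL = "https://benchlist.ai/api/v1/probe"  # same endpoint as the curl one-liner


def build_probe_request(benchmark: str, model: str, n: int = 3) -> urllib.request.Request:
    """Build (but do not send) the POST request for an anonymous probe."""
    payload = {"benchmark": benchmark, "model": model, "n": n}
    return urllib.request.Request(
        PROBE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_probe(benchmark: str, model: str, n: int = 3, timeout: float = 120.0) -> dict:
    """Send the probe and parse the reply.

    Assumption: the endpoint returns a JSON object. The field names inside it
    are not documented on this page, so nothing here depends on them.
    """
    req = build_probe_request(benchmark, model, n)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a real network call, so it is left commented out):
# receipt = run_probe("gsm8k", "anthropic/claude-haiku-4-5")
# print(json.dumps(receipt, indent=2))
```

Splitting request construction from sending keeps the payload easy to inspect or log before anything goes over the wire.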

What just happened.

Your probe ran the model on 3 deterministic samples from the canonical Hugging Face dataset. We graded the output, computed a Wilson 95% confidence interval, signed the receipt with our Ed25519 attestor, and posted it to the public registry. Anyone can fetch /api/runs.json to find your run, and anyone can replay it for $0.50 to confirm the result.
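
For a sense of what a Wilson 95% interval looks like at n=3 versus n=50, here is a sketch of the textbook Wilson score interval. It uses the standard formula without continuity correction, which may differ in detail from what the grader actually runs; `wilson_interval` is an illustrative name, not part of the API.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")

    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n

    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom

    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    for correct, total in [(3, 3), (2, 3), (50, 50)]:
        low, high = wilson_interval(correct, total)
        print(f"{correct}/{total}: [{low:.3f}, {high:.3f}]")
```

With 3 of 3 correct, the lower bound is only about 0.44, while 50 of 50 correct gives a lower bound of about 0.93. That gap is why the free probe is best read as a smoke test and the n=50 run as the actual measurement.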

Free

Anonymous probe

n=3, inference cost covered by us. Public receipt. Rate-limited.

$5 / test

Pay-as-you-go

n=50 from the canonical sample set. Get a free key, then POST /v1/run with your provider key.

$99 one-shot

Launch certificate

8 benchmarks at n=50 + SVG badge. For model labs →

$499 / mo

Provider Verified

Unlimited multi-model + drift alerts. For inference providers →

For AI agents

Calling from Claude Code, Cursor, or your own agent?

Read /llms.txt for the full agent integration spec, or hit /openapi.json for OpenAPI 3.1.
