Free anonymous probe

Run a benchmark. Get a signed receipt.

No signup. No card. Pick a benchmark, pick a model, hit run. We'll execute 3 canonical samples on our public inference pool, sign the result with our attestor, and post it to a permanent /verify/<id> URL. The receipt has exactly the same shape as a paid n=50 attestation, just with fewer samples. For higher sample counts, bring your own provider key (n=50, $5 per test).

Probe runs n=3 samples on our pool. Rate-limited to 1 per hour, 5 per day, per IP.
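
To get a feel for how the two limits interact, here is a minimal sketch of a per-IP limiter. It assumes sliding windows (1 request per trailing hour, 5 per trailing 24 hours); whether the service uses sliding or fixed windows is not stated, and the class name ProbeRateLimiter is hypothetical.

```python
from collections import defaultdict, deque

HOUR = 3600
DAY = 24 * HOUR
# (window length in seconds, max requests in that window), taken from the limits above
LIMITS = ((HOUR, 1), (DAY, 5))


class ProbeRateLimiter:
    """Illustrative sliding-window limiter; not the service's actual implementation."""

    def __init__(self):
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps older than the longest window.
        while hits and now - hits[0] >= DAY:
            hits.popleft()
        # Reject if any window is already full.
        for window, limit in LIMITS:
            recent = sum(1 for t in hits if now - t < window)
            if recent >= limit:
                return False
        hits.append(now)
        return True
```

Under these assumptions, a second probe from the same IP ten seconds after the first is rejected, while one sent an hour later goes through until the daily cap of 5 is reached.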

Or use the API directly
One-liner — same probe, from your terminal
curl -sS -X POST https://benchlist.ai/api/v1/probe \
  -H "Content-Type: application/json" \
  -d '{"benchmark":"gsm8k","model":"anthropic/claude-haiku-4-5","n":3}'
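
If you would rather not shell out to curl, the same request can be sketched in Python with only the standard library. The URL, header, and JSON fields are copied from the curl command above; the helper names build_probe_request and send_probe are hypothetical, and because the response schema is not described here, the sketch returns the raw body as text.

```python
import json
import urllib.request

PROBE_URL = "https://benchlist.ai/api/v1/probe"  # endpoint from the curl example


def build_probe_request(benchmark: str, model: str, n: int = 3) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl one-liner."""
    body = json.dumps({"benchmark": benchmark, "model": model, "n": n}).encode("utf-8")
    return urllib.request.Request(
        PROBE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_probe(req: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body (schema not assumed)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling send_probe(build_probe_request("gsm8k", "anthropic/claude-haiku-4-5")) would issue the probe; keep the per-IP rate limit in mind before running it in a loop.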

What just happened.

Your probe ran the model on 3 deterministic samples from the canonical HuggingFace dataset. We graded the output, computed a Wilson 95% CI, signed the receipt with our Ed25519 attestor, and posted it to the public registry. Anyone can hit /api/runs.json and find your run. Anyone can replay it for $0.50 to confirm.
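
To see what a Wilson 95% CI looks like at these sample sizes, here is a minimal sketch of the standard Wilson score interval. It assumes the plain form without continuity correction; the exact variant used for receipts is not stated, and the function name wilson_interval is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.959964 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)
```

With 3 of 3 samples correct, this gives an interval of roughly 0.44 to 1.0; with 50 of 50 correct it tightens to roughly 0.93 to 1.0. That gap is why the free probe is a quick signal and the n=50 run is the one to cite.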

Free: Anonymous probe
n=3. We cover the inference cost. Public receipt. Rate-limited.

$5 / test: Pay-as-you-go
n=50 on the canonical sample set. Get a free key, then POST /v1/run with your provider key.

$99 one-shot: Launch certificate
8 benchmarks at n=50, plus an SVG badge. For model labs →

$499 / mo: Provider Verified
Unlimited multi-model runs, plus drift alerts. For inference providers →

For AI agents
Calling from Claude Code, Cursor, or your own agent?

Read /llms.txt for the full agent integration spec, or hit /openapi.json for OpenAPI 3.1.
