A CLI, three SDKs, one canonical run format. Ship an attested score in about three minutes.
Posting a test is a single HTTP request. No CLI required, no uploads, no dashboard to open. You call POST /v1/run with a (service, model, benchmark) tuple; we run it on a staked attestor, commit the Merkle root, submit the proof to Aligned Layer, and settle on Ethereum L1. The response includes a verify_url that’s live within ~3 minutes of proof verification.
curl -X POST https://api.benchlist.ai/v1/run \
-H "Authorization: Bearer $BENCHLIST_KEY" \
-H "Content-Type: application/json" \
-d '{
"service": "anthropic-claude",
"model": "claude-opus-4-7",
"benchmark": "mbpp",
"runs": 3
}'
# → 202 Accepted
# {
# "run_id": "run-8f3a...",
# "status": "queued",
# "est_seconds": 180,
# "charge": { "credits": 1, "usd": 5.00 },
# "verify_url": "https://benchlist.ai/verify/run-8f3a..."
# }
That’s the whole flow. The run's status moves through queued → running → committed → proving → verified. Subscribe to a webhook on run.verified to ship badges or trigger downstream jobs. Each run deducts $5 from your credit balance, which includes the Ethereum mainnet gas your proof settles under.
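For callers who would rather not shell out to curl, the same request can be sketched with Python's standard library. The endpoint, headers, and body fields mirror the curl example above, and the status ordering comes from the progression just described. The helper names (build_run_request, submit_run, has_reached) are hypothetical; this is a minimal sketch, not the official SDK.

```python
import json
import urllib.request

API_URL = "https://api.benchlist.ai/v1/run"

# Status progression as described in the text above.
STATUS_ORDER = ["queued", "running", "committed", "proving", "verified"]


def build_run_request(api_key, service, model, benchmark, runs=3):
    """Build (but do not send) the POST /v1/run request."""
    body = json.dumps(
        {"service": service, "model": model, "benchmark": benchmark, "runs": runs}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit_run(request):
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


def has_reached(status, target):
    """True if `status` is at or past `target` in the progression."""
    return STATUS_ORDER.index(status) >= STATUS_ORDER.index(target)
```

Calling submit_run(build_run_request(key, "anthropic-claude", "claude-opus-4-7", "mbpp")) should return the JSON shown in the curl example, including run_id and verify_url. Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.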
Prefer a typed SDK? We ship pip install benchlist, npm i @benchlist/sdk, and a Go client; see /sdk. Prefer a CLI for CI wiring? Keep reading.
Free to sign up. Email verification only, no card on signup, no activation fee, no subscription. Drop your email at /submit and we mail you a bl_live_… Bearer key. Load credits when you're ready to sign (packs from $25 / 6 tests).
$5 per attested test. Top up whenever with a credit pack (from $25 / 6 tests, up to 33% off at volume). Two payment paths, same outcome:
# Export the key and re-use anywhere
export BENCHLIST_KEY=bl_live_...
Rotate keys with POST /v1/keys/rotate. Issue scoped sub-keys per environment. Full auth reference: /api#auth.
If you want to run your own attestor (not required; Benchlist operates one for you), pick a proof system per run via --system sp1 | risc0 | signed. Both SP1 and Risc0 are first-class: the same run.json + Merkle commitment + Aligned Layer batch verification path works for either. The runner selects a prover in this order per system:
1. SP1_PROVER_URL + SP1_API_KEY (Succinct Prover Network) or BONSAI_API_URL + BONSAI_API_KEY (Risc0 Bonsai). ~$1-3 per proof, no GPU needed.
2. sp1-prover (curl -L https://sp1.succinct.xyz | bash && sp1up) or r0vm from Risc0, generates proofs on your own NVIDIA GPU (RTX 4090 / 5090 / A100 / H100).
# pick the prover per run
python runner/benchlist.py prove run.json --system sp1 # default zk path
python runner/benchlist.py prove run.json --system risc0 # risc0 zkvm
python runner/benchlist.py prove run.json --system signed # ed25519 fallback
See prove-local-vs-remote for the break-even math.
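The selection order described above can be sketched as a small function. The environment variable names come from the list above; the function name select_prover, its return labels, and the rule that both variables must be set before the hosted prover is chosen are assumptions for illustration, not the runner's actual code.

```python
import os

# Environment variables that enable the hosted prover for each system,
# as listed above.
REMOTE_ENV = {
    "sp1": ("SP1_PROVER_URL", "SP1_API_KEY"),
    "risc0": ("BONSAI_API_URL", "BONSAI_API_KEY"),
}


def select_prover(system, env=None):
    """Return 'remote' if the hosted prover is configured, else 'local'."""
    env = os.environ if env is None else env
    if system == "signed":
        return "signed"  # Ed25519 attestation; no ZK prover involved
    if system not in REMOTE_ENV:
        raise ValueError(f"unknown proof system: {system}")
    # Assumption: both variables must be non-empty to use the hosted prover.
    if all(env.get(name) for name in REMOTE_ENV[system]):
        return "remote"
    return "local"
```

Passing env explicitly keeps the function testable; in normal use it falls back to the process environment.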
When a full ZK proof isn't worth the latency (nightly regression runs, small pilots), use Ed25519 attestation:
pip install pynacl
python runner/benchlist.py prove run.json --system signed
# On-chain anchor (optional, self-send calldata tx, ~$0.50 gas)
export ATTESTOR_ETH_RPC=https://eth-mainnet.g.alchemy.com/v2/<KEY>
export ATTESTOR_PRIVATE_KEY=0x...
python runner/benchlist.py submit run.json --network ethereum
# Anyone can replay the signature locally (no server required)
python runner/benchlist.py verify run.json
The same check runs in-browser on /verify/:id via @noble/ed25519: click "Verify Ed25519 signature" on any signed-attestation run.
The reference runner is a single pipx-installable Python package. It wraps the benchmark runner, the committer (Merkle/hash), and the Aligned submitter.
pipx install benchlist-runner
# OR npm global
npm i -g @benchlist/cli
Verify:
benchlist --version
# benchlist-runner 1.0.2 (sp1 v4.2.3, aligned-sdk v2.1.0)
Say you want to benchmark your LLM provider on MBPP.
export ANTHROPIC_API_KEY=sk-ant-...
benchlist run mbpp \
--service anthropic-claude \
--model claude-opus-4-7 \
--runs 3 \
--out claude-mbpp.json
The runner will:
- ~/.benchlist/datasets/)
- run.json
Two options. CLI publishes directly; web lets you paste JSON.
benchlist commit claude-mbpp.json
benchlist prove claude-mbpp.json --system sp1 # OR --system signed
benchlist submit claude-mbpp.json --network ethereum
# → batch_id: 0x3c5d...9a1b (waiting for verification...)
# → verified at block 22184921
benchlist verify claude-mbpp.json # replay locally
benchlist publish claude-mbpp.json # transcripts stripped by default
# → https://benchlist.ai/verify/run-claude-mbpp-001
Add --with-transcripts to publish if you want the full transcripts hosted (larger payload, prompts visible).
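For CI wiring, the five CLI steps above can be driven from a short Python script. The subcommands and flags are the ones shown in the walkthrough; the function names, the placement of --with-transcripts after the file argument, and the assumption that the CLI exits non-zero on failure are illustrative guesses rather than documented behavior.

```python
import subprocess


def pipeline_commands(run_file, system="sp1", network="ethereum", with_transcripts=False):
    """Return the CLI invocations from the walkthrough, in order."""
    publish = ["benchlist", "publish", run_file]
    if with_transcripts:
        publish.append("--with-transcripts")
    return [
        ["benchlist", "commit", run_file],
        ["benchlist", "prove", run_file, "--system", system],
        ["benchlist", "submit", run_file, "--network", network],
        ["benchlist", "verify", run_file],
        publish,
    ]


def run_pipeline(run_file, **options):
    """Run each step; check=True stops at the first non-zero exit code."""
    for argv in pipeline_commands(run_file, **options):
        subprocess.run(argv, check=True)
```

run_pipeline("claude-mbpp.json", system="signed") would execute the same sequence with Ed25519 attestation in place of the SP1 proof.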
Paste the output of benchlist prove into /submit. We verify the proof against Aligned's batch explorer and publish within 2 minutes.
A service is an AI-adjacent product: an LLM API, a memory substrate, a code agent, a vector DB, etc. Each service has a stable ID (slug), a category, metadata, and a JSON schema.
Services don't host benchmark runs directly; runs reference the service by ID. This lets you update the service description or URL without invalidating historical scores.
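The reference-by-ID relationship can be illustrated with two small record types. The field names below are hypothetical and cover only a subset of what the text mentions (the metadata and JSON schema are omitted); the point is that a run stores the service's slug rather than a copy of the service.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Service:
    slug: str            # stable ID; does not change
    category: str
    description: str = ""
    url: str = ""


@dataclass(frozen=True)
class Run:
    run_id: str
    service_id: str      # references Service.slug, not a copy of the service
    model: str
    benchmark: str


def resolve(run, services):
    """Look up the current service record for a run by its stable ID."""
    return services[run.service_id]
```

Replacing a service record with an updated description, for example via dataclasses.replace, leaves every Run untouched, because the run only holds the slug.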
A benchmark suite is defined by two hashes:
- datasetHash: SHA-256 of the canonical evaluation set
- methodologyHash: SHA-256 of the runner repo at a specific commit
Change either, and you've created a new version of the benchmark. Old runs don't transfer. This prevents silent benchmark drift.
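A minimal sketch of how the two hashes pin a benchmark version, using Python's hashlib. How the evaluation set and the runner repo are serialized to bytes before hashing is not specified here, so the byte inputs below are placeholders, and the function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def suite_version(dataset_bytes: bytes, methodology_bytes: bytes) -> tuple:
    """Identify a suite version by its (datasetHash, methodologyHash) pair."""
    return (sha256_hex(dataset_bytes), sha256_hex(methodology_bytes))


def same_suite(a: tuple, b: tuple) -> bool:
    """Runs are comparable only when both hashes match."""
    return a == b
```

Changing a single byte of either input yields a different pair, so a run recorded against the old pair no longer matches the new suite version.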
A run is a specific (service, model, config) executed against a specific benchmark suite. Every run produces:
The commitment is what actually gets signed and submitted to Aligned.
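To make the idea of a commitment concrete, here is a generic Merkle root computed over a list of per-item result records. The leaf encoding, the use of SHA-256, the 0x00/0x01 domain-separation prefixes, and the duplicate-last-node rule for odd levels are all assumptions chosen for illustration; the canonical scheme is whatever the runner and the integration spec define.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Compute a hex Merkle root over a list of byte strings.

    Assumed scheme: leaves and inner nodes are domain-separated with
    0x00 / 0x01 prefixes, and an odd node is paired with itself.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Because every record feeds into the root, altering any single result changes the commitment, which is why signing the root alone is enough to bind the whole run.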
An attestor is a runner that executes benchmarks and signs results. The reference attestor (benchlist-runner-0) is operated by Benchlist itself, but anyone can join the registry by:
- benchlist attestor init, generates an Ed25519 keypair
- PUT /attestors request with their pubkey + metadata
Misconduct (upheld disputes) slashes the stake.
Aligned is a proof aggregation network that settles on Ethereum L1. Every commitment produced by a runner is packaged as a proof, submitted to Aligned's operator set, and verified on-chain. Once verified, the batch ID becomes the listing's credential.
See the integration spec for wire format.