We charge $5 per attested test on standard benchmarks. This page shows the actual cost columns: inference, attestor compute, ZK proof generation, mainnet gas, hosting, margin. Per-benchmark where it varies. Iteration levers where we can squeeze it down.
Benchlist runs the benchmark against your model provider key (Anthropic, OpenAI, OpenRouter, etc.). The inference cost is billed directly by that provider to your account; we never touch it. Our $5 covers everything else: attestor execution, Merkle commit, ZK proof generation, Aligned Layer submission, mainnet gas, IPFS hosting, platform ops.
So the $5 breakdown is our operational stack only, not your inference spend. A “standard” test = ~1,000 problems, SP1 proof, Aligned Layer batch of 32.
| Line item | Cost | Paid to | Notes |
|---|---|---|---|
| Attestor compute (CPU + I/O + scoring) | $0.40 | Attestor operator | Amortized over 50 runs/mo per node |
| ZK proof generation (SP1) | $1.50 | Attestor operator | SP1 prover, 100M cycles · RTX 5090-class |
| Ethereum L1 gas (Aligned batch) | $1.30 | Ethereum validators | $42 batch / 32 runs · scales w/ base fee |
| IPFS + edge hosting | $0.10 | Pinata / Vercel | Transcripts + dataset mirror, amortized |
| Platform margin | $1.70 | Benchlist (Slopshop Inc.) | Team, support, runner + SDK maintenance |
| Total Benchlist fee | $5.00 | | Your inference bill is separate |
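As a quick sanity check on the table above, the line items can be summed and expressed as shares of the fee. This is a minimal sketch: the amounts are copied from the table, while the names `LINE_ITEMS`, `total_fee`, and `shares` are illustrative and not part of any Benchlist SDK.

```python
from decimal import Decimal

# Line items copied from the cost table (USD per standard test).
LINE_ITEMS = {
    "Attestor compute": Decimal("0.40"),
    "ZK proof generation (SP1)": Decimal("1.50"),
    "Ethereum L1 gas (Aligned batch)": Decimal("1.30"),
    "IPFS + edge hosting": Decimal("0.10"),
    "Platform margin": Decimal("1.70"),
}


def total_fee(items):
    """Sum all line items; Decimal avoids float rounding on currency."""
    return sum(items.values(), Decimal("0"))


def shares(items):
    """Return each line item as a fraction of the total fee."""
    total = total_fee(items)
    return {name: cost / total for name, cost in items.items()}


if __name__ == "__main__":
    for name, share in shares(LINE_ITEMS).items():
        print(f"{name:<34} ${LINE_ITEMS[name]}  {share:.0%}")
    print(f"{'Total':<34} ${total_fee(LINE_ITEMS)}")
```

Using `Decimal` rather than floats keeps the sum exactly equal to the quoted $5.00.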
Example: You run HumanEval against Claude Sonnet 4.5 with your own API key. Anthropic bills you ~$0.05 for the inference (their rate, their bill). We bill you $5 for the attested execution + proof + on-chain submission. Total out of pocket: $5.05.
Since inference is billed by your own model provider, our fee varies only with attestor compute complexity and proof size. Three tiers:
We used to quote $15-$50 for complex suites to cover inference too. We no longer do. With your-own-key inference, complex suites are a flat $25 (Tier 3) from us; whatever your model provider charges is billed separately.
The “Your inference” column is a rough estimate at Sonnet 4.5 rates; your actual bill depends on which model you pick. “Benchlist fee” is what we charge.
| Benchmark | Problems | Your inference (~Sonnet) | Benchlist fee | Total out-of-pocket | Tier |
|---|---|---|---|---|---|
| HumanEval | 164 | ~$0.05 | $5 | ~$5.05 | Standard |
| MBPP | 974 | ~$0.30 | $5 | ~$5.30 | Standard |
| MMLU-Pro | 12,032 | ~$2.40 | $5 | ~$7.40 | Standard |
| GSM8K | 1,319 | ~$0.40 | $5 | ~$5.40 | Standard |
| GPQA Diamond | 448 | ~$0.50 | $5 | ~$5.50 | Standard |
| IFEval | 541 | ~$0.35 | $5 | ~$5.35 | Standard |
| FRAMES | 824 | ~$3.60 | $5 | ~$8.60 | Standard |
| LongMemEval | 500 | ~$1.60 | $10 | ~$11.60 | Long-ctx |
| NIAH (128k ctx) | 20 | ~$0.80 | $10 | ~$10.80 | Long-ctx |
| RULER | 2,600 | ~$4.00 | $10 | ~$14.00 | Long-ctx |
| τ-Bench | 230 trajectories | ~$12.00 | $25 | ~$37.00 | Agent |
| SWE-bench Lite | 300 | ~$14.00 | $25 | ~$39.00 | Agent |
| SWE-bench Verified | 500 | ~$28.00 | $25 | ~$53.00 | Agent |
| WebArena | 812 | ~$38.00 | $25 | ~$63.00 | Agent |
Your inference estimate is at Claude Sonnet 4.5 rates ($3 in / $15 out per 1M tokens). Opus or o1-class roughly 4×; GPT-4o-mini roughly 1/6×. You see the exact charge on your provider's dashboard after the run. Our fee is fixed per tier regardless of which model you pick.
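To estimate your own total before a run, the arithmetic above can be sketched as follows. The token rates and tier fees come from this page; the function names and the token counts in the usage note are hypothetical and only illustrate the calculation, not measured usage for any benchmark.

```python
# Rates quoted in the text: Claude Sonnet 4.5, USD per 1M tokens.
SONNET_IN_PER_M = 3.00
SONNET_OUT_PER_M = 15.00

# Flat Benchlist fee per tier, as listed in the per-benchmark table.
TIER_FEE = {"Standard": 5.00, "Long-ctx": 10.00, "Agent": 25.00}


def inference_cost(input_tokens, output_tokens,
                   in_rate=SONNET_IN_PER_M, out_rate=SONNET_OUT_PER_M):
    """Provider-side inference cost in USD for the given token counts."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate


def total_out_of_pocket(tier, input_tokens, output_tokens, **rates):
    """Benchlist tier fee plus the estimated provider bill."""
    return TIER_FEE[tier] + inference_cost(input_tokens, output_tokens, **rates)


if __name__ == "__main__":
    # Hypothetical run: 10,000 input tokens and 1,000 output tokens.
    est = total_out_of_pocket("Standard", 10_000, 1_000)
    print(f"Estimated total: ${est:.2f}")
```

For another model, pass that provider's published `in_rate` and `out_rate`; the Benchlist fee term does not change.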
Publishers can pick a proof system. The tradeoff is prove-time cost vs. on-chain verification cost:
| Proof system | Prove time | Prove cost | Proof size | L1 verify gas | Best for |
|---|---|---|---|---|---|
| SP1 (default) | 8-18 min | $1.50 | ~1 KB | ~300k | Complex eval code, unmodified Python |
| Risc0 | 6-14 min | $1.30 | ~900 B | ~280k | GPU-heavy batching |
| Halo2 (KZG) | 25-60 min | $3.20 | ~750 B | ~220k | Long-horizon claims |
| Groth16-BN254 | 2-5 min | $0.80 | ~200 B | ~150k | Simple threshold/mean scoring |
| Plonk (kimchi) | 10-30 min | $2.10 | ~400 B | ~200k | Custom circuits |
| Signed attestation (fallback) | <1 s | $0.05 | 64 B | ~60k | LLM-judged benchmarks (no ZK-friendly score fn) |
Signed attestations carry no ZK guarantee but still get the attestor-stake + community-replay layers. We mark them “Attested” instead of “Verified ⛓” on the UI.
Three things move the needle, in order of leverage:
We publish these internally every month and update this page when the stack shifts. There is no surprise billing for Ethereum gas: if the base fee triples, we eat it for in-flight runs and adjust new quotes.
Proof generation (SP1 or Risc0, both supported, picked per run via --system) is the single most capital-intensive line in the stack. Attestors have three viable paths; the cryptographic output is identical.
| Path | Setup cost | Per-proof cost | Break-even | Best for |
|---|---|---|---|---|
| Local GPU (RTX 4090 / 5090 / A100) | $1,600 – $8,000 hardware | ~$0.20 (power) | ~600 proofs | Dedicated attestors, steady volume, founder-operated |
| Succinct Prover Network (remote) | $0 | ~$1.50 | n/a | Third-party attestors without hardware; bursty load |
| Risc0 Bonsai (remote Risc0) | $0 | ~$1.80 | n/a | Publishers preferring Risc0 proof system |
The Benchlist reference attestor proves locally on a consumer RTX 5090, where SP1 prove time is ≈ 5-12 minutes per standard benchmark and the marginal cost is electricity only. Third-party attestors who don’t own a GPU set SP1_PROVER_URL + SP1_API_KEY to outsource proving; the runner auto-detects these and routes without code changes. The $1.50 SP1 line item in the main cost table assumes remote proving as a conservative upper bound; local-prove attestors keep that margin.
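The auto-detection described above could look roughly like the sketch below. Only the two environment variable names come from this page; `select_prover`, its return shape, and the rule that a half-configured environment is an error are assumptions for illustration, not the runner's actual code.

```python
import os


def select_prover(env=None):
    """Pick local or remote proving from environment variables.

    Hypothetical helper: the real runner's detection logic is not
    documented on this page.
    """
    env = os.environ if env is None else env
    url = env.get("SP1_PROVER_URL")
    key = env.get("SP1_API_KEY")

    if url and key:
        # Both set: outsource proving to the remote endpoint.
        return {"mode": "remote", "url": url}
    if url or key:
        # Assumption: a half-configured environment should fail loudly
        # rather than silently fall back to local proving.
        raise ValueError("SP1_PROVER_URL and SP1_API_KEY must be set together")
    # Neither set: prove on the local GPU.
    return {"mode": "local"}


if __name__ == "__main__":
    print(select_prover())
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.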
Aligned Layer aggregates proofs into a single on-chain verification. The per-run gas cost is:
gas_per_run = (L1_verify_gas × gas_price + batcher_fee) / batch_size
At current mainnet pricing (~25 gwei base fee, ETH ≈ $3,600):
We default to batches of 32 during launch. The system automatically increases batch size as volume grows; users see their effective price drop accordingly (packs get cheaper per run, pay-as-you-go price stays $5 but margin improves).
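The per-run gas formula above can be sketched in a few lines. The 25 gwei base fee, $3,600 ETH price, and batch size of 32 come from this page, and the 300k verify gas is the SP1 figure from the proof-system table. The $15 batcher fee in the usage example is an assumed value, chosen only so that the batch total matches the $42 quoted in the cost table. Whether Aligned's batch verification uses exactly 300k gas is not stated here.

```python
GWEI_PER_ETH = 1e9


def gas_per_run_usd(l1_verify_gas, gas_price_gwei, eth_usd,
                    batcher_fee_usd, batch_size):
    """Per-run share of one batch's L1 verification cost, in USD.

    Implements: (L1_verify_gas * gas_price + batcher_fee) / batch_size
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    verify_cost_usd = l1_verify_gas * gas_price_gwei / GWEI_PER_ETH * eth_usd
    return (verify_cost_usd + batcher_fee_usd) / batch_size


if __name__ == "__main__":
    # batcher_fee_usd=15.0 is an assumption, not a published figure.
    for batch in (32, 64, 128):
        cost = gas_per_run_usd(300_000, 25, 3_600, 15.0, batch)
        print(f"batch of {batch:>3}: ${cost:.2f} per run")
```

With these inputs a batch of 32 works out to about $1.31 per run, in line with the $1.30 line item; doubling the batch size halves the per-run share as long as the batch-level costs stay fixed.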
Attestors earn a share of each run they process. At $5/test, the split is approximately:
Attestor break-even at current pricing is ~50 runs/month per node, assuming a GPU amortized over 36 months. Once fleet demand pushes an attestor above 200 runs/month, the node becomes meaningfully profitable at these rates.
Operator guide + join flow: /docs#attestors.
We get this question a lot. The honest answer: Ethereum L1 settlement is the floor. The verification contract on mainnet costs gas we don’t control. A proof batch that doesn’t land on L1 isn’t a Benchlist proof by definition.
Competitors who charge <$1 per “verified” test are either:
We prefer to be expensive and honest. For use cases that don’t need mainnet directness, the “Signed attestation” fallback above exists at $0.05 amortized.
SWE-bench, τ-Bench, WebArena, and anything requiring sandboxed execution, browser automation, or multi-hour agent trajectories are outside the “standard” cost envelope. These are quoted up-front before any run starts.
Typical quotes:
These are posted publicly the same way simple suites are. The $5/test default is for “green” rows on the matrix above.
We re-run this cost table the first of every month with fresh numbers from the attestor fleet. If costs drop, prices drop. If costs rise, we flag it here before changing pricing. The audit trail is in /changelog.