We charge $5 per attested test on standard benchmarks. This page breaks that price into its actual cost columns: inference, attestor compute, ZK proof generation, mainnet gas, hosting, and margin. We show per-benchmark figures where costs vary, and the levers we use to push them down.
A “standard” test is up to ~1,000 problems (e.g., MBPP at 974, HumanEval at 164), averaged over 3 runs on a Claude-Sonnet-class model, with an SP1 proof settled in an Aligned Layer batch of 32. These are the numbers behind the headline price.
| Line item | Cost | Paid to | Notes |
|---|---|---|---|
| LLM inference (3× standard bench) | $0.80 | Model provider | Avg 1.5M tokens · Sonnet-tier rate |
| Attestor compute (CPU + I/O) | $0.40 | Attestor operator | Hardware amortized over 50 runs/mo |
| ZK proof generation (SP1) | $1.50 | Attestor operator | SP1 prover, 100M cycles · RTX 5090-class |
| Ethereum L1 gas (batched) | $1.30 | Ethereum (base fee burned, tips to validators) | $42 batch / 32 runs · scales w/ base fee |
| IPFS + edge hosting | $0.10 | Pinata / Vercel | Transcripts + dataset mirror, amortized |
| Platform margin | $0.90 | Benchlist (Slopshop Inc.) | Team, support, runner + SDK maintenance |
| Total | $5.00 | | Roughly break-even with 1 credit/pack growth; margin expands with scale |
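As a quick sanity check, the line items above can be summed in integer cents (avoiding floating-point rounding) and grouped by payee. This is a sketch over the table's own numbers; the data layout and function names are ours for illustration, not part of any Benchlist SDK.

```python
from collections import defaultdict

# Line items from the cost table, in US cents: (item, cost, payee)
LINE_ITEMS = [
    ("LLM inference", 80, "Model provider"),
    ("Attestor compute", 40, "Attestor operator"),
    ("ZK proof generation", 150, "Attestor operator"),
    ("Ethereum L1 gas", 130, "Ethereum (gas)"),
    ("IPFS + edge hosting", 10, "Pinata / Vercel"),
    ("Platform margin", 90, "Benchlist"),
]


def total_cents(items):
    """Sum every line item, platform margin included."""
    return sum(cost for _, cost, _ in items)


def by_payee(items):
    """Aggregate the cents each payee receives per test."""
    totals = defaultdict(int)
    for _, cost, payee in items:
        totals[payee] += cost
    return dict(totals)


if __name__ == "__main__":
    print(f"Total: ${total_cents(LINE_ITEMS) / 100:.2f}")
    # Largest share first
    for payee, cents in sorted(by_payee(LINE_ITEMS).items(), key=lambda kv: -kv[1]):
        print(f"{payee:<20} ${cents / 100:.2f}")
```

Grouped this way, attestor operators receive $1.90 of each $5 test (compute plus proof generation), the largest single share.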
These are our best-case steady-state numbers, not worst case. Complex suites (SWE-bench Verified, τ-Bench) have a different cost profile; see the complex-suites section below.
Not every benchmark costs the same. Inference dominates; the other columns grow more slowly. The figures below are our internal costs: “Total cost” also folds in costs that have no column of their own (such as attestor compute and hosting) and excludes platform margin. User-facing prices are $5 for anything in green and quoted separately for yellow/red.
| Benchmark | Problems | Inference | Proof gen | Gas (amortized) | Total cost | User price |
|---|---|---|---|---|---|---|
| HumanEval | 164 | $0.15 | $1.20 | $1.30 | $3.05 | $5 |
| MBPP | 974 | $0.80 | $1.50 | $1.30 | $4.10 | $5 |
| MMLU-Pro | 12,032 | $1.80 | $1.80 | $1.30 | $5.30 | $5 |
| GSM8K | 8,500 | $0.90 | $1.50 | $1.30 | $4.10 | $5 |
| LongMemEval | 500 | $1.50 | $1.80 | $1.30 | $5.10 | $5 |
| LCB (LiveCodeBench) | 400 | $0.40 | $1.50 | $1.30 | $3.70 | $5 |
| FRAMES | 1,170 | $3.60 | $1.80 | $1.30 | $7.20 | $10 |
| τ-Bench (Tau-Bench) | 230 trajectories | $12.00 | $2.40 | $1.30 | $16.10 | $20 |
| SWE-bench Verified | 500 | $28.00 | $3.20 | $1.30 | $33.50 | $50 |
| WebArena | 812 | $38.00 | $3.60 | $1.30 | $44.90 | $60 |
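Because “Total cost” carries costs that have no column of their own, it is larger than the sum of the three columns shown. The sketch below recovers that remainder and the per-run margin (user price minus total cost) from the table, in integer cents. It also makes visible that MMLU-Pro and LongMemEval come out slightly below cost at the $5 price. Names and layout are illustrative.

```python
# (benchmark, inference, proof_gen, gas, total_cost, user_price), all in US cents
ROWS = [
    ("HumanEval", 15, 120, 130, 305, 500),
    ("MBPP", 80, 150, 130, 410, 500),
    ("MMLU-Pro", 180, 180, 130, 530, 500),
    ("GSM8K", 90, 150, 130, 410, 500),
    ("LongMemEval", 150, 180, 130, 510, 500),
    ("LCB", 40, 150, 130, 370, 500),
    ("FRAMES", 360, 180, 130, 720, 1000),
    ("τ-Bench", 1200, 240, 130, 1610, 2000),
    ("SWE-bench Verified", 2800, 320, 130, 3350, 5000),
    ("WebArena", 3800, 360, 130, 4490, 6000),
]


def unlisted_cents(inference, proof, gas, total):
    """Cost folded into 'Total cost' that has no column of its own."""
    return total - (inference + proof + gas)


def margin_cents(total, price):
    """Per-run margin at the user-facing price (negative means sold below cost)."""
    return price - total


if __name__ == "__main__":
    for name, inf, proof, gas, total, price in ROWS:
        other = unlisted_cents(inf, proof, gas, total)
        margin = margin_cents(total, price)
        print(f"{name:<20} other ${other / 100:6.2f}   margin ${margin / 100:7.2f}")
```

For most rows the remainder is $0.40 to $0.50, in line with the attestor compute and hosting lines of the standard breakdown; SWE-bench Verified and WebArena carry more.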
Inference is estimated at mid-tier API prices (Claude Sonnet / GPT-4o-mini class). For Opus-class or o1-style reasoning models, the inference column roughly quadruples. The complex-suite quotes above already include that premium.
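One rough way to apply that multiplier is to scale only the inference column. Leaving proof generation, gas, and the other costs unchanged is our simplifying assumption, and the 4× factor is the page's rough estimate rather than a measured constant.

```python
def reasoning_model_total(total_cents: int, inference_cents: int, multiplier: int = 4) -> int:
    """Total cost in cents with only the inference column scaled by `multiplier`.

    Assumes proof generation, gas, and the remaining costs stay fixed.
    """
    return total_cents + inference_cents * (multiplier - 1)


if __name__ == "__main__":
    # MBPP row: $4.10 total, $0.80 inference
    print(reasoning_model_total(410, 80) / 100)  # 6.5
    # HumanEval row: $3.05 total, $0.15 inference
    print(reasoning_model_total(305, 15) / 100)  # 3.5
```

Under these assumptions, MBPP on an Opus-class model would cost about $6.50, above the $5 standard price, while HumanEval stays well under it.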
Publishers can pick a proof system. The tradeoff is prove-time cost vs. on-chain verification cost:
| Proof system | Prove time | Prove cost | Proof size | L1 verify gas | Best for |
|---|---|---|---|---|---|
| SP1 (default) | 8-18 min | $1.50 | ~1 KB | ~300k | Complex eval code, unmodified Python |
| Risc0 | 6-14 min | $1.30 | ~900 B | ~280k | GPU-heavy batching |
| Halo2 (KZG) | 25-60 min | $3.20 | ~750 B | ~220k | Long-horizon claims |
| Groth16-BN254 | 2-5 min | $0.80 | ~200 B | ~150k | Simple threshold/mean scoring |
| Plonk (kimchi) | 10-30 min | $2.10 | ~400 B | ~200k | Custom circuits |
| Signed attestation (fallback) | <1 s | $0.05 | 64 B | ~60k | LLM-judged benchmarks (no ZK-friendly score fn) |
Signed attestations carry no ZK guarantee but still get the attestor-stake + community-replay layers. We mark them “Attested” instead of “Verified ⛓” on the UI.
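The L1 verify gas column matters mostly because of batching. The sketch below prices a single unbatched verification for each proof system, using the ~25 gwei base fee and ETH ≈ $3,600 this page uses for mainnet pricing, and shows the per-run share if that one verification were split across a batch of 32. It uses only the table's approximate gas figures and ignores priority fees and calldata.

```python
# Approximate L1 verification gas per proof, from the proof-system table
VERIFY_GAS = {
    "SP1": 300_000,
    "Risc0": 280_000,
    "Halo2 (KZG)": 220_000,
    "Groth16-BN254": 150_000,
    "Plonk (kimchi)": 200_000,
    "Signed attestation": 60_000,
}

GWEI_PER_ETH = 10**9


def verify_cost_usd(gas: int, base_fee_gwei: float, eth_usd: float) -> float:
    """USD cost of one on-chain verification: gas × price per gas × ETH price."""
    return gas * base_fee_gwei * eth_usd / GWEI_PER_ETH


if __name__ == "__main__":
    BASE_FEE_GWEI = 25
    ETH_USD = 3_600
    BATCH_SIZE = 32
    for system, gas in VERIFY_GAS.items():
        unbatched = verify_cost_usd(gas, BASE_FEE_GWEI, ETH_USD)
        per_run = unbatched / BATCH_SIZE
        print(f"{system:<20} unbatched ${unbatched:6.2f}   per run (batch of {BATCH_SIZE}) ${per_run:.2f}")
```

At these prices a single SP1 verification costs about $27 on its own, which is the cost that batching amortizes across runs.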
Three things move the needle, in order of leverage:
We recompute these figures internally every month and update this page when the stack shifts. There is no surprise billing for Ethereum gas: if the base fee triples, we absorb it for in-flight runs and adjust new quotes.
Aligned Layer aggregates proofs into a single on-chain verification. The per-run gas cost is:
gas_per_run = (L1_verify_gas × gas_price + batcher_fee) / batch_size
At current mainnet pricing (~25 gwei base fee, ETH ≈ $3,600), a batch of 32 settles for about $42, or roughly $1.30 per run.
We default to batches of 32 during launch. Batch size increases automatically as volume grows, which lowers the per-run gas share: packs get cheaper per run, while the pay-as-you-go price stays at $5 and the savings go to margin.
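The `gas_per_run` formula can be written as a small function. Two inputs are assumptions for illustration: the aggregated batch is taken to verify for about the ~300k gas listed for SP1, and the batcher fee is a hypothetical $15 chosen so the batch total matches the ~$42 figure in the cost table. Neither is a quoted Aligned Layer number.

```python
GWEI_PER_ETH = 10**9


def gas_per_run_usd(l1_verify_gas: int, gas_price_gwei: float, eth_usd: float,
                    batcher_fee_usd: float, batch_size: int) -> float:
    """gas_per_run = (L1_verify_gas × gas_price + batcher_fee) / batch_size, in USD."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    verify_usd = l1_verify_gas * gas_price_gwei * eth_usd / GWEI_PER_ETH
    return (verify_usd + batcher_fee_usd) / batch_size


if __name__ == "__main__":
    # Hypothetical inputs: ~300k gas for the aggregated verification, $15 batcher fee
    VERIFY_GAS, ETH_USD, BATCHER_FEE = 300_000, 3_600, 15.0
    print(gas_per_run_usd(VERIFY_GAS, 25, ETH_USD, BATCHER_FEE, 32))  # 1.3125 (launch batch size)
    print(gas_per_run_usd(VERIFY_GAS, 25, ETH_USD, BATCHER_FEE, 64))  # 0.65625 (larger batch)
    print(gas_per_run_usd(VERIFY_GAS, 75, ETH_USD, BATCHER_FEE, 32))  # 3.0 (base fee tripled)
```

Doubling the batch halves the per-run gas line. With a fixed batcher fee, a tripled base fee raises the batch-of-32 line to about $3.00 rather than tripling it outright.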
Attestors earn a share of each run they process. At $5/test, the split is approximately:
Attestor break-even at current pricing is ~50 runs/month per node, assuming a GPU amortized over 36 months. Once fleet demand pushes a node above 200 runs/month, its attestor is meaningfully profitable at these rates.
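Break-even here is the run count at which per-run earnings cover a node's fixed monthly costs. A minimal sketch in integer cents: the $1.90 per-run figure assumes the attestor keeps exactly the two cost-table lines paid to it (attestor compute plus proof generation), and the fixed and variable costs in the example are hypothetical numbers chosen to land near the page's ~50 runs/month, not fleet data.

```python
def break_even_runs(fixed_monthly_cents: int, revenue_per_run_cents: int,
                    variable_per_run_cents: int) -> int:
    """Smallest whole number of runs per month whose contribution covers fixed costs."""
    contribution = revenue_per_run_cents - variable_per_run_cents
    if contribution <= 0:
        raise ValueError("each run must earn more than it costs to process")
    # Integer ceiling division avoids float rounding at exact boundaries
    return -(-fixed_monthly_cents // contribution)


if __name__ == "__main__":
    REVENUE_PER_RUN = 40 + 150  # attestor compute + ZK proof generation, from the cost table
    # Hypothetical: $80/month amortized GPU and hosting, $0.30/run power and bandwidth
    print(break_even_runs(8_000, REVENUE_PER_RUN, 30))  # 50
```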
Operator guide + join flow: /docs#attestors.
We get asked a lot why a test can’t cost less. The honest answer: Ethereum L1 settlement is the floor. The verification contract on mainnet costs gas we don’t control, and a proof batch that doesn’t land on L1 isn’t a Benchlist proof by definition.
Competitors who charge <$1 per “verified” test are either:
We prefer to be expensive and honest. For use cases that don’t need direct mainnet verification, the “Signed attestation” fallback above costs $0.05 amortized.
SWE-bench, τ-Bench, WebArena, and anything requiring sandboxed execution, browser automation, or multi-hour agent trajectories are outside the “standard” cost envelope. These are quoted up-front before any run starts.
Typical quotes:
These are posted publicly the same way simple suites are. The $5/test default is for “green” rows on the matrix above.
We re-run this cost table on the first of every month with fresh numbers from the attestor fleet. If costs drop, prices drop. If costs rise, we flag it here before changing pricing. The audit trail is in /changelog.