Paste a run. Get a second opinion.

Paste any Benchlist run URL (or run.json URL). We spin up a clean container, re-execute the exact replay command, and return an independent signed attestation of the score we saw. Two signed runs from different attestors: either the claim now has compounding evidence, or it doesn't.

Currently in private beta. Pay by card or ETH. A 50-problem run takes roughly 8–18 min, and the result posts back to the URL you submit. Email dev@remlabs.ai for replay-pool API access.

Three cases where a second signed run is worth $0.50.

Journalists

"Is this release number real?"

Model launches often cite a new SOTA. Replay it in a clean container and publish the delta. Signed. Dated. Linkable.

Procurement

"Verify before we sign."

Enterprise buyers want independent evidence before signing a 7-figure contract. A $0.50 replay is cheaper than a $50k audit.

Researchers

"I don't trust the adapter."

Re-run in our canonical container. Confirm dataset hashes. Compare against the published number. Cite both signatures.
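If you want to do the hash check and score comparison yourself, here is a minimal Python sketch. It assumes you have the dataset file locally and the expected SHA-256 hex digest from the original run.json. The page only says "dataset hashes," so the algorithm, where the digest lives, and the helper names (sha256_file, dataset_matches, matches_published) are all hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its lowercase hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so a large dataset never loads fully into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_matches(path: Path, expected_hex: str) -> bool:
    """True if the local file hashes to the expected SHA-256 digest.

    ASSUMPTION: expected_hex is copied from the original run.json; the field
    name and hash algorithm are not specified by Benchlist here.
    """
    return sha256_file(path) == expected_hex.strip().lower()


def matches_published(published: float, replayed: float, tolerance: float = 0.0) -> bool:
    """Compare the replayed score with the published one within a chosen tolerance."""
    return abs(replayed - published) <= tolerance
```

The tolerance defaults to an exact match. Pick a looser value only if you expect nondeterministic sampling in the model under test.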

How it works under the hood

Each replay runs in a freshly pulled benchlist/runner:<pinned> container (via docker pull) on a dedicated attestor node. It uses the inference API key you provide, or draws on your pre-funded credit balance with us. That attestor then issues a new Ed25519 signature and anchors it. The resulting run.json links back to the original run through its replayOf field.
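Once both run.json files are in hand, checking the link and computing the delta takes only a few lines. Here is a minimal Python sketch that assumes both files parse into JSON objects and that replayOf holds the original run's URL. The "score" field name and the helpers load_run and replay_delta are hypothetical; only replayOf is named on this page. Signature verification is left out because Python's standard library has no Ed25519 support, so a third-party package such as cryptography would be needed for that step.

```python
import json
from pathlib import Path


def load_run(path: Path) -> dict:
    """Parse a run.json file into a dict."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def replay_delta(original: dict, replay: dict, original_ref: str) -> float:
    """Confirm the replay points at the original run, then return replay minus original score.

    replayOf is the field named by Benchlist; "score" is a HYPOTHETICAL field name.
    """
    if replay.get("replayOf") != original_ref:
        raise ValueError("replay does not reference this original run")
    return float(replay["score"]) - float(original["score"])
```

A negative delta means the replay scored lower than the originally published run.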