Local model attestations

Open-weight models, attested on your hardware.

Every score on this page came from a local Ollama daemon — Q4_K_M quants on a consumer GPU — running the same canonical sample sets as the cloud-API attestations. Each result is Ed25519-signed by the publisher's local attestor and replayable in your browser.

- Local runs: Ed25519 signed
- Models: open-weight, quantized
- Benchmarks: deterministic graders
- Perfect scores: 100s across the matrix

The trust ladder

Benchlist accepts scores from four sources, each with a different trust level. Hover any pill on a leaderboard to see why.

- Verified (⛓): ZK proof verified on Ethereum L1 via Aligned Layer. Highest trust: anyone can re-execute the proof.
- Attested: Ed25519 signature over a Merkle commitment of every transcript. Replay the signature in your browser.
- Local: Run on the publisher's own hardware via Ollama or vLLM. Same canonical sample set as cloud runs. Signed.
- Self-reported: Vendor-disclosed number from a model card or paper. Not cryptographically verified by Benchlist.
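The "Attested" tier can be replayed offline: hash each transcript into a leaf, fold the leaves into a Merkle root, and check the publisher's Ed25519 signature over that root. Below is a minimal sketch of the commitment side only, assuming SHA-256 leaves with 0x00/0x01 domain-separation prefixes and duplicate-last padding for odd levels; Benchlist's actual tree layout is not specified here and may differ.

```python
import hashlib

def leaf(data: bytes) -> bytes:
    # Leaf hash with a 0x00 domain-separation prefix (assumed convention)
    return hashlib.sha256(b"\x00" + data).digest()

def node(left: bytes, right: bytes) -> bytes:
    # Interior node hash with a 0x01 prefix (assumed convention)
    return hashlib.sha256(b"\x01" + left + right).digest()

def merkle_root(items: list[bytes]) -> bytes:
    """Fold transcript bytes into a single 32-byte commitment."""
    level = [leaf(x) for x in items]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last leaf on odd levels
        level = [node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

transcripts = [b"q1|answer1", b"q2|answer2", b"q3|answer3"]
root = merkle_root(transcripts)
print(root.hex())
```

Verifying the attestation then amounts to checking the publisher's Ed25519 signature over this root with their published key; any edit to any transcript changes the root and invalidates the signature.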

Local model leaderboard

Average score across each model's set of attested benchmark runs. Smaller models are not scored on a curve — these are honest numbers.

# | Model | Runs | Avg score | Perfect

Cross-source comparison by benchmark

For each benchmark we attested locally, see local-attested results next to cloud-attested results and (where applicable) self-reported vendor numbers. Bars are tinted by trust source.


Reproduce locally

Run the same canonical sample sets on your own GPU. Same Ed25519 attestor scheme; results post to /v1/store-run and land in this same registry.

# Pull the runner
git clone https://github.com/benchlist/runner
cd runner

# Pull a model via Ollama, then run the benchmark suite against it
ollama pull mistral:latest
BENCHLIST_KEY=bl_live_... python3 _local_runner.py \
  --models mistral-7b-q4km \
  --benches gsm8k,mmlu-pro,arc-challenge,piqa,bbh \
  --limit 3
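Once a run finishes, the runner posts the signed result to /v1/store-run. The request schema is not documented on this page, so the sketch below is hypothetical: the field names, the host, and the bearer-token header are all illustrative assumptions, not the real API.

```python
import hashlib
import json
import urllib.request

# Hypothetical payload -- field names are illustrative, not the
# documented /v1/store-run schema.
run = {
    "model": "mistral-7b-q4km",
    "bench": "gsm8k",
    "score": 92.0,
    "merkle_root": hashlib.sha256(b"transcripts").hexdigest(),
    "signature": "<ed25519-signature-hex>",
}

req = urllib.request.Request(
    "https://api.benchlist.example/v1/store-run",  # placeholder host
    data=json.dumps(run).encode(),
    headers={
        "Content-Type": "application/json",
        # Assumed: the same key exported as BENCHLIST_KEY above
        "Authorization": "Bearer bl_live_...",
    },
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually submit
```

The request object is built but not sent here; uncommenting the final line submits it, after which the run should appear in the registry alongside the leaderboard entries above.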