Get a signed attestation from your terminal.
Pick your runtime. Run the install line. Authenticate with one command. Post your first attested benchmark in under two minutes.
pip install
pip install benchlist-runner
Then benchlist login opens your browser, issues a free key, and stores it in ~/.benchlist/key. Standalone, zero deps beyond Python 3.10+.
npm install
npm install -g @benchlist/cli
Same surface as the Python CLI. Wraps fetch, ships TypeScript types.
curl-pipe install
curl -sSf https://benchlist.ai/install.sh | sh
Detects your platform and drops a single Python binary into $HOME/.local/bin. Read the script first if you don't trust pipe-to-shell; it's right here.
Just curl the API
curl -sS -X POST https://benchlist.ai/api/v1/probe \
-H "Content-Type: application/json" \
-d '{"benchmark":"gsm8k","model":"anthropic/claude-haiku-4-5","n":3}'
The probe endpoint is anonymous and rate-limited to one request per IP per hour. It takes the same shape as a paid run, with n capped at 3; the result lands at verify_url in the response.
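A minimal Python sketch of the same probe, using only the standard library. The endpoint and payload come from the curl example above; verify_url is the only response field the docs name, so anything else would be an assumption:

import json
import urllib.request

# Same anonymous probe as the curl example above (1 per IP per hour).
req = urllib.request.Request(
    "https://benchlist.ai/api/v1/probe",
    data=json.dumps({"benchmark": "gsm8k",
                     "model": "anthropic/claude-haiku-4-5",
                     "n": 3}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# The result lands at verify_url; open it to see the attested score.
print(body["verify_url"])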
After install, the four-line flow
benchlist login                                 # device-code OAuth, opens browser
benchlist demo                                  # synthesise a sample run, no inference
benchlist run gsm8k --model claude-sonnet-4-5   # real n=50 attestation, $5
benchlist verify run-abc123                     # replays the Ed25519 check locally
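What benchlist verify replays is an ordinary Ed25519 signature check. Here is a sketch of that check in Python with the cryptography package; the function name and the framing of the signed bytes are ours for illustration, not benchlist's actual run format:

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def check_attestation(public_key: bytes, payload: bytes, signature: bytes) -> bool:
    # Returns True iff `signature` is a valid Ed25519 signature over `payload`.
    # Field names and byte framing are illustrative, not benchlist's run format.
    try:
        Ed25519PublicKey.from_public_bytes(public_key).verify(signature, payload)
        return True
    except InvalidSignature:
        return False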
For AI agents (MCP)
claude mcp add benchlist python -m benchlist_runner.mcp
Exposes list_benchmarks, get_latest_signed_score, list_runs, verify_run, probe, run_demo as MCP tools. Works with Claude Code, Cursor, Continue, Cline, any MCP host. Full schema at /llms.txt.
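For a host-free smoke test, here is a sketch of driving the same server from Python with the MCP SDK (pip install mcp). The tool names come from the list above; the argument shape for get_latest_signed_score is an assumption, so consult /llms.txt for the real schema:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the server the same way `claude mcp add` registers it.
    params = StdioServerParameters(command="python",
                                   args=["-m", "benchlist_runner.mcp"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])  # should include the six tools above
            # Argument shape is a guess; see /llms.txt for the real schema.
            result = await session.call_tool("get_latest_signed_score",
                                             {"benchmark": "gsm8k"})
            print(result.content)

asyncio.run(main())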
GitHub Action
- uses: benchlist/runner/.github/actions/benchlist-attest@v1
  with:
    benchmark: swe-bench-verified
    model: claude-sonnet-4-5-20250929
    service: anthropic-claude
    n: 50
  env:
    BENCHLIST_KEY: ${{ secrets.BENCHLIST_KEY }}
Outputs: run-id, score, verify-url, signed. Drop into any release pipeline to auto-attest each tagged version.
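To consume those outputs in later steps, give the action step an id (say, id: attest) and reference them; this follow-up step is our illustration, not part of the action:

- name: Surface the attestation
  run: |
    echo "score: ${{ steps.attest.outputs.score }}"
    echo "verify at: ${{ steps.attest.outputs.verify-url }}"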