Wire format, canonical JSON, Merkle construction, signature scheme, on-chain submission. Everything needed to build a compatible attestor or verifier.
INIT     → EXECUTE  → COMMIT   → PROVE    → SUBMIT   → VERIFIED
↓          ↓          ↓          ↓          ↓
run.json   run.json   run.json   run.json   run.json
+seed      +tx        +commit    +proof     +verif
All hashing is over the canonical JSON: UTF-8, keys sorted, no whitespace, no trailing newline. Use canonicaljson-spec.
import json
from hashlib import sha256

def canon(obj):
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode('utf-8')
digest = sha256(canon(run)).hexdigest()
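To illustrate why canonicalization matters, the sketch below shows that two objects with the same content but different key order serialize to identical bytes, and therefore hash to the same digest. It assumes that Python's json.dumps with these settings agrees with canonicaljson-spec for the values used here; number formatting and other edge cases are not covered. The run fragments are hypothetical.

```python
import json
from hashlib import sha256

def canon(obj):
    # Sorted keys, no whitespace, UTF-8 bytes, no trailing newline.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

# Hypothetical run fragments: same content, different key order.
run_a = {"model": "demo", "seed": 7, "note": "café"}
run_b = {"seed": 7, "note": "café", "model": "demo"}

bytes_a = canon(run_a)
bytes_b = canon(run_b)
digest_a = sha256(bytes_a).hexdigest()
digest_b = sha256(bytes_b).hexdigest()

print(bytes_a)               # b'{"model":"demo","note":"caf\xc3\xa9","seed":7}'
print(digest_a == digest_b)  # True
```

Note that ensure_ascii=False keeps "é" as two raw UTF-8 bytes rather than a \u00e9 escape, so an implementation that escapes non-ASCII characters would produce a different digest.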
Each (prompt, response, judgement) tuple is a leaf. Leaves are hashed with sha256(0x00 || canon(tuple)). Internal nodes are hashed with sha256(0x01 || left || right). A layer with an odd number of nodes duplicates its last node.
leaf_i = sha256(0x00 || canon({ "i": i, "prompt": ..., "response": ..., "judge": ... }))
node = sha256(0x01 || left || right)
root = sha256(0x01 || ... top of tree ...)
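The construction above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the sample transcripts and the string values used for "judge" are hypothetical, and the handling of a single-leaf tree (the root is the leaf hash itself) is an assumption the text does not settle.

```python
import json
from hashlib import sha256

def canon(obj):
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

def leaf_hash(i, prompt, response, judge):
    # Leaf: sha256(0x00 || canon(tuple))
    record = {"i": i, "prompt": prompt, "response": response, "judge": judge}
    return sha256(b"\x00" + canon(record)).digest()

def node_hash(left, right):
    # Internal node: sha256(0x01 || left || right)
    return sha256(b"\x01" + left + right).digest()

def merkle_root(leaves):
    if not leaves:
        raise ValueError("at least one leaf is required")
    layer = list(leaves)  # copy, so the caller's list is not modified
    # Assumption: a single-leaf tree uses the leaf hash as its root.
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            layer.append(layer[-1])  # odd layer: duplicate the last node
        layer = [node_hash(layer[k], layer[k + 1]) for k in range(0, len(layer), 2)]
    return layer[0]

# Hypothetical transcripts, for illustration only.
transcripts = [
    ("2+2?", "4", "correct"),
    ("capital of France?", "Paris", "correct"),
    ("3*3?", "6", "incorrect"),
]
leaves = [leaf_hash(i, p, r, j) for i, (p, r, j) in enumerate(transcripts)]
root = merkle_root(leaves)
print(root.hex())
```

The distinct 0x00 and 0x01 prefixes keep a leaf hash from being reinterpreted as an internal node, which is why they appear in both helper functions.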
The commitment is the input to the ZK proof. It binds the score to the dataset, the methodology, and the transcripts.
commitment = sha256(
    datasetHash ||
    methodologyHash ||
    transcriptMerkleRoot ||
    u64_be(score_fixed_point)  // score × 1e6, to avoid floats
)
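As a rough Python sketch of the formula above: it assumes each hash is a raw 32-byte SHA-256 digest concatenated without separators, and that a score with more than six decimal places is rejected rather than rounded. Neither point is stated in the spec. The helper names and sample inputs are hypothetical.

```python
import struct
from decimal import Decimal
from hashlib import sha256

SCALE = 10**6  # score × 1e6

def score_to_fixed_point(score):
    # Go through str/Decimal so 0.8125 becomes exactly 812500, with no binary-float drift.
    fixed = Decimal(str(score)) * SCALE
    if fixed < 0 or fixed != fixed.to_integral_value():
        raise ValueError("score must be non-negative with at most 6 decimal places")
    return int(fixed)

def u64_be(value):
    return struct.pack(">Q", value)  # 8 bytes, unsigned, big-endian

def make_commitment(dataset_hash, methodology_hash, transcript_root, score):
    # Assumption: all three hashes are raw 32-byte digests.
    for h in (dataset_hash, methodology_hash, transcript_root):
        if len(h) != 32:
            raise ValueError("expected a 32-byte digest")
    preimage = dataset_hash + methodology_hash + transcript_root + u64_be(score_to_fixed_point(score))
    return sha256(preimage).digest()

# Hypothetical inputs, for illustration only.
dataset_hash = sha256(b"dataset").digest()
methodology_hash = sha256(b"methodology").digest()
transcript_root = sha256(b"transcripts").digest()

commitment = make_commitment(dataset_hash, methodology_hash, transcript_root, 0.8125)
print(score_to_fixed_point(0.8125))  # 812500
print(commitment.hex())
```

Because the preimage has a fixed layout (32 + 32 + 32 + 8 bytes), no length prefixes or separators are needed to keep the fields unambiguous.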
Attestors sign the commitment with Ed25519.
sig = ed25519_sign(attestor_sk, commitment)
// 64-byte signature; publish hex-encoded in run.attestorSignature
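Before checking the signature itself, a verifier might first validate its encoding. The sketch below covers only that step: it decodes run.attestorSignature and checks the 64-byte length. It assumes plain hex without a "0x" prefix, and the function name is hypothetical. The Ed25519 verification proper is best left to an established cryptography library.

```python
def decode_attestor_signature(run):
    # Assumption: the field holds plain hex with no "0x" prefix.
    sig_hex = run["attestorSignature"]
    try:
        sig = bytes.fromhex(sig_hex)
    except ValueError as exc:
        raise ValueError("attestorSignature is not valid hex") from exc
    if len(sig) != 64:
        raise ValueError(f"expected a 64-byte signature, got {len(sig)} bytes")
    return sig

# Hypothetical run fragment with a placeholder (not a real) signature.
run = {"attestorSignature": "ab" * 64}
sig = decode_attestor_signature(run)
print(len(sig))  # 64
```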
The ZK proof asserts: "given dataset D and methodology M, applying the pinned scoring function to the committed transcripts yields score S". Supported systems:
sp1: RISC-V zkVM. Recompile the scoring function to RV32IM. Default.
risc0: RISC-V zkVM. GPU-accelerated prover.
groth16_bn254: Classic SNARK. Smallest proof. Hand-written circuit.
halo2_kzg / halo2_ipa: PLONKish. The IPA variant has a transparent setup; the KZG variant uses a universal trusted setup.
plonk: Universal setup. Custom circuits.
signed-attestation: Fallback. The attestor signature is verified by Aligned's general attestation contract.

Proofs are submitted via the Aligned SDK. The batcher aggregates proofs and submits a Merkle root of the batched proofs to the ServiceManager contract on Ethereum L1.
from aligned_sdk import AlignedClient
client = AlignedClient(network="ethereum")
batch = client.submit_proof(
    proof_bytes=proof,
    public_input=commitment,
    proving_system="sp1",
    verifier_identifier="benchlist-v1"
)
# batch.id is the credential we show on the listing
Anyone can verify:
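1. Fetch /api/runs/:id.json.
2. Recompute the commitment from datasetHash, methodologyHash, transcriptMerkleRoot, and score.
3. Query BatchVerifier for the batch; it returns the list of verified public inputs. The run is verified if the recomputed commitment appears in that list.

Every run's replay.command MUST produce a score within the reported σ on the reference hardware. A dispute is won by demonstrating that a run violates this promise.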
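The verification and replay checks can be sketched offline in Python. Several details here are assumptions rather than part of the spec: that run.json stores the three hashes as hex strings and the score as a decimal number with at most six decimal places, that the verified public inputs are available as raw bytes, and that "within the reported σ" means an absolute difference of at most σ. Fetching the run and querying BatchVerifier are left out; the function names are hypothetical.

```python
import struct
from decimal import Decimal
from hashlib import sha256

def recompute_commitment(run):
    # Assumption: hashes are hex strings; score has at most 6 decimal places.
    hashes = b"".join(
        bytes.fromhex(run[key])
        for key in ("datasetHash", "methodologyHash", "transcriptMerkleRoot")
    )
    score_fixed = int(Decimal(str(run["score"])) * 10**6)
    return sha256(hashes + struct.pack(">Q", score_fixed)).digest()

def is_verified(run, verified_public_inputs):
    # verified_public_inputs stands in for the list returned by BatchVerifier.
    return recompute_commitment(run) in set(verified_public_inputs)

def replay_within_sigma(replayed_score, reported_score, sigma):
    # Assumption: the tolerance is an absolute difference of at most sigma.
    return abs(replayed_score - reported_score) <= sigma

# Hypothetical run, for illustration only.
run = {
    "datasetHash": sha256(b"dataset").hexdigest(),
    "methodologyHash": sha256(b"methodology").hexdigest(),
    "transcriptMerkleRoot": sha256(b"transcripts").hexdigest(),
    "score": 0.8125,
}
verified_inputs = [recompute_commitment(run)]  # simulated BatchVerifier result
tampered = dict(run, score=0.9)

print(is_verified(run, verified_inputs))          # True
print(is_verified(tampered, verified_inputs))     # False
print(replay_within_sigma(0.81, 0.8125, 0.005))   # True
```

Changing any of the four fields changes the recomputed commitment, so a run with an edited score no longer matches a verified public input.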
Breaking changes to the wire format require a new top-level spec_version in run.json. Old versions remain verifiable forever.
All of the above, working, at github.com/benchlist/runner. MIT, 2100 lines of Rust + 400 lines of Python glue.