Proof Packs

Overview

| Aspect | Details |
| --- | --- |
| Purpose | Hardware-agnostic validation runs that bundle reports into portable evidence artifacts. |
| Audience | CI operators producing validation evidence across GPU topologies. |
| Requires | A GPU that can fit the selected models; an HF cache or network access for model downloads. |
| Outputs | Proof pack directory with evaluation reports, summary reports, checksums, and an optional GPG signature. |
| Source of truth | scripts/proof_packs/run_suite.sh, scripts/proof_packs/run_pack.sh |

Proof packs are hardware-agnostic validation runs that bundle InvarLock reports, summary reports, and verification metadata into a portable evidence artifact. They replace the B200-specific validation harness with a suite that can run on any NVIDIA GPU topology that can fit the selected models.

By default, a proof pack is evidence-grade (integrity + report verification). Treat it as proof-grade only when the manifest is signed, the pack is verified in strict verification mode, and the final verdict is PASS.

Operationally, proof packs are a maintainer smoke test that also emits reusable evidence data. The same run should let maintainers catch regressions, let third parties verify reported outcomes, and provide structured outputs for downstream analysis.

Terminology: the proof-pack suite includes a run-scoped Preset Derivation phase (CALIBRATION_RUN -> GENERATE_PRESET) that writes calibrated_preset_<model>.yaml/json for that suite run. It does not directly modify global runtime/tiers.yaml. For global tier policy tuning, use invarlock calibrate ... (see Tier Policy Tuning CLI).

Entrypoint Guide

| Script | Purpose | Output | Use when |
| --- | --- | --- | --- |
| run_pack.sh | Full proof pack: runs the suite and packages artifacts | Proof pack directory with manifest + checksums | Default: distributable validation evidence |
| run_suite.sh | Suite execution only | Reports + certs under the run directory | Development/debugging, iterative runs |
| verify_pack.sh | Validates an existing proof pack | Verification status | Validating received proof packs |

Quick Start

# RECOMMENDED: Full proof pack with verification artifacts
PACK_TUNED_EDIT_PARAMS_FILE=./scripts/proof_packs/tuned_edit_params.json \
  ./scripts/proof_packs/run_pack.sh --suite subset --net 1

# Development/debugging only (runs the suite, but does not build a proof pack)
./scripts/proof_packs/run_suite.sh --suite subset --resume

# Verify an existing proof pack
./scripts/proof_packs/verify_pack.sh --pack ./proof_pack_runs/subset_20250101_000000/proof_pack

Note: clean edits require tuned preset parameters. Either set PACK_TUNED_EDIT_PARAMS_FILE or place the file at scripts/proof_packs/tuned_edit_params.json.
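
The lookup order in the note can be sketched as a small shell function. This is an illustration under stated assumptions, not the scripts' actual code: the name resolve_tuned_params is hypothetical, the default path is assumed to be resolved relative to the repository root, and the shell is assumed to support `local` (bash, dash).

```bash
# Hypothetical helper (not part of the proof-pack scripts): resolve the
# tuned edit-parameters file, preferring PACK_TUNED_EDIT_PARAMS_FILE and
# falling back to the default path relative to the repository root.
resolve_tuned_params() {
  local default_path="scripts/proof_packs/tuned_edit_params.json"
  local candidate="${PACK_TUNED_EDIT_PARAMS_FILE:-$default_path}"
  if [ -f "$candidate" ]; then
    printf '%s\n' "$candidate"
    return 0
  fi
  # Clean edits cannot run without tuned parameters, so fail loudly.
  echo "error: tuned edit params not found at '$candidate'" >&2
  return 1
}
```

For example, params_file="$(resolve_tuned_params)" || exit 1 stops a wrapper script before any GPU work starts.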

How It Works

This page focuses on running proof packs. For the internal task graph, scheduler flow, and artifacts, see Proof Pack Internals.

Suites

Model suites live in scripts/proof_packs/suites.sh. You can also override individual models via MODEL_1 through MODEL_8.

| Suite | Models | Notes |
| --- | --- | --- |
| subset | mistralai/Mistral-7B-v0.1 | Single-GPU friendly |
| showcase | 7B–14B ungated models | Multi-GPU recommended; adds guard-focused scenarios |
| workshop | 7B–32B ungated models | Workshop-friendly 3-model suite (architecture diversity) |
| full | 7B–72B ungated models | Multi-GPU recommended |

Scenario selection is driven by scripts/proof_packs/scenarios.json. A scenario can optionally restrict itself to specific suites by declaring suites: ["subset", "showcase", "full", ...]. During execution, the suite writes the effective (filtered) manifest to OUTPUT_DIR/state/scenarios.json, and both task generation and final verdict compilation use that state manifest.

Network & Model Revisions

Proof packs require pinned model revisions for reproducibility:

  • Use --net 1 on the first run to preflight and pin revisions in OUTPUT_DIR/state/model_revisions.json.
  • Offline runs use --net 0 (default) and error if the cache is missing.
  • The PACK_NET environment variable is exported as 1 or 0 to gate HF_*_OFFLINE settings.
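
A minimal sketch of that gating, assuming the scripts toggle the standard Hugging Face offline switches HF_HUB_OFFLINE and HF_DATASETS_OFFLINE. The exact variables the scripts set, and the helper name apply_net_mode, are assumptions for illustration.

```bash
# Hypothetical helper: map a --net 0/1 value onto PACK_NET and the
# Hugging Face offline switches. Which HF_*_OFFLINE variables the real
# scripts set is an assumption here.
apply_net_mode() {
  local net="${1:-0}"                  # --net defaults to 0 (offline)
  case "$net" in
    0|1) ;;
    *) echo "error: --net must be 0 or 1, got '$net'" >&2; return 1 ;;
  esac
  export PACK_NET="$net"
  if [ "$PACK_NET" = "1" ]; then
    # Online: allow downloads so revisions can be preflighted and pinned.
    unset HF_HUB_OFFLINE HF_DATASETS_OFFLINE
  else
    # Offline: only the local HF cache may be used.
    export HF_HUB_OFFLINE=1 HF_DATASETS_OFFLINE=1
  fi
}
```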

Output Layout

A suite run writes artifacts under OUTPUT_DIR (default: ./proof_pack_runs/<suite>_<timestamp>):

  • reports/final_verdict.txt + reports/final_verdict.json
  • reports/category_summary.json
  • reports/guard_signal_summary.json
  • reports/guard_intervention_summary.json (non-failing remediation signals, e.g. spectral caps + VE probe)
  • reports/scenario_signal_summary.json
  • analysis/determinism_repeats.json (when --repeats is used)
  • */reports/**/evaluation.report.json
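
To inspect a run directory by hand, the two hypothetical helpers below list the per-run evaluation reports and rebuild a default-style OUTPUT_DIR name. The timestamp format (%Y%m%d_%H%M%S) is inferred from example paths such as subset_20250101_000000 and may not match what run_suite.sh actually uses.

```bash
# Hypothetical helpers for inspecting a suite run directory.

# Rebuild a default-style OUTPUT_DIR name for a suite.
# ASSUMPTION: timestamp format inferred from paths like subset_20250101_000000.
default_output_dir() {
  printf './proof_pack_runs/%s_%s\n' "$1" "$(date +%Y%m%d_%H%M%S)"
}

# List every per-run evaluation report under a run directory
# (the */reports/**/evaluation.report.json entries), in stable order.
list_evaluation_reports() {
  find "$1" -type f -path '*/reports/*' -name 'evaluation.report.json' | LC_ALL=C sort
}
```

For example, list_evaluation_reports ./proof_pack_runs/subset_20250101_000000 | wc -l gives a quick count of completed runs.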

run_pack.sh copies curated artifacts into a pack directory (default OUTPUT_DIR/proof_pack) and organizes them as:

  • results/final_verdict.txt + results/final_verdict.json
  • results/**/category_summary.json, results/**/guard_signal_summary.json, results/**/guard_intervention_summary.json, results/**/scenario_signal_summary.json
  • results/**/determinism_repeats.json (if present)
  • certs/<model>/<edit>/<run>/evaluation.report.json
  • certs/**/rmt_probe.json (optional sidecar; emitted by some scenarios, e.g. rmt_norm_noise)
  • certs/**/ve_probe.json (optional sidecar; emitted by VE demo scenarios, e.g. ve_mlp_scale_skew)
  • certs/**/evaluation.html + certs/**/verify.json
  • README.md, manifest.json, checksums.sha256
  • manifest.json.asc if GPG signing is available
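
The exact set of files covered by checksums.sha256 is not spelled out here. The sketch below assumes every file in the pack is hashed except checksums.sha256 itself, manifest.json, and manifest.json.asc (the manifest records the digest of the checksum file, so it cannot also be listed in it). write_pack_checksums is a hypothetical name, and the code assumes GNU coreutils sha256sum.

```bash
# Hypothetical sketch: write checksums.sha256 for a pack directory.
# ASSUMPTION: all files are hashed except the checksum file, the manifest,
# and its detached signature. Paths are stored relative to the pack root.
write_pack_checksums() {
  (
    cd "$1" || exit 1
    find . -type f \
      ! -path ./checksums.sha256 \
      ! -path ./manifest.json \
      ! -path ./manifest.json.asc \
      | sed 's|^\./||' | LC_ALL=C sort \
      | while IFS= read -r f; do sha256sum -- "$f"; done > checksums.sha256
  )
}
```

Sorting with LC_ALL=C keeps the file byte-for-byte stable across machines, which matters because the manifest pins its sha256 digest.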

Edit Provenance Labels

Reports record the edit algorithm used:

| Label | When to use |
| --- | --- |
| noop | Baseline model with no edit applied |
| quant_rtn, magnitude_prune, etc. | Using InvarLock's built-in edit functions |
| custom | BYOE (Bring-Your-Own-Edit) pre-edited models |

For BYOE workflows, pass --edit-label custom or let InvarLock infer the label from the model path.

Determinism

Use --determinism strict to disable TF32 and cuDNN benchmark mode and align with strict InvarLock presets. With --repeats N, the suite reruns a single edit N times and writes a drift summary to analysis/determinism_repeats.json in the run directory (copied to results/ in the proof pack).

Signing & Verification (Evidence vs Proof-Grade)

manifest.json includes checksums_sha256_digest (sha256 of checksums.sha256) so a signed manifest cryptographically binds the checksums file (and thus all hashed artifacts). Signed packs also record signing_key_fingerprint for audit trails.

The manifest contract is published at contracts/proof_pack_manifest.schema.json. verify_pack.sh validates this schema before checksum and signature verification so malformed proof packs fail deterministically.
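
The digest and checksum steps can be approximated with standard tools. This sketch skips the schema validation and the invarlock verify call, assumes the digest is stored as bare hex on the same line as the checksums_sha256_digest key, and uses sed only for brevity; a JSON parser such as jq is the safer choice. check_pack_integrity is a hypothetical name, and GNU coreutils sha256sum is assumed.

```bash
# Hypothetical sketch of the digest + checksum checks (no schema validation,
# no `invarlock verify`). ASSUMPTION: the digest is bare hex on one line.
check_pack_integrity() {
  local pack="$1" expected actual
  expected="$(sed -n 's/.*"checksums_sha256_digest"[[:space:]]*:[[:space:]]*"\([0-9a-fA-F]*\)".*/\1/p' \
    "$pack/manifest.json" | head -n 1)"
  actual="$(sha256sum "$pack/checksums.sha256" | cut -d ' ' -f 1)"
  # The manifest must pin exactly this checksum file.
  if [ -z "$expected" ] || [ "$expected" != "$actual" ]; then
    echo "error: checksums_sha256_digest does not match checksums.sha256" >&2
    return 1
  fi
  # Every artifact listed in checksums.sha256 must still hash correctly.
  (cd "$pack" && sha256sum --check --quiet checksums.sha256)
}
```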

Use verify_pack.sh:

  • Default: scripts/proof_packs/verify_pack.sh --pack <dir>
    • Verifies checksums_sha256_digest, validates checksums.sha256, and runs invarlock verify.
    • Warns (but does not fail) if the pack is unsigned; this is evidence-grade verification.
  • Strict (recommended for distributable evidence): scripts/proof_packs/verify_pack.sh --pack <dir> --strict
    • Fails if manifest.json.asc is missing, gpg verification fails, or extra files exist outside checksums.sha256.
    • Alternative: set PACK_STRICT_MODE=1 (e.g., PACK_STRICT_MODE=1 scripts/proof_packs/verify_pack.sh --pack <dir>).
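
The two checks strict mode adds on top of the default path can be sketched as follows. find_unlisted_files and strict_checks are hypothetical names; the sketch assumes checksums.sha256 uses sha256sum's output format with paths relative to the pack root, and that checksums.sha256, manifest.json, and manifest.json.asc are the only files legitimately absent from it.

```bash
# Hypothetical sketch: print pack files that checksums.sha256 does not cover.
find_unlisted_files() {
  (
    cd "$1" || exit 1
    listed="$(mktemp)"; present="$(mktemp)"
    # Paths recorded in checksums.sha256 ("<64 hex> <space or *><path>").
    sed 's/^[0-9a-fA-F]\{64\} [ *]//' checksums.sha256 | LC_ALL=C sort > "$listed"
    # Paths on disk, minus the files that cannot appear in the checksum list.
    find . -type f ! -path ./checksums.sha256 ! -path ./manifest.json ! -path ./manifest.json.asc \
      | sed 's|^\./||' | LC_ALL=C sort > "$present"
    LC_ALL=C comm -13 "$listed" "$present"    # lines only in "present"
    rm -f "$listed" "$present"
  )
}

# Hypothetical strict-mode extras: signature present and valid, no extra files.
strict_checks() {
  local pack="$1" extra
  if [ ! -f "$pack/manifest.json.asc" ]; then
    echo "error: manifest.json.asc is missing" >&2
    return 1
  fi
  gpg --verify "$pack/manifest.json.asc" "$pack/manifest.json" || return 1
  extra="$(find_unlisted_files "$pack")"
  if [ -n "$extra" ]; then
    printf 'error: files outside checksums.sha256:\n%s\n' "$extra" >&2
    return 1
  fi
}
```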

For proof-grade attestation, require all three: signed manifest, strict verification, and PASS final verdict.
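
A CI gate that enforces all three conditions might look like the sketch below. is_proof_grade and the VERIFY_PACK override (used so the gate can be exercised without the real script) are hypothetical, and the verdict check assumes results/final_verdict.txt contains the word PASS only when the overall verdict is PASS; confirm the actual file format before relying on it.

```bash
# Hypothetical proof-grade gate over a pack directory.
is_proof_grade() {
  local pack="$1"
  local verify="${VERIFY_PACK:-./scripts/proof_packs/verify_pack.sh}"
  # 1. Signed manifest.
  [ -f "$pack/manifest.json.asc" ] || { echo "not proof-grade: unsigned manifest" >&2; return 1; }
  # 2. Strict verification.
  "$verify" --pack "$pack" --strict || { echo "not proof-grade: strict verification failed" >&2; return 1; }
  # 3. PASS final verdict. ASSUMPTION: PASS appears only for a passing verdict.
  grep -qw 'PASS' "$pack/results/final_verdict.txt" \
    || { echo "not proof-grade: final verdict is not PASS" >&2; return 1; }
}
```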

To skip signing during pack creation, set PACK_GPG_SIGN=0. To require signing, set PACK_STRICT_MODE=1.
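
The signing switches can be read as a small decision table. In the hypothetical signing_plan helper below, PACK_GPG_SIGN=0 skips signing and PACK_STRICT_MODE=1 requires it, as described above; otherwise the pack is signed only when gpg is available, matching the pack layout. Rejecting PACK_STRICT_MODE=1 combined with PACK_GPG_SIGN=0, and the GPG_BIN override, are assumptions. sign_manifest shows a standard detached, ASCII-armored gpg signature, not necessarily the exact command run_pack.sh uses.

```bash
# Hypothetical helper: print "sign" or "skip", or fail, from the signing switches.
signing_plan() {
  local gpg_bin="${GPG_BIN:-gpg}"       # GPG_BIN override is hypothetical
  local sign="${PACK_GPG_SIGN:-1}"      # assumed default: sign when possible
  local strict="${PACK_STRICT_MODE:-0}"
  if [ "$strict" = "1" ] && [ "$sign" = "0" ]; then
    # ASSUMPTION: conflicting settings are rejected rather than resolved.
    echo "error: PACK_STRICT_MODE=1 requires signing, but PACK_GPG_SIGN=0" >&2
    return 1
  fi
  if [ "$sign" = "0" ]; then
    echo skip                           # explicit opt-out: no manifest.json.asc
  elif command -v "$gpg_bin" >/dev/null 2>&1; then
    echo sign
  elif [ "$strict" = "1" ]; then
    echo "error: signing is required but '$gpg_bin' is not available" >&2
    return 1
  else
    echo skip                           # evidence-grade pack (unsigned)
  fi
}

# Detached, ASCII-armored signature over manifest.json (standard gpg usage).
sign_manifest() {
  gpg --batch --yes --armor --detach-sign \
    --output "$1/manifest.json.asc" "$1/manifest.json"
}
```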