Proof Packs

Overview

Aspect	Details
Purpose	Hardware-agnostic validation runs that bundle reports into portable evidence artifacts.
Audience	CI operators producing validation evidence across GPU topologies.
Requires	GPU capable of fitting selected models; HF cache or network for model download.
Outputs	Proof pack directory with reports, checksums, and optional GPG signature.
Source of truth	`scripts/proof_packs/run_suite.sh`, `scripts/proof_packs/run_pack.sh`.

Proof packs are hardware-agnostic validation runs that bundle InvarLock reports, summary reports, and verification metadata into a portable evidence artifact. They replace the B200-specific validation harness with a suite that can run on any NVIDIA GPU topology that can fit the selected models.

By default, a proof pack is evidence-grade (integrity + report verification). Treat it as proof-grade only when the manifest is signed, the pack is verified in strict verification mode, and the final verdict is PASS.

Operationally, proof packs are a maintainer smoke test that also emits reusable evidence data. The same run should let maintainers catch regressions, let third parties verify reported outcomes, and provide structured outputs for downstream analysis.

Terminology: the proof-pack suite includes a run-scoped Preset Derivation phase (CALIBRATION_RUN -> GENERATE_PRESET) that writes calibrated_preset_<model>.yaml/json for that suite run. It does not directly modify global runtime/tiers.yaml. For global tier policy tuning, use invarlock calibrate ... (see Tier Policy Tuning CLI).

Entrypoint Guide

Script	Purpose	Output	Use When
`run_pack.sh`	Full proof pack: runs suite + packages artifacts	Proof pack directory with manifest + checksums	Default: distributable validation evidence
`run_suite.sh`	Suite execution only	Reports + certs under the run directory	Development/debugging, iterative runs
`verify_pack.sh`	Validate an existing proof pack	Verification status	Validating received proof packs

Quick Start

# RECOMMENDED: Full proof pack with verification artifacts
PACK_TUNED_EDIT_PARAMS_FILE=./scripts/proof_packs/tuned_edit_params.json \
  ./scripts/proof_packs/run_pack.sh --suite subset --net 1

# Development/debugging only (runs the suite, but does not build a proof pack)
./scripts/proof_packs/run_suite.sh --suite subset --resume

# Verify an existing proof pack
./scripts/proof_packs/verify_pack.sh --pack ./proof_pack_runs/subset_20250101_000000/proof_pack

Note: clean edits require tuned preset parameters. Either set PACK_TUNED_EDIT_PARAMS_FILE or place the file at scripts/proof_packs/tuned_edit_params.json.

How It Works

This page focuses on running proof packs. For the internal task graph, scheduler flow, and artifacts, see Proof Pack Internals.

Suites

Model suites live in scripts/proof_packs/suites.sh. You can also override individual models via MODEL_1–MODEL_8.

Suite	Models	Notes
`subset`	`mistralai/Mistral-7B-v0.1`	Single-GPU friendly
`showcase`	7B–14B ungated models	Multi-GPU recommended; adds guard-focused scenarios
`workshop3`	7B–32B ungated models	Workshop-friendly 3-model suite (architecture diversity)
`full`	7B–72B ungated models	Multi-GPU recommended

Scenario selection is driven by scripts/proof_packs/scenarios.json. Scenarios can optionally declare suites: ["subset", "showcase", "full", ...]; during execution the suite writes the effective (filtered) manifest to OUTPUT_DIR/state/scenarios.json, and both task generation and final verdict compilation use that state manifest.

Network & Model Revisions

Proof packs require pinned model revisions for reproducibility:

Use --net 1 on the first run to preflight and pin revisions in OUTPUT_DIR/state/model_revisions.json.
Offline runs use --net 0 (default) and fail with an error if the model cache is missing.
The PACK_NET environment variable is exported as 1 or 0 to gate HF_*_OFFLINE settings.
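
The gating described above can be sketched as follows. This is a minimal illustration, assuming PACK_NET maps onto the standard Hugging Face offline variables (HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE, HF_DATASETS_OFFLINE). The function name apply_pack_net is hypothetical, and the exact variables the proof-pack scripts set may differ.

```shell
# Hypothetical sketch, not the proof-pack implementation.
# Maps PACK_NET (1 = network allowed, 0 = offline) onto Hugging Face offline switches.
apply_pack_net() {
  case "${PACK_NET:-0}" in
    1)
      # Network allowed: clear any offline switches inherited from the environment.
      unset HF_HUB_OFFLINE TRANSFORMERS_OFFLINE HF_DATASETS_OFFLINE
      ;;
    0)
      # Offline: force the libraries to resolve models from the local cache only.
      export HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1 HF_DATASETS_OFFLINE=1
      ;;
    *)
      echo "PACK_NET must be 0 or 1" >&2
      return 2
      ;;
  esac
}
```

With PACK_NET unset, the sketch falls back to offline mode, which mirrors the documented --net 0 default.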

Output Layout

A suite run writes artifacts under OUTPUT_DIR (default: ./proof_pack_runs/<suite>_<timestamp>):

reports/final_verdict.txt + reports/final_verdict.json
reports/category_summary.json
reports/guard_signal_summary.json
reports/guard_intervention_summary.json (non-failing remediation signals, e.g. spectral caps + VE probe)
reports/scenario_signal_summary.json
analysis/determinism_repeats.json (when --repeats is used)
*/reports/**/evaluation.report.json

run_pack.sh copies curated artifacts into a pack directory (default OUTPUT_DIR/proof_pack) and organizes them as:

results/final_verdict.txt + results/final_verdict.json
results/**/category_summary.json, results/**/guard_signal_summary.json, results/**/guard_intervention_summary.json, results/**/scenario_signal_summary.json
results/**/determinism_repeats.json (if present)
certs/<model>/<edit>/<run>/evaluation.report.json
certs/**/rmt_probe.json (optional sidecar; emitted by some scenarios, e.g. rmt_norm_noise)
certs/**/ve_probe.json (optional sidecar; emitted by VE demo scenarios, e.g. ve_mlp_scale_skew)
certs/**/evaluation.html + certs/**/verify.json
README.md, manifest.json, checksums.sha256
manifest.json.asc if GPG signing is available
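
Before running full verification on a received pack, a quick presence check of the top-level files listed above can catch truncated copies. The helper below is a hypothetical sketch (check_pack_layout is not part of the proof-pack scripts). It only tests that files exist and does not replace verify_pack.sh.

```shell
# Hypothetical helper, not part of the proof-pack scripts.
# Prints each expected file that is missing; returns non-zero if any are missing.
check_pack_layout() (
  pack_dir=$1
  missing=0
  for f in README.md manifest.json checksums.sha256 \
           results/final_verdict.txt results/final_verdict.json; do
    if [ ! -f "$pack_dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # The detached signature is optional, so only report its state.
  if [ -f "$pack_dir/manifest.json.asc" ]; then
    echo "signed: yes"
  else
    echo "signed: no"
  fi
  [ "$missing" -eq 0 ]
)
```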

Edit Provenance Labels

Reports record the edit algorithm used:

Label	When to Use
`noop`	Baseline model with no edit applied
`quant_rtn`, `magnitude_prune`, etc.	Using InvarLock's built-in edit functions
`custom`	BYOE (Bring-Your-Own-Edit) pre-edited models

For BYOE workflows, pass --edit-label custom or let InvarLock infer the label from the model path.

Determinism

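Use --determinism strict to disable TF32 and cuDNN benchmarks and align with strict InvarLock presets. Pass --repeats N to rerun a single edit N times and record a drift summary in determinism_repeats.json (analysis/determinism_repeats.json in the run directory; results/**/determinism_repeats.json in the pack).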

Signing & Verification (Evidence vs Proof-Grade)

manifest.json includes checksums_sha256_digest (sha256 of checksums.sha256) so a signed manifest cryptographically binds the checksums file (and thus all hashed artifacts). Signed packs also record signing_key_fingerprint for audit trails.
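
The binding can be checked by hand. The sketch below assumes checksums_sha256_digest is stored as a lowercase hex string on a single line of manifest.json; the helper name check_manifest_digest is hypothetical. It extracts the field with sed for brevity, and a real JSON parser would be more robust.

```shell
# Hypothetical helper, not part of the proof-pack scripts.
# Succeeds only if the digest recorded in manifest.json matches checksums.sha256 on disk.
check_manifest_digest() (
  pack_dir=$1
  cd "$pack_dir" || exit 2
  # Digest recorded in the manifest (assumes the key and value share one line).
  recorded=$(sed -n 's/.*"checksums_sha256_digest"[[:space:]]*:[[:space:]]*"\([0-9a-f]*\)".*/\1/p' manifest.json | head -n 1)
  # Digest of the checksums file as it currently exists.
  actual=$(sha256sum checksums.sha256 | cut -d ' ' -f 1)
  [ -n "$recorded" ] && [ "$recorded" = "$actual" ]
)
```

Because the manifest carries this digest, verifying the GPG signature on manifest.json extends trust to checksums.sha256, and through it to every file that checksums.sha256 lists.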

The manifest contract is published at contracts/proof_pack_manifest.schema.json. verify_pack.sh validates this schema before checksum and signature verification so malformed proof packs fail deterministically.

Use verify_pack.sh:

Default: scripts/proof_packs/verify_pack.sh --pack <dir>
- Verifies checksums_sha256_digest, validates checksums.sha256, and runs invarlock verify.
- Warns (but does not fail) if the pack is unsigned; this is evidence-grade verification.
Strict (recommended for distributable evidence): scripts/proof_packs/verify_pack.sh --pack <dir> --strict
- Fails if manifest.json.asc is missing, gpg verification fails, or extra files exist outside checksums.sha256.
- Alternative: set PACK_STRICT_MODE=1 (e.g., PACK_STRICT_MODE=1 scripts/proof_packs/verify_pack.sh --pack <dir>).
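
To illustrate the "extra files" rule in strict mode, the sketch below lists files present in a pack that checksums.sha256 does not cover. It is not the verify_pack.sh implementation. It assumes checksums.sha256 uses the standard sha256sum line format with paths relative to the pack root, and that checksums.sha256, manifest.json, and manifest.json.asc are themselves exempt. The name list_unhashed_files is hypothetical.

```shell
# Hypothetical helper, not part of the proof-pack scripts.
# Prints pack files that checksums.sha256 does not list; exits 1 if any exist.
list_unhashed_files() (
  pack_dir=$1
  cd "$pack_dir" || exit 2
  export LC_ALL=C   # keep sort and comm ordering consistent
  listed=$(mktemp) || exit 2
  present=$(mktemp) || exit 2
  trap 'rm -f "$listed" "$present"' EXIT
  # Paths named in checksums.sha256: drop the 64-hex digest and its separator.
  sed -e 's/^[0-9a-f]\{64\} [ *]//' -e 's#^\./##' checksums.sha256 \
    | sort -u > "$listed"
  # Files actually present, minus the metadata files assumed to be exempt.
  find . -type f | sed 's#^\./##' \
    | grep -v -x -F -e 'checksums.sha256' -e 'manifest.json' -e 'manifest.json.asc' \
    | sort -u > "$present"
  # Lines only in the first file are present on disk but not hashed.
  extra=$(comm -23 "$present" "$listed")
  if [ -n "$extra" ]; then
    printf '%s\n' "$extra"
    exit 1
  fi
)
```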

For proof-grade attestation, require all three: signed manifest, strict verification, and PASS final verdict.
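
The three-way requirement can be written as a small decision function. This is a sketch with a hypothetical name (pack_grade) and hypothetical inputs. How a caller obtains each input, for example from the presence of manifest.json.asc, the outcome of a strict verification run, and the contents of the final verdict file, is left as an assumption.

```shell
# Hypothetical helper, not part of the proof-pack scripts.
# Arguments: signed (1 or 0), strict_ok (1 or 0), verdict (for example PASS).
pack_grade() (
  signed=$1
  strict_ok=$2
  verdict=$3
  if [ "$signed" = 1 ] && [ "$strict_ok" = 1 ] && [ "$verdict" = PASS ]; then
    echo proof-grade
  else
    # Anything short of all three stays at the default evidence-grade level.
    echo evidence-grade
  fi
)
```

For example, pack_grade 1 1 PASS prints proof-grade, while an unsigned pack with a PASS verdict remains evidence-grade.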

To skip signing during pack creation, set PACK_GPG_SIGN=0. To require signing, set PACK_STRICT_MODE=1.