InvarLock Quickstart Guide

Overview

Aspect      Details
Purpose     Complete the core evaluation workflow in a few commands.
Audience    New users running their first evaluation.
Requires    pip install invarlock for verify/report/evidence-pack flows; add invarlock[hf] only for Hugging Face-backed evaluate.
Network     Use --allow-network on evaluate when a run needs model or dataset downloads.
Next step   Compare & evaluate for production use.

This guide keeps the public front door first: evaluate, verify, and report html. The default path produces a machine-readable evaluation report. The minimal install is enough for verification, report rendering, and evidence-pack inspection. Add invarlock[hf] only when you want the evaluate path to load Hugging Face models. Reach for report generate and report explain after the core path is already green.

If any terms are unfamiliar, see the Glossary.

Quick Start

1. Prepare the environment

pip install invarlock

# Optional: only for evaluate with Hugging Face-backed models
pip install "invarlock[hf]"

invarlock doctor

Wheel-only review path:

invarlock verify /path/to/evaluation.report.json
invarlock report html -i /path/to/evaluation.report.json -o /path/to/evaluation.html
invarlock report explain --evaluation-report /path/to/evaluation.report.json

2. Evaluate a baseline against a subject

INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject distilgpt2 \
  --adapter auto \
  --profile ci \
  --report-out reports/eval

evaluate runs in the runtime container by default; pass --execution-mode host explicitly for a host-side workflow. Container-backed runs emit reports/eval/runtime.manifest.json next to evaluation.report.json. If you bypassed the container, verify the resulting report with invarlock verify --runtime-provenance host ....

Evidence-pack verification works from an installed wheel and does not require a repo checkout:

invarlock advanced evidence-pack verify <pack> --strict

3. Verify the evaluation report

# Container/default evaluate output
invarlock verify reports/eval/evaluation.report.json

# Host evaluate output
invarlock verify --runtime-provenance host reports/eval/evaluation.report.json

The verifier re-checks schema, paired math, gate results, and the adjacent runtime manifest before you promote results. Use the host form only when the evaluation itself ran with --execution-mode host.
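
The choice between the two verify forms can be scripted by checking for the adjacent manifest; a minimal sketch, assuming the Quick Start paths:

```shell
# Container-backed runs emit runtime.manifest.json next to the report;
# host runs do not. Pick the matching verify invocation accordingly.
report="reports/eval/evaluation.report.json"
manifest="$(dirname "$report")/runtime.manifest.json"

if [ -f "$manifest" ]; then
  verify_cmd="invarlock verify $report"
else
  verify_cmd="invarlock verify --runtime-provenance host $report"
fi
echo "$verify_cmd"
```

This keeps the host form from being used by accident on container-backed outputs, which would otherwise skip the manifest check.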

Artifact model:

evaluation.report.json
  Produced by: invarlock evaluate; invarlock report generate --format report
  Primary consumers: invarlock verify, invarlock report html, invarlock report validate, invarlock report explain --evaluation-report, invarlock advanced runtime-verify

report.json
  Produced by: baseline/subject run directories under runs/...
  Primary consumers: invarlock report generate; invarlock report explain --subject-report ... --baseline-report ...

4. Render shareable HTML

invarlock report html -i reports/eval/evaluation.report.json -o reports/eval/evaluation.html

Optional: explain gate decisions directly from the evaluation bundle with invarlock report explain --evaluation-report reports/eval/evaluation.report.json when the bundle provenance still points to accessible baseline and subject report.json files.

If you only have the run reports, the lower-level form remains: invarlock report explain --subject-report runs/subject/report.json --baseline-report runs/source/report.json.
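
The fallback order described above can be sketched as a small script; paths are illustrative, and the guards keep it a no-op when the tool or the artifacts are absent:

```shell
# Prefer the bundle-level explain; fall back to the run-report form
# when only per-run report.json files are available.
bundle="reports/eval/evaluation.report.json"
subject="runs/subject/report.json"
baseline="runs/source/report.json"

if command -v invarlock >/dev/null 2>&1; then
  if [ -f "$bundle" ]; then
    invarlock report explain --evaluation-report "$bundle"
  elif [ -f "$subject" ] && [ -f "$baseline" ]; then
    invarlock report explain \
      --subject-report "$subject" --baseline-report "$baseline"
  fi
fi
```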

Execution Notes

  • Enable downloads per command with --allow-network.
  • For offline reads after warming caches, use HF_DATASETS_OFFLINE=1.
  • --execution-mode host is the explicit host bypass for evaluate.
  • verify expects runtime.manifest.json for container-backed evaluation outputs.
  • --profile ci currently expands causal-LM windows to 240/240; release expands them to 400/400.
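
The warm-then-offline pattern from the notes above can be sketched as two passes; the RUN_INVARLOCK opt-in guard is this sketch's own convention, not an InvarLock feature, so the block stays inert unless you export it:

```shell
# RUN_INVARLOCK=1 opts in; by default the sketch does nothing.
run_sketch="${RUN_INVARLOCK:-0}"
if [ "$run_sketch" = "1" ] && command -v invarlock >/dev/null 2>&1; then
  # First pass: allow network so model and dataset caches get warmed.
  INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
    --baseline gpt2 --subject distilgpt2 --adapter auto --profile ci \
    --report-out reports/eval
  # Second pass: read from the warmed Hugging Face caches offline.
  HF_DATASETS_OFFLINE=1 INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate \
    --baseline gpt2 --subject distilgpt2 --adapter auto --profile ci \
    --report-out reports/eval-offline
fi
```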

Advanced And Demo Flows

The built-in quant_rtn edit ships for demos and smoke tests, but the primary onboarding path is the default evaluate flow shown above.

INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject gpt2 \
  --adapter auto \
  --profile ci \
  --preset configs/presets/causal_lm/wikitext2_512.yaml \
  --edit-config configs/overlays/edits/quant_rtn/8bit_attn.yaml \
  --report-out reports/demo

Advanced commands live under invarlock advanced:

invarlock advanced plugins list
invarlock advanced evidence-pack verify <pack> --strict
invarlock advanced policy --help
invarlock advanced calibrate --help

Use Python extras such as pip install "invarlock[awq,gptq]" when you need optional backends. On Python 3.13+ stacks, gptq may still require a vendor wheel or a supported older interpreter because upstream auto-gptq packaging remains narrower than the core InvarLock support matrix.
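
One way to act on the Python 3.13+ caveat is to select extras from the interpreter version; a sketch that assumes a python3 on PATH and a Python 3 interpreter:

```shell
# Skip the gptq extra on Python 3.13+, where upstream auto-gptq wheels
# may be unavailable; fall back to 0 if python3 is missing.
py_minor="$(python3 -c 'import sys; print(sys.version_info.minor)' 2>/dev/null || echo 0)"

if [ "$py_minor" -ge 13 ]; then
  extras="invarlock[awq]"
else
  extras="invarlock[awq,gptq]"
fi
echo "pip install \"$extras\""
```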

Repo Maintainer Path

If you are working from a repository checkout and want the local image-backed smoke flows, build the runtime image after the basic front door works:

make runtime-image
make container-default-smoke
make container-front-door-smoke

Podman users can prepare the same image explicitly:

make runtime-image-podman
make runtime-smoke-podman

Core Concepts

Workflow

  • Evaluate: compare baseline and subject with deterministic pairing
  • Verify: fail closed on malformed or missing-provenance evaluation outputs
  • Report: render HTML or explain gate decisions from existing artifacts
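
The three stages above chain naturally into one fail-fast script; a sketch using the Quick Start paths, with a RUN_INVARLOCK opt-in guard (this sketch's convention, not an InvarLock flag) so it stays inert by default:

```shell
run_sketch="${RUN_INVARLOCK:-0}"
report_dir="reports/eval"

if [ "$run_sketch" = "1" ] && command -v invarlock >/dev/null 2>&1; then
  # Evaluate, then stop immediately if verification or rendering fails.
  INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
    --baseline gpt2 --subject distilgpt2 --adapter auto --profile ci \
    --report-out "$report_dir" &&
    invarlock verify "$report_dir/evaluation.report.json" &&
    invarlock report html -i "$report_dir/evaluation.report.json" \
      -o "$report_dir/evaluation.html"
fi
```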

Guards

  • Invariants: verify structural properties are preserved
  • Spectral: check spectral norm bounds for stability
  • Variance: monitor activation variance shifts
  • RMT: apply random-matrix-theory-based validation

Devices

--device auto probes CUDA, then MPS, then CPU. Override it explicitly when validating portability or troubleshooting accelerator issues.
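
An explicit override can be sketched as below; RUN_INVARLOCK is this sketch's own opt-in guard, and the remaining flags follow the Quick Start example:

```shell
run_sketch="${RUN_INVARLOCK:-0}"
device="cpu"  # swap for cuda or mps when validating an accelerator

if [ "$run_sketch" = "1" ] && command -v invarlock >/dev/null 2>&1; then
  # Pin the device instead of relying on the auto probe.
  invarlock evaluate --allow-network --device "$device" \
    --baseline gpt2 --subject distilgpt2 --adapter auto --profile ci \
    --report-out "reports/eval-$device"
fi
```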

Next Steps

Note: presets under configs/ are repo-first assets. When using a wheel-only install, prefer direct flags instead of preset paths unless you also cloned the repository.