CLI Reference
Overview
| Aspect | Details |
|---|---|
| Purpose | Command-line interface for evaluation, verification, reporting, and advanced maintenance flows. |
| Audience | Operators running InvarLock from a terminal or CI. |
| Primary commands | evaluate, verify, report, doctor, advanced, version. |
| Runtime verifier | invarlock advanced runtime-verify for direct runtime manifest checks. |
| Requires | invarlock[hf] for model-loading workflows; extra backends are installed via Python extras. |
| Network | Offline by default; use evaluate --allow-network when a run needs model or dataset downloads. |
| Source of truth | src/invarlock/cli/app.py, src/invarlock/cli/commands/*.py, src/invarlock/cli/runtime_verify.py. |
Most users only need a narrow top-level surface:
- invarlock evaluate
- invarlock verify
- invarlock report html
Everything else is either diagnostics (doctor) or explicitly advanced
(invarlock advanced ...).
First-Touch Surfaces
Users typically reach these entry points first when orienting themselves in a fresh install or wheel-only environment:
| Surface | Why it matters |
|---|---|
| invarlock --help | Top-level discovery of the supported public command set |
| invarlock --version | Confirms the installed package and schema pairing |
| invarlock report --help | Shows the report subcommands without requiring run artifacts |
| invarlock advanced --help | Lists the advanced maintenance namespace before drilling into subcommands |
| invarlock advanced calibrate --help | Establishes that calibration lives under advanced rather than the core loop |
| invarlock advanced runtime-verify --help | Wheel-native runtime-manifest verification for existing report bundles |
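The entry points in this table can be smoke-checked in one pass after an install. The sketch below is a minimal example, assuming each listed invocation exits with status 0 on a healthy install; `first_touch_check` is a hypothetical helper name and is not part of InvarLock.

```shell
# Smoke-check the first-touch entry points listed in the table above.
# first_touch_check is a hypothetical helper, not an InvarLock command.
# Usage: first_touch_check [CLI]   (CLI defaults to "invarlock")
first_touch_check() (
    cli="${1:-invarlock}"
    failures=0
    while IFS= read -r args; do
        # $args is intentionally unquoted so "report --help" splits into words.
        if ! "$cli" $args >/dev/null 2>&1 </dev/null; then
            echo "first-touch check failed: $cli $args" >&2
            failures=$((failures + 1))
        fi
    done <<'EOF'
--help
--version
report --help
advanced --help
advanced calibrate --help
advanced runtime-verify --help
EOF
    # Succeed only when every invocation exited with status 0.
    [ "$failures" -eq 0 ]
)
```

Calling `first_touch_check` with no argument runs the checks against the `invarlock` found on `PATH`; passing a different executable name makes the helper easy to exercise in isolation.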
Quick Start
# Install the Hugging Face-backed evaluation stack
pip install "invarlock[hf]"
# Compare a baseline against a subject
invarlock evaluate --allow-network \
--baseline gpt2 \
--subject distilgpt2 \
--adapter auto \
--profile ci
# Validate the container-backed evaluation bundle
invarlock verify reports/eval/evaluation.report.json
# Render shareable HTML
invarlock report html -i reports/eval/evaluation.report.json -o reports/eval/evaluation.html
invarlock report explain --evaluation-report reports/eval/evaluation.report.json
Security Defaults
- evaluate defaults to --execution-mode container, which delegates model-loading work into the runtime container.
- Use --execution-mode host only for host-side workflows that intentionally bypass the container boundary.
- verify expects runtime.manifest.json beside container-backed evaluation outputs and fails closed when required runtime provenance is missing.
- Network access remains opt-in through evaluate --allow-network.
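Because verify fails closed when the runtime manifest is missing, a CI job may want to report that condition before invoking the verifier. The following is a minimal pre-flight sketch, assuming the manifest is named runtime.manifest.json and sits in the same directory as the report, as described above. `require_runtime_manifest` is a hypothetical helper; it checks file presence only and does not replace `invarlock verify`.

```shell
# require_runtime_manifest is a hypothetical pre-flight helper, not an
# InvarLock command. It only checks that runtime.manifest.json sits beside
# the report; provenance validation itself is still done by "invarlock verify".
# Usage: require_runtime_manifest PATH/TO/evaluation.report.json
require_runtime_manifest() (
    report="$1"
    manifest="$(dirname "$report")/runtime.manifest.json"
    if [ ! -f "$report" ]; then
        echo "missing report: $report" >&2
        exit 1
    fi
    if [ ! -f "$manifest" ]; then
        echo "missing runtime manifest: $manifest" >&2
        exit 1
    fi
    # Print the manifest path so callers can pass it on explicitly.
    printf '%s\n' "$manifest"
)
```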
Task To Command Map
| Task | Command | Output |
|---|---|---|
| Compare baseline vs subject | invarlock evaluate | reports/eval/evaluation.report.json plus runtime.manifest.json for container-backed runs |
| Validate an evaluation report | invarlock verify | Exit code plus human or JSON verification output |
| Render HTML from an evaluation report | invarlock report html | HTML file |
| Explain gate decisions from an evaluation bundle or explicit run reports | invarlock report explain | Human-readable explanation |
| Inspect environment health | invarlock doctor | Human or JSON diagnostics |
| Evidence-pack, policy, plugin, or calibration workflows | invarlock advanced ... | Advanced artifacts and diagnostics |
Artifact Outputs Matrix
| Command | Writes runs/ | Writes reports/ | Notes |
|---|---|---|---|
| invarlock evaluate | Yes (--out, default runs/) | Yes (--report-out, default reports/eval) | Produces the paired evaluation report bundle |
| invarlock verify | No | No | Reads existing evaluation report JSON |
| invarlock report html | No | Yes (--output) | Renders HTML from an existing report |
| invarlock report explain | No | No | Prefers evaluation.report.json, then auto-resolves linked run reports; also accepts explicit --subject-report and --baseline-report |
| invarlock doctor | No | No | Diagnostics only |
| invarlock advanced evidence-pack | Depends on subcommand | Depends on subcommand | Advanced evidence packaging |
| invarlock advanced policy | Depends on subcommand | No | Advanced policy-pack tooling |
| invarlock advanced plugins | No | No | Read-only plugin discovery and explanation |
| invarlock advanced calibrate | Yes | Yes | Advanced tier-policy calibration workflows |
Top-Level Command Index
| Command | Purpose |
|---|---|
| invarlock evaluate | Compare baseline and subject checkpoints with deterministic pairing |
| invarlock verify | Verify evaluation reports against schema, pairing, and runtime provenance rules |
| invarlock report | Explain, render, and validate existing report artifacts |
| invarlock doctor | Diagnose environment and configuration issues |
| invarlock advanced | Advanced evidence-pack, policy, plugin, and calibration workflows |
| invarlock version | Show the installed version |
| invarlock advanced runtime-verify | Verify an evaluation report against its sibling runtime.manifest.json |
Exit codes: 0=success, 1=generic failure, 2=usage/schema/config failure,
3=hard abort for profile-aware fail-closed paths.
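In CI logs it can help to translate these exit codes into readable labels. The sketch below maps only the four codes documented above; `describe_exit_code` is a hypothetical helper name, and any other status is reported as undocumented rather than guessed at.

```shell
# describe_exit_code is a hypothetical helper that labels the exit codes
# documented above. Codes outside 0-3 are reported as undocumented.
# Usage:
#   invarlock verify reports/eval/evaluation.report.json
#   echo "verify: $(describe_exit_code "$?")"
describe_exit_code() {
    case "$1" in
        0) echo "success" ;;
        1) echo "generic failure" ;;
        2) echo "usage/schema/config failure" ;;
        3) echo "hard abort (profile-aware fail-closed path)" ;;
        *) echo "undocumented exit code: $1" ;;
    esac
}
```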
invarlock evaluate
Purpose: compare a baseline against a subject and emit an evaluation report.
Common options:
- --baseline: baseline checkpoint path or model ID
- --subject: subject checkpoint path or model ID
- --baseline-report: reuse a stored baseline report by passing the explicit report.json file path that captured the baseline windows
- --adapter: adapter name or auto
- --profile: ci, release, or another included profile
- --tier: tier label for policy context
- --preset: optional repo preset path
- --out: run-artifact directory
- --report-out: evaluation report directory
- --execution-mode container|host: execution policy for evaluate. container keeps model loading inside the runtime container; host allows host-side execution and produces host artifacts that should be verified with verify --runtime-provenance host.
- --edit-config: optional demo/smoke edit overlay such as quant_rtn
Example:
INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
--baseline gpt2 \
--subject distilgpt2 \
--adapter auto \
--profile ci \
--report-out reports/eval
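Since host-mode artifacts should be verified with verify --runtime-provenance host, a wrapper script can derive the verify flag from the execution mode it used for evaluate. This is a minimal sketch using only the two documented mode values; `verify_flags_for_mode` is a hypothetical helper, and the choice to return status 2 for an unknown mode is an assumption that mirrors the documented usage-failure code.

```shell
# verify_flags_for_mode is a hypothetical helper. It prints the
# --runtime-provenance flag that matches an evaluate --execution-mode value.
# Usage:
#   mode=host
#   invarlock verify $(verify_flags_for_mode "$mode") reports/eval/evaluation.report.json
verify_flags_for_mode() {
    case "$1" in
        container) echo "--runtime-provenance container" ;;
        host)      echo "--runtime-provenance host" ;;
        *)
            # Assumption: treat an unknown mode as a usage error (status 2).
            echo "unknown execution mode: $1" >&2
            return 2
            ;;
    esac
}
```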
invarlock verify
Purpose: verify existing evaluation report JSON files.
Arguments:
- REPORTS...: one or more evaluation report JSON paths or directories containing canonical evaluation.report.json
Common options:
- --baseline: optional baseline report for comparison flows
- --tolerance: float tolerance for recompute checks
- --profile: profile-aware validation mode
- --runtime-provenance container|host: runtime provenance policy for the supplied report artifacts
- --json: emit a single JSON envelope
Example:
invarlock verify --json reports/eval/evaluation.report.json
invarlock report
Purpose: operate on existing report artifacts through explicit subcommands.
Core subcommands:
- invarlock report generate
  - Generate human-readable report output from existing run reports
  - Options: --run, --compare-run-report, --baseline-run-report, --format, --output
- invarlock report html
  - Render an evaluation report to HTML
  - Options: -i/--input, -o/--output, --embed-css, --force
- invarlock report explain
  - Explain gates and primary-metric behavior from the preferred evaluation bundle input, or from explicit subject/baseline run reports when needed
  - Options: --evaluation-report, --subject-report, --baseline-report
- invarlock report validate
  - Validate a report JSON against the v1 schema
- Directory inputs are command-specific:
  - report generate and report explain accept directories containing canonical report.json
  - report html and report validate accept directories containing canonical evaluation.report.json
  - report explain --evaluation-report accepts directories containing canonical evaluation.report.json
  - verify accepts directories containing canonical evaluation.report.json and optional baselines containing canonical report.json or evaluation.report.json
  - If a directory contains both canonical filenames, it is ambiguous and rejected; pass the exact file path instead.
Example:
invarlock report html -i reports/eval/evaluation.report.json -o reports/eval/evaluation.html
invarlock report explain --evaluation-report reports/eval/evaluation.report.json
invarlock report explain \
--subject-report runs/subject/report.json \
--baseline-report runs/baseline/report.json
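The directory-input rules above can be mirrored in a wrapper script so that ambiguous directories are caught before the CLI is invoked. The sketch below models the case where either canonical filename is acceptable (for example, verify baselines). `resolve_report_input` is a hypothetical helper and not InvarLock's own resolver; returning status 2 for rejected inputs is an assumption that mirrors the documented usage-failure code.

```shell
# resolve_report_input is a hypothetical helper that mirrors the documented
# directory rule for inputs where either canonical filename is accepted.
# It prints the resolved file path, or fails with status 2 (an assumption).
# Usage: resolve_report_input PATH_OR_DIRECTORY
resolve_report_input() (
    input="$1"
    # An explicit file path is always used as given.
    if [ -f "$input" ]; then
        printf '%s\n' "$input"
        exit 0
    fi
    if [ ! -d "$input" ]; then
        echo "not a file or directory: $input" >&2
        exit 2
    fi
    run_report="$input/report.json"
    eval_report="$input/evaluation.report.json"
    if [ -f "$run_report" ] && [ -f "$eval_report" ]; then
        # Both canonical filenames present: ambiguous, so reject.
        echo "ambiguous directory, pass an exact file path: $input" >&2
        exit 2
    elif [ -f "$eval_report" ]; then
        printf '%s\n' "$eval_report"
    elif [ -f "$run_report" ]; then
        printf '%s\n' "$run_report"
    else
        echo "no canonical report file in: $input" >&2
        exit 2
    fi
)
```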
invarlock doctor
Purpose: environment diagnostics that remain light-import safe.
Common options:
- --json
- --profile
- --tier
- --baseline-report
- --subject-report
- --strict
- Report inputs accept an explicit JSON file path or a directory containing canonical report.json or evaluation.report.json; ambiguous directories with both canonical files are rejected and require an explicit file path.
Example:
invarlock doctor --json
invarlock advanced
Purpose: advanced and maintenance-oriented workflows that are intentionally outside the core product contract.
Subcommands:
- invarlock advanced evidence-pack
  - Inspect, build, and verify evidence packs
- invarlock advanced policy
  - Build and verify policy-pack artifacts
- invarlock advanced plugins
  - Read-only plugin discovery and explanation
- invarlock advanced calibrate
  - Tier-policy calibration and sweep tooling
Examples:
invarlock advanced evidence-pack verify <pack> --strict
invarlock advanced policy verify policy-pack.json --json
invarlock advanced plugins list --json
invarlock advanced calibrate --help
Plugins & Entry Points
invarlock advanced plugins lists built-in and optional adapters, guards,
edits, datasets, and related entry points without mutating the active Python
environment.
Available read-only flows include:
- invarlock advanced plugins list
- invarlock advanced plugins adapters
- invarlock advanced plugins guards
- invarlock advanced plugins edits
Optional backends are installed through normal Python packaging, for example:
pip install "invarlock[hf]"
pip install "invarlock[awq,gptq]"
Plugin install and uninstall commands are not part of the CLI surface.
invarlock advanced runtime-verify
Purpose: package-native runtime provenance verification for an existing evaluation report and its sibling runtime manifest.
Common options:
- --report: path to evaluation.report.json
- --manifest: path to runtime.manifest.json
- --json: emit a machine-readable runtime-verify-v1 envelope
Example:
invarlock advanced runtime-verify \
--report reports/eval/evaluation.report.json \
--manifest reports/eval/runtime.manifest.json
JSON Output
Stable machine-readable output is available on the verification and advanced plugin surfaces.
- invarlock verify --json
- invarlock advanced plugins list --json
- invarlock advanced evidence-pack verify --json
- invarlock advanced policy verify --json
These commands emit a single JSON object suitable for CI parsing.
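In CI it is often useful to keep the JSON envelope as a build artifact while still gating on the exit code. The sketch below shows one way to do that, assuming the envelope is written to standard output (this page does not state the stream). `run_json_step` is a hypothetical helper, and no envelope field names are assumed.

```shell
# run_json_step is a hypothetical CI helper. It runs a command, saves its
# standard output (assumed to carry the JSON envelope) to a file, and
# returns the command's own exit code unchanged.
# Usage:
#   run_json_step verify.json invarlock verify --json reports/eval/evaluation.report.json
#   status=$?
run_json_step() (
    out="$1"
    shift
    # The subshell exits with the status of the wrapped command.
    "$@" >"$out"
)
```

Keeping the exit code intact means the documented codes (0 to 3) remain the gating signal, while the saved file can be archived or parsed by a later step.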
Command Layout
- The public top level is evaluate, verify, report, doctor, advanced, and version.
- Evidence-pack, policy, plugin, and calibration workflows live under invarlock advanced ....
- Host execution for the core evaluation path is expressed as --execution-mode host.
- Internal delegated config execution uses a package-internal config-runner module, not a public CLI command.
- Optional runtime backends are installed with Python extras instead of CLI install and uninstall commands.