InvarLock

Catch hidden regressions before deployment

Use the open-source InvarLock engine now to catch regressions from quantization, pruning, fine-tuning, or merges before release. If your team already reviews model changes before promotion and needs a private review workflow around the same evidence model, we are working with a small set of design partners on that path.

Wheel-first install

Start with the published wheel for the verify and report flows. Add the invarlock[hf] extra when the evaluate step needs to load Hugging Face models.

$ pip install invarlock
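
If the evaluate step will pull models from the Hugging Face Hub, install the optional extra as well; the quotes keep the shell from expanding the brackets.

$ pip install "invarlock[hf]"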

Real report structure, verification fields, and reviewer-facing artifacts.

Inspect evidence
Sample evaluation.report.json
{
  "schema_version": "v1",
  "summary": {
    "status": "PASS",
    "gates_passed": 12,
    "gates_total": 12
  },
  "primary_metric": {
    "name": "perplexity",
    "ratio_vs_baseline": 0.982,
    "ci95": [
      0.97,
      0.99
    ]
  },
  "validation": {
    "primary_metric_acceptable": true,
    "preview_final_drift_acceptable": true
  },
  "artifacts": {
    "evaluation_report_json": "reports/eval/evaluation.report.json",
    "evaluation_html": "reports/eval/evaluation.html",
    "verify_json": "reports/eval/verify.json"
  }
}
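
For a quick command-line look at the same fields before running verify, a minimal jq query works against the sample above (jq is ordinary standalone tooling here, not part of InvarLock):

jq '.summary.status, .primary_metric.ratio_vs_baseline, .validation' \
  reports/eval/evaluation.report.json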

Verify

invarlock verify reports/eval/evaluation.report.json

Decision trail

Pair baseline and subject windows, run guards, then re-check the artifact.
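
End to end, that trail maps onto two commands; the model names, flags, and paths here mirror the step-by-step examples further down the page.

invarlock evaluate --allow-network --baseline gpt2 --subject /path/to/edited \
  --adapter auto --profile ci --report-out reports/eval
invarlock verify reports/eval/evaluation.report.json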

Fit today

OSS now for execution, scoped design-partner pilots for the broader on-prem review workflow.

OSS engine

Available now

Design-partner pilots

Scoped conversations open

Commercial packaging

Still being defined

Verify the artifact, not just the claim

Verification re-checks schema integrity, paired-window math, measurement contracts, and runtime-manifest attestation before you promote an edited checkpoint.

Verification command

invarlock verify reports/eval/evaluation.report.json

Checks:
  - schema compliance
  - paired-window math
  - measurement contracts
  - runtime-manifest attestation
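
Because the exit status is meant for fail-closed gating, a promotion script can stop as soon as verification fails. A minimal sketch, assuming a non-zero exit code on failure:

if ! invarlock verify reports/eval/evaluation.report.json; then
  echo "verification failed: do not promote this checkpoint" >&2
  exit 1
fi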

Targeted regression detection

Detects targeted regressions under pinned evaluation conditions.

Reproducible evidence

Produces reproducible, machine-readable evidence for review and CI gating.

Explicit boundary

Does not prove global model correctness or the absence of all failures.

Run, verify, and ship with hard evidence

A deterministic path from checkpoint edit to evaluation report, verification, and release review.

Step 1

Provide your checkpoints

Point InvarLock at your baseline (original) model and subject (edited) checkpoint. It accepts Hugging Face model IDs, local checkpoint files, and other sources.

INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject /path/to/edited \
  --adapter auto \
  --profile ci \
  --report-out reports/eval
Step 2

Run the evaluation pipeline

InvarLock runs paired evaluation windows through the canonical guard chain and writes an attested evaluation bundle when the checks clear.

Status: PASS
Gates: 12/12 passed
Primary metric ratio: 0.98
Output: reports/eval/evaluation.report.json
Attestation: reports/eval/runtime.manifest.json
Step 3

Gate review with evidence

Use verification exit codes for CI gating, then render the same evaluation report into a reviewer-facing HTML artifact when you need a handoff bundle.

invarlock verify --json reports/eval/evaluation.report.json
invarlock report html -i reports/eval/evaluation.report.json \
  -o reports/eval/evaluation.html
invarlock demo

Paired windows

Deterministic baseline-vs-subject comparisons stay reproducible across runs.

Guard pipeline

Invariants, spectral, RMT, and variance checks stay in the decision path.

Verification output

Release review gets a report plus an explicit verify step instead of screenshots.

Three moments where verifiable evidence matters most

Use the same evaluation report and verification flow when checkpoint edits are fresh, release review is strict, or artifacts must travel beyond the team that created them.

After quantization, pruning, or LoRA

Catch regressions while the change is still reversible

Run paired evaluation immediately after weight edits so silent regressions show up before the checkpoint reaches a release branch or downstream benchmark.

  • Deterministic baseline vs subject windows
  • Reviewable evaluation report for every edit pass
  • Explicit regression boundaries instead of intuition
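
As a sketch, a post-quantization run only changes the subject path from the Step 1 command above; /path/to/quantized is a placeholder for the freshly edited checkpoint.

INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject /path/to/quantized \
  --adapter auto \
  --profile ci \
  --report-out reports/eval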

Before release sign-off

Give reviewers an inspectable artifact trail

Use verification and evaluation reports in CI so release review is based on pinned evidence, not screenshots, dashboards, or informal notebook output.

  • Verification step for schema, pairing, and gates
  • CI-friendly exit codes for fail-closed promotion
  • Shareable evidence for approvers and auditors
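
A sketch of a fail-closed CI step built from the Step 3 commands; the shell options are standard, and a non-zero exit code on verification failure is assumed.

set -euo pipefail
invarlock verify --json reports/eval/evaluation.report.json
invarlock report html -i reports/eval/evaluation.report.json \
  -o reports/eval/evaluation.html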

When evidence must travel

Bundle reproducibility into evidence packs

Package evaluation outputs, verification artifacts, and manifests into evidence packs when evidence has to move across teams, environments, or hardware topologies.

  • Portable artifact trail for external review
  • Proof-pack workflow for reproducible handoff
  • Clear separation between guarantees and non-guarantees
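
As a sketch, a hand-rolled evidence pack needs only standard archiving tools; the file list follows the artifact paths in the sample report and Step 2 output above, and the archive name is a placeholder.

tar czf evidence-pack.tar.gz \
  reports/eval/evaluation.report.json \
  reports/eval/evaluation.html \
  reports/eval/verify.json \
  reports/eval/runtime.manifest.json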

Use the OSS engine now or discuss a scoped design-partner pilot

The OSS engine, docs, GitHub repo, and evidence examples are available now. If you already run a real model-change review process and want to shape the private on-prem workflow around it, start with a direct design-partner conversation.