InvarLock

Catch hidden regressions before deployment

Use the open-source InvarLock engine now to catch regressions from quantization, pruning, fine-tuning, or merges before release. If your team already reviews model changes before promotion and needs a private review workflow around the same evidence model, we are working with a small set of design partners on that path.

Wheel-first install

Start with the published wheel for the verify and report flows. Add the invarlock[hf] extra when the evaluate step needs to load Hugging Face models.

$ pip install invarlock
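
If the evaluate step will pull models from the Hugging Face Hub, install the optional extra as well; the quotes keep the shell from expanding the brackets.

$ pip install "invarlock[hf]"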

Real report structure, verification fields, and reviewer-facing artifacts.

Inspect evidence
Sample evaluation.report.json
{
  "schema_version": "v1",
  "summary": {
    "status": "PASS",
    "gates_passed": 12,
    "gates_total": 12
  },
  "primary_metric": {
    "name": "perplexity",
    "ratio_vs_baseline": 0.982,
    "ci95": [
      0.97,
      0.99
    ]
  },
  "validation": {
    "primary_metric_acceptable": true,
    "preview_final_drift_acceptable": true
  },
  "artifacts": {
    "evaluation_report_json": "reports/eval/evaluation.report.json",
    "evaluation_html": "reports/eval/evaluation.html",
    "verify_json": "reports/eval/verify.json"
  }
}
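
For a quick command-line look at the same fields before running verify, a minimal jq query works against the sample above (jq is ordinary standalone tooling here, not part of InvarLock):

jq '.summary.status, .primary_metric.ratio_vs_baseline, .validation' \
  reports/eval/evaluation.report.json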

Verify

invarlock verify reports/eval/evaluation.report.json

Decision trail

Pair baseline and subject windows, run guards, then re-check the artifact.
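
End to end, that trail maps onto two commands; the model names, flags, and paths here mirror the step-by-step examples further down the page.

invarlock evaluate --allow-network --baseline gpt2 --subject /path/to/edited \
  --adapter auto --profile ci --report-out reports/eval
invarlock verify reports/eval/evaluation.report.json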

Fit today

OSS now for execution, scoped design-partner pilots for the broader on-prem review workflow.

OSS engine

Available now

Design-partner pilots

Scoped conversations open

Commercial packaging

Still being defined

Verify the artifact, not just the claim

Verification re-checks schema integrity, paired-window math, measurement contracts, and runtime-manifest attestation before you promote an edited checkpoint.

Verification command

invarlock verify reports/eval/evaluation.report.json

Checks:
  - schema compliance
  - paired-window math
  - measurement contracts
  - runtime-manifest attestation
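
Because the exit status is meant for fail-closed gating, a promotion script can stop as soon as verification fails. A minimal sketch, assuming a non-zero exit code on failure:

if ! invarlock verify reports/eval/evaluation.report.json; then
  echo "verification failed: do not promote this checkpoint" >&2
  exit 1
fi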

Targeted regression detection

Detects targeted regressions under pinned evaluation conditions.

Reproducible evidence

Produces reproducible, machine-readable evidence for review and CI gating.

Explicit boundary

Does not prove global model correctness or the absence of all failures.

Run, verify, and ship with hard evidence

A deterministic path from checkpoint edit to evaluation report, verification, and release review.

Step 1

Provide your checkpoints

Point InvarLock at your baseline (original) model and subject (edited) checkpoint. It accepts Hugging Face model IDs, local checkpoint files, and other sources.

INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject /path/to/edited \
  --adapter auto \
  --profile ci \
  --report-out reports/eval
Step 2

Run the evaluation pipeline

InvarLock runs paired evaluation windows through the canonical guard chain and writes an attested evaluation bundle when the checks clear.

Status: PASS
Gates: 12/12 passed
Primary metric ratio: 0.98
Output: reports/eval/evaluation.report.json
Attestation: reports/eval/runtime.manifest.json
Step 3

Gate review with evidence

Use verification exit codes for CI gating, then render the same evaluation report into a reviewer-facing HTML artifact when you need a handoff bundle.

invarlock verify --json reports/eval/evaluation.report.json
invarlock report html -i reports/eval/evaluation.report.json \
  -o reports/eval/evaluation.html
invarlock demo

Paired windows

Deterministic baseline-vs-subject comparisons stay reproducible across runs.

Guard pipeline

Invariants, spectral, RMT, and variance checks stay in the decision path.

Verification output

Release review gets a report plus an explicit verify step instead of screenshots.

Three moments where verifiable evidence matters most

Use the same evaluation report and verification flow when checkpoint edits are fresh, release review is strict, or artifacts must travel beyond the team that created them.

After quantization, pruning, or LoRA

Catch regressions while the change is still reversible

Run paired evaluation immediately after weight edits so silent regressions show up before the checkpoint reaches a release branch or downstream benchmark.

  • Deterministic baseline vs subject windows
  • Reviewable evaluation report for every edit pass
  • Explicit regression boundaries instead of intuition
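
As a sketch, a post-quantization run only changes the subject path from the Step 1 command above; /path/to/quantized is a placeholder for the freshly edited checkpoint.

INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject /path/to/quantized \
  --adapter auto \
  --profile ci \
  --report-out reports/eval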

Before release sign-off

Give reviewers an inspectable artifact trail

Use verification and evaluation reports in CI so release review is based on pinned evidence, not screenshots, dashboards, or informal notebook output.

  • Verification step for schema, pairing, and gates
  • CI-friendly exit codes for fail-closed promotion
  • Shareable evidence for approvers and auditors
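
A sketch of a fail-closed CI step built from the Step 3 commands; the shell options are standard, and a non-zero exit code on verification failure is assumed.

set -euo pipefail
invarlock verify --json reports/eval/evaluation.report.json
invarlock report html -i reports/eval/evaluation.report.json \
  -o reports/eval/evaluation.html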

When evidence must travel

Bundle reproducibility into evidence packs

Package evaluation outputs, verification artifacts, and manifests into evidence packs when evidence has to move across teams, environments, or hardware topologies.

  • Portable artifact trail for external review
  • Proof-pack workflow for reproducible handoff
  • Clear separation between guarantees and non-guarantees
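
As a sketch, a hand-rolled evidence pack needs only standard archiving tools; the file list follows the artifact paths in the sample report and Step 2 output above, and the archive name is a placeholder.

tar czf evidence-pack.tar.gz \
  reports/eval/evaluation.report.json \
  reports/eval/evaluation.html \
  reports/eval/verify.json \
  reports/eval/runtime.manifest.json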

Use the OSS engine now or discuss a scoped design-partner pilot

The OSS engine, docs, GitHub repo, and evidence examples are available now. If you already run a real model-change review process and want to shape the private on-prem workflow around it, start with a direct design-partner conversation.