InvarLock
Catch hidden regressions before deployment
Use the open-source InvarLock engine now to catch regressions from quantization, pruning, fine-tuning, or merges before release. If your team already reviews model changes before promotion and needs a private review workflow around the same evidence model, we are working with a small set of design partners on that path.
Wheel-first install
Start with the published wheel for verify and report flows. Add invarlock[hf] when evaluate needs to load Hugging Face models.
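The Hugging Face extra is an ordinary pip extra layered on top of the base wheel shown below; the quotes keep the shell from expanding the brackets:
# Only needed when evaluate loads Hugging Face models
pip install "invarlock[hf]"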
$ pip install invarlock
Evidence first
Real report structure, verification fields, and reviewer-facing artifacts.
{
  "schema_version": "v1",
  "summary": {
    "status": "PASS",
    "gates_passed": 12,
    "gates_total": 12
  },
  "primary_metric": {
    "name": "perplexity",
    "ratio_vs_baseline": 0.982,
    "ci95": [
      0.97,
      0.99
    ]
  },
  "validation": {
    "primary_metric_acceptable": true,
    "preview_final_drift_acceptable": true
  },
  "artifacts": {
    "evaluation_report_json": "reports/eval/evaluation.report.json",
    "evaluation_html": "reports/eval/evaluation.html",
    "verify_json": "reports/eval/verify.json"
  }
}
Verify
invarlock verify reports/eval/evaluation.report.json
Decision trail
Pair baseline and subject windows, run guards, then re-check the artifact.
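Concretely, that trail is the evaluate and verify commands documented in the workflow section below, run back to back; a condensed sketch:
# 1. Pair baseline and subject windows and run the guard chain
INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
  --baseline gpt2 --subject /path/to/edited \
  --adapter auto --profile ci --report-out reports/eval
# 2. Re-check the written artifact before acting on it
invarlock verify reports/eval/evaluation.report.json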
Fit today
The OSS engine is available now for execution; scoped design-partner pilots address the broader on-prem review workflow.
OSS engine
Available now
Design-partner pilots
Scoped conversations open
Commercial packaging
Still being defined
Proof
Verify the artifact, not just the claim
Verification re-checks schema integrity, paired-window math, measurement contracts, and runtime-manifest attestation before you promote an edited checkpoint.
Verification command
invarlock verify reports/eval/evaluation.report.json
Checks:
- schema compliance
- paired-window math
- measurement contracts
- runtime-manifest attestation
Targeted regression detection
Detects targeted regressions under pinned evaluation conditions.
Reproducible evidence
Produces reproducible, machine-readable evidence for review and CI gating.
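Because the report is plain JSON, the fields shown in the sample above can be read by any JSON tool; a small sketch with jq, using the paths from that sample:
# Print the overall status and the primary-metric ratio
jq -r '.summary.status, .primary_metric.ratio_vs_baseline' reports/eval/evaluation.report.json
# Fail a shell step unless the recorded status is PASS (jq -e sets the exit code)
jq -e '.summary.status == "PASS"' reports/eval/evaluation.report.json > /dev/null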
Explicit boundary
Does not prove global model correctness or the absence of all failures.
Workflow
Run, verify, and ship with hard evidence
A deterministic path from checkpoint edit to evaluation report, verification, and release review.
Provide your checkpoints
Point InvarLock at your baseline (original) model and subject (edited) checkpoint. Supports Hugging Face models, local files, and more.
INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
--baseline gpt2 \
--subject /path/to/edited \
--adapter auto \
--profile ci \
--report-out reports/eval
Run the evaluation pipeline
InvarLock runs paired evaluation windows through the canonical guard chain and writes an attested evaluation bundle when the checks clear.
Status: PASS
Gates: 12/12 passed
Primary metric ratio: 0.98
Output: reports/eval/evaluation.report.json
Attestation: reports/eval/runtime.manifest.json
Gate review with evidence
Use verification exit codes for CI gating, then render the same evaluation report into a reviewer-facing HTML artifact when you need a handoff bundle.
invarlock verify --json reports/eval/evaluation.report.json
invarlock report html -i reports/eval/evaluation.report.json \
-o reports/eval/evaluation.html
Why this is usable in review
Paired windows
Deterministic baseline-vs-subject comparisons stay reproducible across runs.
Guard pipeline
Invariants, spectral, RMT, and variance checks stay in the decision path.
Verification output
Release review gets a report plus an explicit verify step instead of screenshots.
Use cases
Three moments where verifiable evidence matters most
Use the same evaluation report and verification flow when checkpoint edits are fresh, release review is strict, or artifacts must travel beyond the team that created them.
After quantization, pruning, or LoRA
Catch regressions while the change is still reversible
Run paired evaluation immediately after weight edits so silent regressions show up before the checkpoint reaches a release branch or downstream benchmark.
- Deterministic baseline vs subject windows
- Reviewable evaluation report for every edit pass
- Explicit regression boundaries instead of intuition
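For a post-quantization pass, for example, the baseline is the original checkpoint and the subject is the freshly quantized one. A sketch with placeholder paths, reusing the flags from the workflow section and assuming the report lands under the directory passed to --report-out and that --allow-network is only needed when a model has to be downloaded:
invarlock evaluate \
  --baseline /ckpts/model-fp16 \
  --subject /ckpts/model-int8 \
  --adapter auto --profile ci \
  --report-out reports/quant
invarlock verify reports/quant/evaluation.report.json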
Before release sign-off
Give reviewers an inspectable artifact trail
Use verification and evaluation reports in CI so release review is based on pinned evidence, not screenshots, dashboards, or informal notebook output.
- Verification step for schema, pairing, and gates
- CI-friendly exit codes for fail-closed promotion (sketched after this list)
- Shareable evidence for approvers and auditors
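A minimal fail-closed promotion step, assuming invarlock verify exits non-zero whenever a check fails (the behavior the exit-code bullet above refers to):
# Block promotion unless verification succeeds
if ! invarlock verify reports/eval/evaluation.report.json; then
  echo "verification failed; blocking promotion" >&2
  exit 1
fi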
When evidence must travel
Bundle reproducibility into evidence packs
Package evaluation outputs, verification artifacts, and manifests into evidence packs when evidence has to move across teams, environments, or hardware topologies.
- Portable artifact trail for external review
- Proof-pack workflow for reproducible handoff
- Clear separation between guarantees and non-guarantees
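The proof-pack tooling itself is not shown on this page; purely as an illustration of what travels, the artifacts named in the sample report can be bundled with ordinary tools:
# Illustrative packaging only (not an InvarLock command): archive the report,
# verification output, HTML render, and runtime manifest for handoff
tar czf evidence-pack.tar.gz \
  reports/eval/evaluation.report.json \
  reports/eval/verify.json \
  reports/eval/evaluation.html \
  reports/eval/runtime.manifest.json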
Next step
Use the OSS engine now or discuss a scoped design-partner pilot
The OSS engine, docs, GitHub repo, and evidence examples are available now. If you already run a real model-change review process and want to shape the private on-prem workflow around it, start with a direct design-partner conversation.
Prefer the OSS path? Read OSS docs · See real artifacts · Review pilot details