Announcement · Getting Started

Welcome to InvarLock

Ink/charcoal doodle: a baseline comparison flows into a proof-backed report.

A quick introduction to InvarLock: evaluate LLM weight edits with statistical guarantees and auditable proof packs.

2 min read
InvarLock Team

Post: What InvarLock is, what it checks, and how to try it.

Highlights

  • Evaluate edited weights against a baseline with paired metrics and confidence intervals.
  • GuardChain checks for “unsafe to compare” measurement mismatches and quality drift.
  • Proof packs capture the artifacts you need to verify and share results.
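InvarLock's actual statistics are not shown in this post; to make "paired metrics with confidence intervals" concrete, here is a minimal, stdlib-only sketch of the general idea: score baseline and subject on the same examples, take per-example differences, and put a percentile-bootstrap CI on the mean difference. The scores below are made up for illustration.

```python
import random

def paired_bootstrap_ci(baseline, subject, n_boot=2000, alpha=0.05, seed=0):
    """Mean per-example difference (subject - baseline) with a percentile CI."""
    rng = random.Random(seed)
    diffs = [s - b for b, s in zip(baseline, subject)]
    boot_means = []
    for _ in range(n_boot):
        # Resample example-level differences with replacement.
        sample = [rng.choice(diffs) for _ in diffs]
        boot_means.append(sum(sample) / len(sample))
    boot_means.sort()
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(diffs) / len(diffs), (lo, hi)

# Hypothetical per-example accuracy scores for a baseline and an edited model.
baseline = [0.81, 0.79, 0.84, 0.80, 0.78, 0.83, 0.82, 0.80]
subject  = [0.80, 0.78, 0.84, 0.79, 0.77, 0.83, 0.81, 0.79]
mean_diff, (lo, hi) = paired_bootstrap_ci(baseline, subject)
print(f"mean diff {mean_diff:+.4f}, 95% CI [{lo:+.4f}, {hi:+.4f}]")
```

Pairing on the same examples removes between-example variance, so small regressions that a single aggregate number would hide can show up as a CI that excludes zero.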

If you edit model weights (quantization, pruning, fine-tuning, merges), you eventually hit the same question: did this change silently break anything that matters? “It loads” isn’t enough, and single-number metrics often miss the failure modes you’ll regret later.

InvarLock is designed for that moment. It produces an evaluation report that is both human-readable and machine-verifiable, so you can make upgrade decisions with evidence—not vibes.

Quickstart

Install InvarLock via pip:

pip install "invarlock[hf]"

Run your first evaluation:

INVARLOCK_ALLOW_NETWORK=1 invarlock evaluate \
  --baseline gpt2 \
  --subject gpt2 \
  --adapter auto \
  --profile dev

That produces an evaluation report and (optionally) a proof pack you can archive, verify, and share.
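The post doesn't specify the proof-pack format, but the "archive, verify, share" workflow generally rests on hashing artifacts into a manifest and re-hashing at verification time. Here is a self-contained sketch of that pattern; the file names and manifest layout are hypothetical, not InvarLock's real schema.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Build a toy "proof pack": a report plus a manifest of artifact hashes.
# (Hypothetical layout -- InvarLock's actual pack format may differ.)
pack = Path(tempfile.mkdtemp())
(pack / "report.json").write_text(json.dumps({"verdict": "pass"}))
manifest = {
    name: hashlib.sha256((pack / name).read_bytes()).hexdigest()
    for name in ["report.json"]
}
(pack / "manifest.json").write_text(json.dumps(manifest))

def verify(pack_dir: Path) -> bool:
    """Re-hash every listed artifact and compare against the manifest."""
    recorded = json.loads((pack_dir / "manifest.json").read_text())
    return all(
        hashlib.sha256((pack_dir / name).read_bytes()).hexdigest() == digest
        for name, digest in recorded.items()
    )

print(verify(pack))  # True: the report matches the recorded hash
```

Any edit to an archived artifact changes its digest, so `verify` returns False and the pack no longer vouches for the result.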

What’s next

  • Evaluation-as-a-Service (hosted runs and reviewable outputs)
  • Broader adapter and framework coverage
  • Better “what changed?” analytics over time

To go deeper, start with the docs. For questions and feedback, email [email protected].

Want updates as we ship? Join the waitlist.
