Glossary
TL;DR: This glossary defines key terms used across InvarLock documentation, reports, and code. Terms are grouped by domain (metrics, guards, data, provenance) for quick reference. Each entry includes a definition, context, and cross-references to relevant assurance documents.
Plain language: When reading a report or debugging a pipeline, use this glossary to understand what each field means and where the term originated.
Quick Reference Tables
Primary Metric Terms
| Term | Short Definition | report Field |
|---|---|---|
| Primary Metric | Canonical task metric for gating (ppl or accuracy) | primary_metric.* |
| BCa Bootstrap | Bias-corrected accelerated bootstrap for CIs | primary_metric.ci, primary_metric.reps |
| Ratio vs Baseline | Edited ÷ baseline metric (ppl: lower=better, acc: higher=better) | primary_metric.ratio_vs_baseline |
| Primary Metric Tail | Tail regression gate (ΔlogNLL at q95) | primary_metric_tail.* |
Guard Terms
| Term | Short Definition | report Field |
|---|---|---|
| Canonical Guard Chain | invariants (pre) → spectral → RMT → variance → invariants (post) | validation.{invariants_pass,spectral_stable,rmt_stable}, variance.{enabled,predictive_gate.passed} |
| κ (kappa) Threshold | Per-family spectral cap for z-score outliers | spectral.family_caps.*.kappa |
| ε (epsilon) Band | RMT acceptance threshold for edge-risk | rmt.epsilon_by_family.* |
| Guard Overhead | Performance cost of guards vs bare run | guard_overhead.* |
| Measurement Contract | Estimator + sampling policy recorded in reports | spectral.measurement_contract_hash |
Data Terms
| Term | Short Definition | report Field |
|---|---|---|
| Window Pairing | Aligning baseline and subject eval windows | dataset.windows.stats.paired_windows |
| Provider Digest | Hash of dataset identity (ids/tokenizer/masking) | provenance.provider_digest |
| Tokenizer Hash | Stable hash of tokenizer settings | meta.tokenizer_hash |
Policy Terms
| Term | Short Definition | report Field |
|---|---|---|
| Tier Policy | Guard threshold preset (conservative/balanced/aggressive) | auto.tier |
| Policy Digest | Stable hash of resolved policy thresholds | policy_digest.thresholds_hash |
Detailed Definitions
A–B
Baseline
The unedited reference model run used for comparison and gating.
| Aspect | Details |
|---|---|
| Context | baseline report in Compare & evaluate workflow |
| Related terms | Subject Run, Window Pairing, report |
| report fields | provenance.baseline.*, baseline_ref.* |
| See also | Compare & evaluate |
Example: invarlock evaluate --baseline gpt2 --subject gpt2-quant
BCa Bootstrap
Bias-corrected and accelerated bootstrap method for estimating confidence intervals.
| Aspect | Details |
|---|---|
| Context | Applied to paired log-loss deltas for primary metric gating |
| Related terms | Primary Metric, Window Pairing, Confidence Interval |
| report fields | primary_metric.ci, primary_metric.reps, dataset.windows.stats.bootstrap |
| See also | BCa Bootstrap Derivation |
Example: BCa bootstrap with 2000 replicates computes a CI on paired ΔlogNLL,
which is then exponentiated to a ratio CI such as ci: [0.995, 1.008].
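The BCa procedure on paired deltas can be sketched in plain Python. This is a minimal illustration under stated assumptions, not InvarLock's implementation: the function name `bca_ratio_ci`, its parameters, and the use of the mean as the statistic are all hypothetical choices made for the example.

```python
import math
import random
from statistics import NormalDist, mean


def bca_ratio_ci(deltas, reps=2000, alpha=0.05, seed=0):
    """BCa CI for the mean paired delta-logNLL, returned on the ratio scale.

    Hypothetical helper for illustration; not part of any InvarLock API.
    """
    rng = random.Random(seed)
    nd = NormalDist()
    n = len(deltas)
    theta_hat = mean(deltas)

    # Bootstrap distribution of the mean (resample windows with replacement).
    boots = sorted(mean(rng.choices(deltas, k=n)) for _ in range(reps))

    # Bias correction z0: share of replicates below the point estimate,
    # clamped away from 0 and 1 so the inverse CDF is defined.
    below = sum(b < theta_hat for b in boots)
    p = min(max(below / reps, 1 / (reps + 1)), reps / (reps + 1))
    z0 = nd.inv_cdf(p)

    # Acceleration from jackknife (leave-one-out) means.
    total = sum(deltas)
    jack = [(total - d) / (n - 1) for d in deltas]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def adjusted_quantile(q):
        # Map the nominal quantile q to its bias- and skew-adjusted level.
        z = nd.inv_cdf(q)
        adj = nd.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))
        idx = min(max(int(adj * reps), 0), reps - 1)
        return boots[idx]

    lo = adjusted_quantile(alpha / 2)
    hi = adjusted_quantile(1 - alpha / 2)
    # Exponentiate the log-scale interval to obtain a ratio CI.
    return math.exp(lo), math.exp(hi)
```

The interval is computed on the log scale and exponentiated only at the end, so a ratio CI that straddles 1.0 corresponds to a ΔlogNLL CI that straddles 0.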
C–D
report
Structured evidence artifact summarizing an evaluation run and its validation status.
| Aspect | Details |
|---|---|
| Context | Generated by invarlock evaluate or invarlock report --format report |
| Related terms | Report, Evidence Bundle, Manifest |
| report fields | schema_version, run_id, validation.*, artifacts.* |
| See also | reports Reference |
Example: evaluation.report.json with schema_version: v1 and validation.overall_pass: true
Compare & evaluate (BYOE)
Workflow that compares a subject model to a baseline, optionally with an external edit (Bring Your Own Edit).
| Aspect | Details |
|---|---|
| Context | invarlock evaluate --baseline ... --subject ... |
| Related terms | Baseline, Subject Run, report |
| report fields | provenance.baseline.*, provenance.edited.* |
| See also | Compare & evaluate Guide |
Example: BYOE workflow evaluates an externally quantized checkpoint against its unmodified baseline.
E–G
Evidence Bundle
Set of files produced for audit: reports, manifests, and events.
| Aspect | Details |
|---|---|
| Context | Output directory from invarlock evaluate or invarlock report --format report |
| Related terms | Report, report, Manifest |
| Typical contents | evaluation.report.json, evaluation_report.md, manifest.json |
| See also | Artifact Layout |
Canonical Guard Chain
The default guard chain is invariants (pre) → spectral → RMT → variance
→ invariants (post).
| Aspect | Details |
|---|---|
| Context | Core safety checks in run and evaluate flows |
| Canonical order | invariants (pre), spectral, rmt, variance, invariants (post) |
| Related terms | Guard Chain, Guard Overhead |
| See also | Guards Reference |
Enforcement: Guards execute in canonical order for reproducibility; results are
recorded in validation.invariants_pass, validation.spectral_stable, and
validation.rmt_stable.
Guard Chain (Canonical Order)
Fixed execution order for guard preparation and evaluation ensuring deterministic, auditable outcomes.
| Aspect | Details |
|---|---|
| Context | Defined by guards.order in config YAML |
| Related terms | Canonical Guard Chain, Guard Overhead |
| report fields | Not stored directly (order is config-driven). |
| See also | Guards Reference |
Guard Overhead
Performance impact of guard checks vs bare control run (no guards).
| Aspect | Details |
|---|---|
| Context | Measured in Release profile; gate requires ≤ +1.0% PM overhead |
| Related terms | Canonical Guard Chain, Timing Summary |
| report fields | guard_overhead.{bare_ppl,guarded_ppl,overhead_ratio,overhead_percent} |
| See also | Guard Overhead Method |
Example: overhead_percent: +0.12% indicates that guards add 0.12% to the primary metric relative to the bare run.
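The relationship between the guard_overhead fields can be sketched as follows. The formulas are an assumption inferred from the field names (ratio as guarded ÷ bare, percent as the ratio's excess over 1), and the `passed` key and `max_overhead_percent` parameter are hypothetical.

```python
def guard_overhead(bare_ppl, guarded_ppl, max_overhead_percent=1.0):
    """Compute overhead fields from a bare and a guarded primary-metric value.

    Assumed formulas; the 1.0% default mirrors the Release gate quoted above.
    """
    overhead_ratio = guarded_ppl / bare_ppl
    overhead_percent = (overhead_ratio - 1.0) * 100.0
    return {
        "bare_ppl": bare_ppl,
        "guarded_ppl": guarded_ppl,
        "overhead_ratio": overhead_ratio,
        "overhead_percent": overhead_percent,
        # Hypothetical convenience flag, not a documented report field.
        "passed": overhead_percent <= max_overhead_percent,
    }
```

For instance, a bare perplexity of 25.00 and a guarded perplexity of 25.03 give an overhead of about 0.12%, which is within the gate.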
K–M
κ (kappa) Threshold
Per-family spectral cap used to flag abnormally high z-scores.
| Aspect | Details |
|---|---|
| Context | spectral.family_caps.*.kappa in tier policy |
| Typical values | ffn: 3.85, attn: 3.02, embed: 1.05 (Balanced tier) |
| Related terms | Spectral Cap, z-score, Spectral Guard |
| See also | Spectral FPR Derivation |
Example: kappa=2.8 for the attention family means z-scores above 2.8 are flagged (illustrative value; the Balanced tier lists 3.02 for attn).
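A per-family cap check might look like the sketch below. The κ values are the Balanced-tier figures listed in this entry; the function name, the input shape, and the one-sided comparison (z > κ rather than |z| > κ) are assumptions for illustration.

```python
# Balanced-tier caps as listed in the table above.
BALANCED_KAPPA = {"ffn": 3.85, "attn": 3.02, "embed": 1.05}


def flag_spectral_outliers(z_scores, kappa_by_family=BALANCED_KAPPA):
    """Return module names whose z-score exceeds their family's kappa cap.

    z_scores: iterable of (module_name, family, z) tuples (hypothetical shape).
    """
    flagged = []
    for module, family, z in z_scores:
        if z > kappa_by_family[family]:
            flagged.append(module)
    return flagged
```

Because caps are per family, the same z-score can be acceptable for an ffn matrix and flagged for an embed matrix.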
Measurement Contract
Guard measurement procedure signature and digest recorded in reports.
| Aspect | Details |
|---|---|
| Context | Spectral and RMT guards record estimator + sampling policy |
| Verified by | invarlock verify --profile ci|release |
| report fields | spectral.measurement_contract_hash, rmt.measurement_contract_hash |
| See also | Guard Contracts |
Enforcement: CI/Release profiles require measurement contract match between baseline and subject.
P–R
Policy Digest
Stable hash summarizing resolved policy thresholds for auditability.
| Aspect | Details |
|---|---|
| Context | Stored in report for policy change detection |
| Related terms | Tier Policy, Policy Overrides, Policy Provenance |
| report fields | policy_digest.thresholds_hash, policy_provenance.* |
| See also | Policy Provenance |
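One common way to obtain a stable hash of a thresholds mapping is to hash a canonical JSON serialization. The sketch below assumes SHA-256 and sorted-key JSON; the actual canonicalization used for policy_digest.thresholds_hash is not specified here, and the function name is hypothetical.

```python
import hashlib
import json


def thresholds_hash(resolved_thresholds):
    """Hash a thresholds dict so that key order does not affect the digest."""
    canonical = json.dumps(
        resolved_thresholds, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two reports with equal digests were produced under the same resolved thresholds; any change to a threshold value changes the digest.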
Primary Metric
The canonical task metric used for gating (perplexity for LMs, accuracy for classification).
| Aspect | Details |
|---|---|
| Supported kinds | ppl_causal, ppl_mlm, accuracy, vqa_accuracy |
| Gating logic | Ratio vs baseline must stay within tier thresholds |
| Related terms | Primary Metric Tail, BCa Bootstrap, Window Pairing |
| report fields | primary_metric.{kind,preview,final,ratio_vs_baseline,ci} |
| See also | reports Reference |
Example: primary_metric.kind: ppl_causal with ratio_vs_baseline: 1.003
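For perplexity kinds, the ratio can be illustrated from per-window mean negative log-likelihoods. This sketch assumes equally weighted windows and hypothetical function names; under that assumption the ratio equals the exponential of the mean paired ΔlogNLL.

```python
import math
from statistics import mean


def perplexity(window_nlls):
    """Perplexity from per-window mean NLL values (equal weights assumed)."""
    return math.exp(mean(window_nlls))


def ratio_vs_baseline(baseline_nlls, edited_nlls):
    """Edited perplexity divided by baseline perplexity."""
    return perplexity(edited_nlls) / perplexity(baseline_nlls)
```

A ratio slightly above 1.0, as in the example above, means the edited model's perplexity is marginally higher than the baseline's.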
Primary Metric Tail
Optional tail regression gate checking high-loss windows (e.g., q95 ΔlogNLL).
| Aspect | Details |
|---|---|
| Context | Catches regression in hard examples even when mean is acceptable |
| Mode | warn (default) or fail |
| Related terms | Primary Metric, BCa Bootstrap |
| report fields | primary_metric_tail.{evaluated,passed,warned,stats} |
| See also | reports Reference |
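A tail gate of this kind can be sketched as a quantile check on per-window deltas. Everything beyond the q95 statistic and the warn/fail modes is an assumption: the threshold parameter `max_q95_delta`, the nearest-rank quantile, and the way `passed` and `warned` are set per mode are hypothetical.

```python
import math


def tail_gate(window_deltas, max_q95_delta, mode="warn"):
    """Check the 95th percentile of per-window delta-logNLL against a cap."""
    ordered = sorted(window_deltas)
    # Nearest-rank quantile: smallest value with at least 95% of data at or below it.
    idx = max(math.ceil(0.95 * len(ordered)) - 1, 0)
    q95 = ordered[idx]
    exceeded = q95 > max_q95_delta
    return {
        "evaluated": True,
        # Assumed semantics: warn mode never fails the gate, fail mode does.
        "passed": not exceeded if mode == "fail" else True,
        "warned": exceeded if mode == "warn" else False,
        "stats": {"q95": q95},
    }
```

The mean of the deltas can be small while q95 is large, which is the case this gate is meant to surface.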
Provider Digest
Dataset identity hash covering token IDs, tokenizer config, and masking strategy.
| Aspect | Details |
|---|---|
| Context | Ensures baseline and subject use identical data |
| Related terms | Window Pairing, Tokenizer Hash |
| report fields | provenance.provider_digest.ids_sha256 |
| See also | Coverage & Pairing |
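A digest over token IDs could be computed along these lines. The serialization (little-endian 32-bit IDs with a length prefix per window) is purely an assumption for illustration and may not match how ids_sha256 is actually derived.

```python
import hashlib
import struct


def ids_sha256(windows):
    """SHA-256 over a sequence of token-ID windows (hypothetical encoding)."""
    h = hashlib.sha256()
    for window in windows:
        # Length prefix keeps window boundaries part of the identity.
        h.update(struct.pack("<I", len(window)))
        h.update(struct.pack(f"<{len(window)}I", *window))
    return h.hexdigest()
```

If the baseline and subject digests differ, the two runs did not see identical token windows and paired comparison is not valid.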
Report
Run-level artifact with metrics, guard results, and metadata.
| Aspect | Details |
|---|---|
| Context | Generated by invarlock run; input to report generation |
| Related terms | report, Evidence Bundle |
| File format | report.json + events.jsonl |
| See also | Artifact Layout |
RMT ε (epsilon) Rule
Random Matrix Theory epsilon band used for activation edge-risk stability checks.
| Aspect | Details |
|---|---|
| Context | rmt.epsilon_default and rmt.epsilon_by_family.* thresholds |
| Calibration | Derived from null-sweep runs on target model families |
| Related terms | RMT Guard, κ Threshold |
| report fields | rmt.{epsilon_default,epsilon_by_family,stable,max_edge_ratio,max_edge_delta} |
| See also | RMT ε Rule |
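The interplay of epsilon_default and epsilon_by_family suggests a per-family lookup with a fallback. The sketch below assumes that a family is acceptable when its edge delta does not exceed its ε; that comparison, the function name, and the input shape are assumptions rather than the documented rule.

```python
def rmt_check(edge_delta_by_family, epsilon_by_family, epsilon_default):
    """Return (stable, violations) for per-family edge deltas.

    Families without an explicit epsilon fall back to epsilon_default.
    """
    violations = {}
    for family, delta in edge_delta_by_family.items():
        eps = epsilon_by_family.get(family, epsilon_default)
        if delta > eps:
            violations[family] = {"delta": delta, "epsilon": eps}
    return (not violations), violations
```

The numeric values passed in would come from the calibrated tier policy; none are assumed here.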
RMT Guard
Guard that checks eigenvalue statistics against Random Matrix Theory bounds.
| Aspect | Details |
|---|---|
| Focus | Activation edge-risk growth across model families |
| Validation | validation.rmt_stable |
| Related terms | Canonical Guard Chain, RMT ε Rule |
| report fields | rmt.{families,stable,max_edge_delta} |
| See also | Guards Reference |
S–T
Spectral Cap
Limit on spectral z-scores per family to flag weight instability.
| Aspect | Details |
|---|---|
| Context | Applied by spectral guard; counts violations per family |
| Related terms | κ Threshold, z-score, Spectral Guard |
| report fields | spectral.{caps_applied,caps_exceeded,top_z_scores} |
| See also | Spectral FPR |
Spectral Guard
Guard that monitors spectral norms and z-scores for weight matrices.
| Aspect | Details |
|---|---|
| Focus | Baseline-relative weight matrix stability |
| Validation | validation.spectral_stable |
| Related terms | Canonical Guard Chain, Spectral Cap, κ Threshold |
| report fields | spectral.{caps_applied,family_caps,top_z_scores,summary} |
| See also | Guards Reference |
Subject Run
The edited or target model run under evaluation (compared against baseline).
| Aspect | Details |
|---|---|
| Context | subject checkpoint in Compare & evaluate |
| Related terms | Baseline, report, Window Pairing |
| report fields | provenance.edited.* |
| See also | Compare & evaluate |
Telemetry
Performance and resource metrics emitted with reports.
| Aspect | Details |
|---|---|
| Context | Optional fields for performance analysis |
| Related terms | Timing Summary, Guard Overhead |
| report fields | telemetry.*, metrics.memory_mb_peak |
| See also | Observability |
Tier Policy
Guard threshold preset selecting the safety profile for a run.
| Aspect | Details |
|---|---|
| Options | conservative (strictest), balanced (default), aggressive (loosest) |
| Source | src/invarlock/_data/runtime/tiers.yaml |
| Related terms | Policy Digest, Policy Overrides |
| report fields | auto.tier, resolved_policy.* |
| See also | Tier Policy Catalog |
Timing Summary
Consolidated timing breakdown for an evaluation run.
| Aspect | Details |
|---|---|
| Context | CLI output via print_timing_summary() |
| Includes | Model load, dataset load, evaluation, report generation |
| Related terms | Guard Overhead, Telemetry |
| See also | Observability |
Tokenizer Hash
Stable hash of tokenizer settings and vocabulary for reproducibility.
| Aspect | Details |
|---|---|
| Context | Ensures baseline and subject use identical tokenization |
| Related terms | Provider Digest, Window Pairing |
| report fields | data.tokenizer_hash, meta.tokenizer_hash |
| See also | Determinism Contracts |
V–Z
Variance Effect (VE)
Guard that tracks variance change and applies equalization when beneficial.
| Aspect | Details |
|---|---|
| Context | A/B test compares bare vs VE-enabled evaluation |
| Enabling condition | CI excludes 0 AND mean Δ ≤ -min_effect_lognll |
| Related terms | Canonical Guard Chain, Guard Overhead, Predictive Gate |
| report fields | variance.{enabled,gain,predictive_gate.delta_ci,predictive_gate.passed} |
| See also | VE Gate Power |
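The enabling condition in the table can be written as a small predicate. The function and argument names are hypothetical; the logic follows the stated rule that the CI must exclude 0 and the mean Δ must be at most -min_effect_lognll.

```python
def ve_should_enable(delta_ci, mean_delta, min_effect_lognll):
    """True when the predictive gate condition from the table holds."""
    lo, hi = delta_ci
    ci_excludes_zero = lo > 0 or hi < 0
    return ci_excludes_zero and mean_delta <= -min_effect_lognll
```

A CI entirely above 0 excludes 0 but cannot satisfy the second clause, so only a sufficiently large improvement (negative Δ) enables VE.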
Window Pairing
Alignment of baseline and subject evaluation windows for paired statistical testing.
| Aspect | Details |
|---|---|
| Requirements | Same window IDs, zero overlap, 100% match fraction |
| Violation | E001 pairing error in CI/Release profiles |
| Related terms | BCa Bootstrap, Primary Metric, Provider Digest |
| report fields | dataset.windows.stats.{paired_windows,window_match_fraction,window_overlap_fraction} |
| See also | Coverage & Pairing |
Example: paired_windows: 200, window_match_fraction: 1.0,
window_overlap_fraction: 0.0
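A pairing check over window IDs might be sketched as follows. The definition of the match fraction used here (shared IDs divided by all IDs seen in either run) is an assumption, the overlap fraction is omitted because its definition is not given in this entry, and the function names are hypothetical.

```python
def pairing_stats(baseline_ids, subject_ids):
    """Summarize how well two runs' window IDs line up."""
    base, subj = set(baseline_ids), set(subject_ids)
    paired = base & subj
    union = base | subj
    match = len(paired) / len(union) if union else 0.0
    return {"paired_windows": len(paired), "window_match_fraction": match}


def require_full_pairing(stats):
    """Raise in the style of the E001 pairing error when windows do not match."""
    if stats["window_match_fraction"] != 1.0:
        raise ValueError(
            f"E001: window_match_fraction={stats['window_match_fraction']:.3f}"
        )
```

With identical ID sets the stats match the example above in shape: every window is paired and the match fraction is 1.0.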
z-score
Standardized deviation used in spectral guard scoring.
| Aspect | Details |
|---|---|
| Formula | z = (σ_edited - μ_baseline) / std_baseline |
| Thresholding | Compared against family-specific κ caps |
| Related terms | Spectral Cap, κ Threshold |
| report fields | spectral.top_z_scores, spectral.family_caps.*.kappa |
| See also | Spectral FPR |
Example: max |z| = 2.1 means the largest absolute z-score across all weight matrices is 2.1.
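The formula in the table translates directly into code. The function names are hypothetical, and how the baseline mean and standard deviation are estimated is outside this sketch.

```python
def spectral_z(sigma_edited, mu_baseline, std_baseline):
    """z = (sigma_edited - mu_baseline) / std_baseline, as in the table above."""
    if std_baseline <= 0:
        raise ValueError("std_baseline must be positive")
    return (sigma_edited - mu_baseline) / std_baseline


def max_abs_z(z_values):
    """Largest absolute z-score across a collection of weight matrices."""
    return max(abs(z) for z in z_values)
```

For example, an edited spectral norm of 12.1 against a baseline mean of 10.0 and standard deviation of 1.0 gives z of about 2.1.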
See Also
- Assurance Case Overview — Enumerated claims and evidence
- Guard Contracts — Guard behavior specifications
- CLI Reference — Command-line usage
- Guards Reference — Guard configuration
- reports Reference — report schema