Glossary

TL;DR: This glossary defines key terms used across InvarLock documentation, reports, and code. Terms are grouped by domain (metrics, guards, data, provenance) for quick reference. Each entry includes a definition, context, and cross-references to relevant assurance documents.

Plain language: When reading a report or debugging a pipeline, use this glossary to understand what each field means and where the term originated.


Quick Reference Tables

Primary Metric Terms

| Term | Short Definition | Report Field |
| --- | --- | --- |
| Primary Metric | Canonical task metric for gating (ppl or accuracy) | primary_metric.* |
| BCa Bootstrap | Bias-corrected and accelerated bootstrap for CIs | primary_metric.ci, primary_metric.reps |
| Ratio vs Baseline | Edited ÷ baseline metric (ppl: lower = better, accuracy: higher = better) | primary_metric.ratio_vs_baseline |
| Primary Metric Tail | Tail regression gate (ΔlogNLL at q95) | primary_metric_tail.* |

Guard Terms

| Term | Short Definition | Report Field |
| --- | --- | --- |
| Canonical Guard Chain | invariants (pre) → spectral → RMT → variance → invariants (post) | validation.{invariants_pass,spectral_stable,rmt_stable}, variance.{enabled,predictive_gate.passed} |
| κ (kappa) Threshold | Per-family spectral cap for z-score outliers | spectral.family_caps.*.kappa |
| ε (epsilon) Band | RMT acceptance threshold for edge-risk | rmt.epsilon_by_family.* |
| Guard Overhead | Performance cost of guards vs bare run | guard_overhead.* |
| Measurement Contract | Estimator + sampling policy recorded in reports | spectral.measurement_contract_hash |

Data Terms

| Term | Short Definition | Report Field |
| --- | --- | --- |
| Window Pairing | Aligning baseline and subject eval windows | dataset.windows.stats.paired_windows |
| Provider Digest | Hash of dataset identity (ids/tokenizer/masking) | provenance.provider_digest |
| Tokenizer Hash | Stable hash of tokenizer settings | meta.tokenizer_hash |

Policy Terms

| Term | Short Definition | Report Field |
| --- | --- | --- |
| Tier Policy | Guard threshold preset (conservative/balanced/aggressive) | auto.tier |
| Policy Digest | Stable hash of resolved policy thresholds | policy_digest.thresholds_hash |

Detailed Definitions

A–B

Baseline

The unedited reference model run used for comparison and gating.

| Aspect | Details |
| --- | --- |
| Context | baseline report in Compare & evaluate workflow |
| Related terms | Subject Run, Window Pairing, report |
| Report fields | provenance.baseline.*, baseline_ref.* |
| See also | Compare & evaluate |

Example: invarlock evaluate --baseline gpt2 --subject gpt2-quant


BCa Bootstrap

Bias-corrected and accelerated bootstrap method for estimating confidence intervals.

| Aspect | Details |
| --- | --- |
| Context | Applied to paired log-loss deltas for primary metric gating |
| Related terms | Primary Metric, Window Pairing, Confidence Interval |
| Report fields | primary_metric.ci, primary_metric.reps, dataset.windows.stats.bootstrap |
| See also | BCa Bootstrap Derivation |

Example: a BCa bootstrap with 2000 replicates on paired ΔlogNLL yields a CI in log space, which is then exponentiated to a ratio CI such as ci: [0.995, 1.008].
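The procedure described in this entry can be sketched in plain Python. This is an illustrative sketch only, not InvarLock's implementation: the function name bca_ci, the synthetic deltas, and the choice of the mean as the statistic are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, fmean


def bca_ci(deltas, reps=2000, alpha=0.05, seed=0):
    """BCa confidence interval for the mean of paired per-window deltas."""
    rng = random.Random(seed)
    n = len(deltas)
    theta_hat = fmean(deltas)
    nd = NormalDist()

    # Bootstrap distribution: resample windows with replacement, recompute the mean.
    boots = sorted(fmean(rng.choices(deltas, k=n)) for _ in range(reps))

    # Bias correction z0 from the share of replicates below the point estimate
    # (clamped so the inverse normal CDF stays finite).
    share = sum(b < theta_hat for b in boots) / reps
    share = min(max(share, 1.0 / (reps + 1)), reps / (reps + 1.0))
    z0 = nd.inv_cdf(share)

    # Acceleration from jackknife (leave-one-out) means.
    total = sum(deltas)
    jack = [(total - d) / (n - 1) for d in deltas]
    jbar = fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def endpoint(q):
        # Shift the nominal quantile q by the bias and acceleration terms.
        z = nd.inv_cdf(q)
        adjusted = nd.cdf(z0 + (z0 + z) / (1.0 - accel * (z0 + z)))
        index = min(max(int(adjusted * reps), 0), reps - 1)
        return boots[index]

    return endpoint(alpha / 2), endpoint(1 - alpha / 2)


# Synthetic paired ΔlogNLL values (edited minus baseline), one per window.
data_rng = random.Random(1)
deltas = [data_rng.gauss(0.002, 0.02) for _ in range(200)]

log_lo, log_hi = bca_ci(deltas)
ratio_ci = (math.exp(log_lo), math.exp(log_hi))  # exponentiate to a ratio CI
print(ratio_ci)
```

Resampling whole windows keeps the baseline/subject pairing intact, which is why the interval is computed on the deltas rather than on each run separately.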


C–D

report

Structured evidence artifact summarizing an evaluation run and its validation status.

| Aspect | Details |
| --- | --- |
| Context | Generated by invarlock evaluate or invarlock report --format report |
| Related terms | Report, Evidence Bundle, Manifest |
| Report fields | schema_version, run_id, validation.*, artifacts.* |
| See also | Reports Reference |

Example: evaluation.report.json with schema_version: v1 and validation.overall_pass: true


Compare & evaluate (BYOE)

Workflow that compares a subject model to a baseline, optionally with an external edit (Bring Your Own Edit).

| Aspect | Details |
| --- | --- |
| Context | invarlock evaluate --baseline ... --subject ... |
| Related terms | Baseline, Subject Run, report |
| Report fields | provenance.baseline.*, provenance.edited.* |
| See also | Compare & evaluate Guide |

Example: BYOE workflow evaluates an externally quantized checkpoint against its unmodified baseline.


E–G

Evidence Bundle

Set of files produced for audit: reports, manifests, and events.

| Aspect | Details |
| --- | --- |
| Context | Output directory from invarlock evaluate or invarlock report --format report |
| Related terms | Report, report, Manifest |
| Typical contents | evaluation.report.json, evaluation_report.md, manifest.json |
| See also | Artifact Layout |

Canonical Guard Chain

The default guard chain is invariants (pre) → spectral → RMT → variance → invariants (post).

| Aspect | Details |
| --- | --- |
| Context | Core safety checks in run and evaluate flows |
| Canonical order | invariants (pre), spectral, rmt, variance, invariants (post) |
| Related terms | Guard Chain, Guard Overhead |
| See also | Guards Reference |

Enforcement: Guards execute in canonical order for reproducibility; results are recorded in validation.invariants_pass, validation.spectral_stable, validation.rmt_stable.


Guard Chain (Canonical Order)

Fixed execution order for guard preparation and evaluation ensuring deterministic, auditable outcomes.

| Aspect | Details |
| --- | --- |
| Context | Defined by guards.order in config YAML |
| Related terms | Canonical Guard Chain, Guard Overhead |
| Report fields | Not stored directly (order is config-driven) |
| See also | Guards Reference |

Guard Overhead

Performance impact of guard checks vs bare control run (no guards).

| Aspect | Details |
| --- | --- |
| Context | Measured in Release profile; gate requires ≤ +1.0% primary-metric (PM) overhead |
| Related terms | Canonical Guard Chain, Timing Summary |
| Report fields | guard_overhead.{bare_ppl,guarded_ppl,overhead_ratio,overhead_percent} |
| See also | Guard Overhead Method |

Example: overhead_percent: +0.12% indicates guards add 0.12% to primary metric.
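The relationship between the guard_overhead fields can be sketched as follows. This assumes overhead_ratio is guarded ÷ bare and overhead_percent is (ratio − 1) × 100, which matches the example above but is not confirmed by this glossary; the function name and the within_budget key are hypothetical.

```python
def guard_overhead(bare_ppl, guarded_ppl, max_overhead_percent=1.0):
    """Compare a guarded run against a bare control run (assumed formulas)."""
    overhead_ratio = guarded_ppl / bare_ppl
    overhead_percent = (overhead_ratio - 1.0) * 100.0
    return {
        "bare_ppl": bare_ppl,
        "guarded_ppl": guarded_ppl,
        "overhead_ratio": overhead_ratio,
        "overhead_percent": overhead_percent,
        # Hypothetical key: whether the overhead stays within the +1.0% gate.
        "within_budget": overhead_percent <= max_overhead_percent,
    }


result = guard_overhead(bare_ppl=20.000, guarded_ppl=20.024)
print(f"{result['overhead_percent']:+.2f}%")  # prints +0.12%
```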


K–M

κ (kappa) Threshold

Per-family spectral cap used to flag abnormally high z-scores.

| Aspect | Details |
| --- | --- |
| Context | spectral.family_caps.*.kappa in tier policy |
| Typical values | ffn: 3.85, attn: 3.02, embed: 1.05 (Balanced tier) |
| Related terms | Spectral Cap, z-score, Spectral Guard |
| See also | Spectral FPR Derivation |

Example: kappa=2.8 for attention family means z-scores > 2.8 are flagged.


Measurement Contract

Guard measurement procedure signature and digest recorded in reports.

| Aspect | Details |
| --- | --- |
| Context | Spectral and RMT guards record estimator + sampling policy |
| Verified by | invarlock verify --profile ci or --profile release |
| Report fields | spectral.measurement_contract_hash, rmt.measurement_contract_hash |
| See also | Guard Contracts |

Enforcement: CI/Release profiles require measurement contract match between baseline and subject.


P–R

Policy Digest

Stable hash summarizing resolved policy thresholds for auditability.

| Aspect | Details |
| --- | --- |
| Context | Stored in report for policy change detection |
| Related terms | Tier Policy, Policy Overrides, Policy Provenance |
| Report fields | policy_digest.thresholds_hash, policy_provenance.* |
| See also | Policy Provenance |
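One common way to build a stable hash of this kind is to hash a canonical JSON serialization of the resolved thresholds. The sketch below illustrates the idea only; the serialization scheme, the function name thresholds_hash, and the nested dictionary layout are assumptions, not InvarLock's actual digest algorithm.

```python
import hashlib
import json


def thresholds_hash(thresholds):
    """SHA-256 over canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(thresholds, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical layout, using the Balanced-tier kappa values listed in this glossary.
balanced = {"spectral": {"family_caps": {
    "ffn": {"kappa": 3.85}, "attn": {"kappa": 3.02}, "embed": {"kappa": 1.05}}}}
reordered = {"spectral": {"family_caps": {
    "embed": {"kappa": 1.05}, "attn": {"kappa": 3.02}, "ffn": {"kappa": 3.85}}}}
changed = {"spectral": {"family_caps": {
    "ffn": {"kappa": 3.85}, "attn": {"kappa": 2.8}, "embed": {"kappa": 1.05}}}}

print(thresholds_hash(balanced) == thresholds_hash(reordered))  # key order is irrelevant
print(thresholds_hash(balanced) == thresholds_hash(changed))    # a changed threshold changes the hash
```

Sorting keys before hashing is what makes the digest insensitive to how the policy was assembled, so only a real threshold change alters it.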

Primary Metric

The canonical task metric used for gating (perplexity for LMs, accuracy for classification).

| Aspect | Details |
| --- | --- |
| Supported kinds | ppl_causal, ppl_mlm, accuracy, vqa_accuracy |
| Gating logic | Ratio vs baseline must stay within tier thresholds |
| Related terms | Primary Metric Tail, BCa Bootstrap, Window Pairing |
| Report fields | primary_metric.{kind,preview,final,ratio_vs_baseline,ci} |
| See also | Reports Reference |

Example: primary_metric.kind: ppl_causal with ratio_vs_baseline: 1.003
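For perplexity metrics, the ratio can be read as the exponential of the mean paired log-loss delta. A minimal sketch, assuming equally weighted windows and made-up per-window log-loss values (the helper name perplexity is illustrative):

```python
import math
from statistics import fmean


def perplexity(window_nll):
    """Perplexity as exp of the mean per-window negative log-likelihood."""
    return math.exp(fmean(window_nll))


# Made-up per-window log-loss for the same paired windows.
baseline_nll = [3.00, 3.10, 2.90, 3.05]
edited_nll = [3.01, 3.10, 2.91, 3.05]

ratio_vs_baseline = perplexity(edited_nll) / perplexity(baseline_nll)

# Equivalent view: exponentiate the mean paired delta.
paired_deltas = [e - b for e, b in zip(edited_nll, baseline_nll)]
ratio_from_deltas = math.exp(fmean(paired_deltas))

print(round(ratio_vs_baseline, 4), round(ratio_from_deltas, 4))
```

A ratio above 1 here means the edited model has higher perplexity than the baseline, that is, it regressed.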


Primary Metric Tail

Optional tail regression gate checking high-loss windows (e.g., q95 ΔlogNLL).

| Aspect | Details |
| --- | --- |
| Context | Catches regression in hard examples even when mean is acceptable |
| Mode | warn (default) or fail |
| Related terms | Primary Metric, BCa Bootstrap |
| Report fields | primary_metric_tail.{evaluated,passed,warned,stats} |
| See also | Reports Reference |
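A tail gate of this kind can be sketched as a quantile check on per-window ΔlogNLL. The threshold name max_q95_delta, the quantile method, and the exact semantics of passed and warned in each mode are assumptions for illustration; only the output field names come from this glossary.

```python
from statistics import quantiles


def tail_gate(deltas, max_q95_delta, mode="warn"):
    """Check the 95th percentile of per-window ΔlogNLL against a threshold."""
    # With n=20 the last cut point is the 95th percentile.
    q95 = quantiles(deltas, n=20, method="inclusive")[-1]
    exceeded = q95 > max_q95_delta
    return {
        "evaluated": True,
        "passed": not (exceeded and mode == "fail"),  # assumed: only fail mode blocks
        "warned": exceeded and mode == "warn",        # assumed: warn mode only flags
        "stats": {"q95": q95},
    }


# The mean delta is small (0.05), but 10% of windows regress badly.
heavy_tail = [0.0] * 90 + [0.5] * 10
print(tail_gate(heavy_tail, max_q95_delta=0.2))
print(tail_gate(heavy_tail, max_q95_delta=0.2, mode="fail"))
```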

Provider Digest

Dataset identity hash covering token IDs, tokenizer config, and masking strategy.

| Aspect | Details |
| --- | --- |
| Context | Ensures baseline and subject use identical data |
| Related terms | Window Pairing, Tokenizer Hash |
| Report fields | provenance.provider_digest.ids_sha256 |
| See also | Coverage & Pairing |

Report

Run-level artifact with metrics, guard results, and metadata.

| Aspect | Details |
| --- | --- |
| Context | Generated by invarlock run; input to report generation |
| Related terms | report, Evidence Bundle |
| File format | report.json + events.jsonl |
| See also | Artifact Layout |

RMT ε (epsilon) Rule

Random Matrix Theory epsilon band used for activation edge-risk stability checks.

| Aspect | Details |
| --- | --- |
| Context | rmt.epsilon_default and rmt.epsilon_by_family.* thresholds |
| Calibration | Derived from null-sweep runs on target model families |
| Related terms | RMT Guard, κ Threshold |
| Report fields | rmt.{epsilon_default,epsilon_by_family,stable,max_edge_ratio,max_edge_delta} |
| See also | RMT ε Rule |

RMT Guard

Guard that checks eigenvalue statistics against Random Matrix Theory bounds.

| Aspect | Details |
| --- | --- |
| Focus | Activation edge-risk growth across model families |
| Validation | validation.rmt_stable |
| Related terms | Canonical Guard Chain, RMT ε Rule |
| Report fields | rmt.{families,stable,max_edge_delta} |
| See also | Guards Reference |

S–T

Spectral Cap

Limit on spectral z-scores per family to flag weight instability.

| Aspect | Details |
| --- | --- |
| Context | Applied by spectral guard; counts violations per family |
| Related terms | κ Threshold, z-score, Spectral Guard |
| Report fields | spectral.{caps_applied,caps_exceeded,top_z_scores} |
| See also | Spectral FPR |

Spectral Guard

Guard that monitors spectral norms and z-scores for weight matrices.

| Aspect | Details |
| --- | --- |
| Focus | Baseline-relative weight matrix stability |
| Validation | validation.spectral_stable |
| Related terms | Canonical Guard Chain, Spectral Cap, κ Threshold |
| Report fields | spectral.{caps_applied,family_caps,top_z_scores,summary} |
| See also | Guards Reference |

Subject Run

The edited or target model run under evaluation (compared against baseline).

| Aspect | Details |
| --- | --- |
| Context | subject checkpoint in Compare & evaluate |
| Related terms | Baseline, report, Window Pairing |
| Report fields | provenance.edited.* |
| See also | Compare & evaluate |

Telemetry

Performance and resource metrics emitted with reports.

| Aspect | Details |
| --- | --- |
| Context | Optional fields for performance analysis |
| Related terms | Timing Summary, Guard Overhead |
| Report fields | telemetry.*, metrics.memory_mb_peak |
| See also | Observability |

Tier Policy

Guard threshold preset selecting the safety profile for a run.

| Aspect | Details |
| --- | --- |
| Options | conservative (strictest), balanced (default), aggressive (loosest) |
| Source | src/invarlock/_data/runtime/tiers.yaml |
| Related terms | Policy Digest, Policy Overrides |
| Report fields | auto.tier, resolved_policy.* |
| See also | Tier Policy Catalog |

Timing Summary

Consolidated timing breakdown for an evaluation run.

| Aspect | Details |
| --- | --- |
| Context | CLI output via print_timing_summary() |
| Includes | Model load, dataset load, evaluation, report generation |
| Related terms | Guard Overhead, Telemetry |
| See also | Observability |

Tokenizer Hash

Stable hash of tokenizer settings and vocabulary for reproducibility.

| Aspect | Details |
| --- | --- |
| Context | Ensures baseline and subject use identical tokenization |
| Related terms | Provider Digest, Window Pairing |
| Report fields | data.tokenizer_hash, meta.tokenizer_hash |
| See also | Determinism Contracts |

V–Z

Variance Effect (VE)

Guard that tracks variance change and applies equalization when beneficial.

| Aspect | Details |
| --- | --- |
| Context | A/B test compares bare vs VE-enabled evaluation |
| Enabling condition | CI excludes 0 AND mean Δ ≤ -min_effect_lognll |
| Related terms | Canonical Guard Chain, Guard Overhead, Predictive Gate |
| Report fields | variance.{enabled,gain,predictive_gate.delta_ci,predictive_gate.passed} |
| See also | VE Gate Power |
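The enabling condition in the table can be written as a small predicate. This is a sketch of the stated rule only; the function name ve_gate and the numeric values are illustrative assumptions, and the real gate may apply further checks.

```python
def ve_gate(delta_ci, mean_delta, min_effect_lognll):
    """True when the CI excludes 0 AND the mean delta is at most -min_effect_lognll."""
    lo, hi = delta_ci
    ci_excludes_zero = hi < 0.0 or lo > 0.0
    return ci_excludes_zero and mean_delta <= -min_effect_lognll


# Clear improvement: CI entirely below zero and the effect is large enough.
print(ve_gate((-0.004, -0.001), mean_delta=-0.0025, min_effect_lognll=0.001))
# CI straddles zero, so the improvement is not statistically distinguishable.
print(ve_gate((-0.003, 0.001), mean_delta=-0.001, min_effect_lognll=0.0005))
# CI excludes zero, but the effect is smaller than the minimum required.
print(ve_gate((-0.0009, -0.0001), mean_delta=-0.0005, min_effect_lognll=0.001))
```

Requiring both conditions separates statistical significance (the CI) from practical significance (the minimum effect size).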

Window Pairing

Alignment of baseline and subject evaluation windows for paired statistical testing.

| Aspect | Details |
| --- | --- |
| Requirements | Same window IDs, zero overlap, 100% match fraction |
| Violation | E001 pairing error in CI/Release profiles |
| Related terms | BCa Bootstrap, Primary Metric, Provider Digest |
| Report fields | dataset.windows.stats.{paired_windows,window_match_fraction,window_overlap_fraction} |
| See also | Coverage & Pairing |

Example: paired_windows: 200, window_match_fraction: 1.0, window_overlap_fraction: 0.0
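The window-ID part of this check can be illustrated as a set comparison. The sketch assumes the match fraction is the share of window IDs common to both runs and that a strict profile raises an error carrying the E001 code; both are assumptions, and the overlap fraction is left out because its definition is not given here.

```python
def pairing_stats(baseline_ids, subject_ids, strict=True):
    """Compare window IDs from two runs (assumed definition of match fraction)."""
    baseline_set, subject_set = set(baseline_ids), set(subject_ids)
    paired = baseline_set & subject_set
    union = baseline_set | subject_set
    match_fraction = len(paired) / len(union) if union else 0.0
    if strict and match_fraction < 1.0:
        # Hypothetical error format; the glossary only names the E001 code.
        raise ValueError(f"E001: window pairing mismatch (match fraction {match_fraction:.3f})")
    return {"paired_windows": len(paired), "window_match_fraction": match_fraction}


print(pairing_stats(["w0", "w1", "w2"], ["w2", "w0", "w1"]))
print(pairing_stats(["w0", "w1"], ["w0", "w9"], strict=False))
```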


z-score

Standardized deviation used in spectral guard scoring.

| Aspect | Details |
| --- | --- |
| Formula | z = (σ_edited - μ_baseline) / std_baseline |
| Thresholding | Compared against family-specific κ caps |
| Related terms | Spectral Cap, κ Threshold |
| Report fields | spectral.top_z_scores, spectral.family_caps.*.kappa |
| See also | Spectral FPR |

Example: max |z| = 2.1 means the largest absolute z-score across all weight matrices is 2.1.
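The formula and the κ comparison can be combined in a short sketch. The cap values are the Balanced-tier figures listed under κ (kappa) Threshold; the function names, the use of |z| for the comparison, and the numeric inputs are assumptions for illustration.

```python
# Balanced-tier caps as listed under "κ (kappa) Threshold".
BALANCED_KAPPA = {"ffn": 3.85, "attn": 3.02, "embed": 1.05}


def spectral_z(sigma_edited, mu_baseline, std_baseline):
    """z = (σ_edited - μ_baseline) / std_baseline."""
    return (sigma_edited - mu_baseline) / std_baseline


def exceeds_cap(z, family, caps=BALANCED_KAPPA):
    """Flag a z-score whose magnitude is above the family's κ cap (assumed: uses |z|)."""
    return abs(z) > caps[family]


z_attn = spectral_z(sigma_edited=12.0, mu_baseline=10.0, std_baseline=0.5)
z_ffn = spectral_z(sigma_edited=10.5, mu_baseline=10.0, std_baseline=0.5)

print(z_attn, exceeds_cap(z_attn, "attn"))  # prints 4.0 True
print(z_ffn, exceeds_cap(z_ffn, "ffn"))     # prints 1.0 False
```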


See Also