Spectral Guard False Positive Rate (FPR)
Plain language: The spectral guard keeps each layer’s singular values close to the baseline so the per-run false positive rate (FPR) stays within the calibrated WARN budget.
Claim
The spectral guard enforces the calibrated WARN budget by monitoring per-family z-scores and limiting the per-run FPR under the chosen null-modeling assumptions.
Derivation (sketch)
Per-family spectral monitoring computes z-scores

$$z_i = \frac{s_i - \mu_f}{\sigma_f}$$

for a spectral statistic $s_i$ (e.g., top singular value), where $\mu_f$ and $\sigma_f$ are the baseline mean and scale for family $f$. A WARN is issued if $|z_i| > \kappa(f)$. Under a modeled null where $z_i \sim \mathcal{N}(0, 1)$, the per-module two-sided tail probability becomes

$$p = 2\,\bigl(1 - \Phi(\kappa(f))\bigr).$$
Applying Bonferroni across the $m$ monitored families controls the family-wise error rate (FWER); applying Benjamini–Hochberg (BH) controls the expected false-discovery proportion (FDR). Balanced tiers choose BH (α = 0.05, m = 4); Conservative tiers choose Bonferroni (α = 0.000625, m = 4). Document the policy alongside the per-family caps $\kappa(f)$ so auditors can recover the expected per-run WARN rate.
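The tail probability and both corrections can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the shipped implementation; the function names (`two_sided_tail`, `bonferroni_cap`, `bh_pass`) are invented for this example.

```python
import math

def two_sided_tail(kappa: float) -> float:
    """Per-module two-sided tail probability 2*(1 - Phi(kappa)) under z ~ N(0, 1)."""
    phi = 0.5 * (1.0 + math.erf(kappa / math.sqrt(2.0)))  # standard normal CDF
    return 2.0 * (1.0 - phi)

def bonferroni_cap(alpha: float, m: int) -> float:
    """Invert the tail probability at alpha/m to recover the per-family cap kappa."""
    target = alpha / m
    lo, hi = 0.0, 10.0
    for _ in range(100):  # bisection on the monotone-decreasing tail function
        mid = 0.5 * (lo + hi)
        if two_sided_tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bh_pass(p_values: list[float], alpha: float) -> list[bool]:
    """Benjamini-Hochberg step-up: flag entries with p_(k) <= k * alpha / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank  # largest rank satisfying the step-up condition
    flagged = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            flagged[i] = True
    return flagged
```

For example, `two_sided_tail(1.96)` is approximately 0.05, matching the familiar normal quantile.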
Assumptions & Scope
- Baseline runs provide $\mu_f$ and $\sigma_f$ per family; when $\sigma_f$ is unavailable or degenerate we fall back to the tier deadband δ.
- Only 2‑D weight matrices (FFN blocks, attention projections, embeddings) are evaluated; 1‑D LayerNorm parameters are explicitly excluded from spectral monitoring. LayerNorm coverage is provided by invariants (presence checks) and activation‑based RMT (CI/Release); VE captures any aggregate performance shift.
- Balanced tier uses the Benjamini–Hochberg procedure (`method = "bh"`, α = 0.05, m = 4 families) with per-family caps `{ffn: 3.849, attn: 3.018, embed: 1.05, other: 0.0}`, `sigma_quantile = 0.95`, and `max_caps = 5`, yielding a ≤5% WARN rate on null runs (stored in `tiers.yaml`). Scope is `all`, so FFN, attention, embeddings, and other 2‑D weights are all monitored.
- Conservative tier applies Bonferroni (`method = "bonferroni"`, α = 0.000625, m = 4) with caps `{ffn: 3.849, attn: 2.6, embed: 2.8, other: 2.8}`, `sigma_quantile = 0.90`, and `max_caps = 3`, keeping WARNs within the calibrated budget. Scope is `ffn` in the shipped tier policies, so only FFN blocks are actively budgeted under the Conservative spectral guard.
- Deadband δ suppresses flicker around the cap: Balanced records δ = 0.10, Conservative δ = 0.05, surfaced in reports via `spectral.summary.deadband`.
- Reports expose the calibrations under `spectral.multiple_testing.{method,alpha,m}`, `spectral.summary.{sigma_quantile,max_caps,deadband}`, and `spectral.family_caps[*].kappa`.
- The FPR story is a calibration assumption under the chosen null model, not a theorem about arbitrary transformer weights.
- Empirical histograms of the per-module z-scores should be approximately standard normal; heavy tails → raise $\kappa(f)$ or use a robust, MAD-scaled $\sigma_f$.
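The MAD-scaled fallback mentioned above can be sketched as follows; the function name is illustrative, and the 1.4826 factor is the standard consistency constant that makes MAD estimate σ under a normal reference.

```python
import statistics

def robust_z_scores(values, baseline):
    """MAD-scaled z-scores: median/MAD replace mean/std so heavy tails
    in the baseline do not inflate the scale estimate."""
    med = statistics.median(baseline)
    mad = statistics.median(abs(v - med) for v in baseline)
    scale = 1.4826 * mad  # consistency factor for a normal reference
    return [(v - med) / scale for v in values]
```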
The deadband δ is a guardrail against flicker: relative changes within ±δ are treated as neutral, so WARNs only fire when sustained growth exceeds both δ and the family κ cap. Auditors can confirm the chosen δ directly in the report summary.
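A minimal sketch of that gating logic, assuming the deadband is checked before the cap (the exact ordering inside the guard is an assumption of this example):

```python
def spectral_warn(rel_change: float, z: float, kappa: float, deadband: float) -> bool:
    """WARN only when the relative spectral change escapes the deadband
    AND the z-score exceeds the family cap kappa."""
    if abs(rel_change) <= deadband:
        return False  # neutral zone: suppress flicker around the cap
    return abs(z) > kappa
```

With the Balanced values (δ = 0.10, κ = 3.849 for FFN), a 8% relative change never WARNs regardless of z, while a 20% change WARNs only if |z| exceeds the cap.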
Runtime Contract (report)
- The report exposes `spectral.summary.{sigma_quantile,deadband,modules_checked,max_caps,caps_exceeded}`, `spectral.family_caps`, and `spectral.families[family]` with `{max, mean, count, violations, kappa}`. `sigma_quantile` is the calibrated baseline percentile used to derive the reference target.
- Tier files document FPR targets and the mapping to expected WARNs.
- Policy metadata records the multiple-testing method (`spectral.multiple_testing`) and the cap limit (`spectral.max_caps`).
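An auditor consuming the report contract might walk these fields as in the sketch below; the `report` fragment and `audit_spectral` helper are hypothetical, but the key names follow the contract above.

```python
report = {  # illustrative report fragment using the field names from the contract
    "spectral": {
        "summary": {"sigma_quantile": 0.95, "deadband": 0.10,
                    "modules_checked": 48, "max_caps": 5, "caps_exceeded": False},
        "multiple_testing": {"method": "bh", "alpha": 0.05, "m": 4},
        "families": {"ffn": {"max": 3.2, "mean": 1.1, "count": 24,
                             "violations": 0, "kappa": 3.849}},
    }
}

def audit_spectral(report: dict) -> list[str]:
    """Return the families whose violation count is non-zero."""
    families = report["spectral"]["families"]
    return [name for name, fam in families.items() if fam["violations"] > 0]
```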
Observability
- `spectral.summary.{sigma_quantile,deadband,modules_checked,max_caps,caps_exceeded}`
- `spectral.family_caps[*].kappa` and `spectral.families[*].{kappa,violations}`
- `spectral.multiple_testing.{method,alpha,m}` and `spectral.max_caps`
Worked example (Balanced tier)
- For FFN modules, `family_caps.ffn.kappa = 3.849`. Suppose a layer reports $|z| > 3.849$.
- The report records a WARN: `spectral.families.ffn.violations += 1`; `spectral.caps_applied` increments.
- Balanced `max_caps = 5`. After the fifth WARN the guard continues to WARN; the sixth triggers `spectral.caps_exceeded = true` and the run aborts.
- Multiple-testing metadata shows `spectral.multiple_testing = {method: "bh", alpha: 0.05, m: 4}` so reviewers can verify the tier-wide correction.
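The budget-exhaustion behavior in the worked example can be modeled with a toy counter; `SpectralGuard` here is an illustrative class, not the shipped guard.

```python
class SpectralGuard:
    """Toy WARN budget: count cap violations, abort once max_caps is exceeded."""

    def __init__(self, max_caps: int):
        self.max_caps = max_caps
        self.caps_applied = 0
        self.caps_exceeded = False

    def record_violation(self) -> str:
        self.caps_applied += 1
        if self.caps_applied > self.max_caps:
            self.caps_exceeded = True  # mirrors spectral.caps_exceeded=true
            return "ABORT"
        return "WARN"
```

With Balanced `max_caps = 5`, the first five violations return `"WARN"` and the sixth returns `"ABORT"`, matching the example above.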
Calibration
Calibration values are derived from null-sweep runs using the order-statistic
and parametric methods described in the tier calibration documentation
(`09-tier-v1-calibration.md`). The calibrated κ
values are stored in the packaged `tiers.yaml`
(`invarlock._data.runtime/tiers.yaml`).
To recalibrate, run null baselines (no edit) and collect per-module z-scores. Allocate the WARN budget across families proportionally by module count, then set κ(f) via order-statistic (the B(f)-th largest |z| in that family) or parametric inversion of the tail probability. Add a small safety margin (η ≈ 0.05–0.10) and validate that subsequent null runs stay within the budget.
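The order-statistic branch of that recipe can be sketched as follows, assuming the per-family budget is α times the family's module count and the B(f)-th largest |z| sets the cap; `calibrate_kappa` is an illustrative name, not the packaged tooling.

```python
def calibrate_kappa(null_z: dict[str, list[float]], alpha: float,
                    eta: float = 0.05) -> dict[str, float]:
    """Order-statistic calibration: allocate the WARN budget per family
    proportionally to module count, then set kappa(f) to the B(f)-th
    largest |z| seen on null runs, inflated by a safety margin eta."""
    caps = {}
    for family, zs in null_z.items():
        budget = max(1, round(alpha * len(zs)))  # B(f), at least one slot
        ranked = sorted((abs(z) for z in zs), reverse=True)
        caps[family] = ranked[budget - 1] * (1.0 + eta)
    return caps
```

The final validation step, rerunning null baselines and checking the observed WARN rate stays under budget, is still required after any recalibration.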
References
- Benjamini, Y., & Hochberg, Y. (1995). “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x