Spectral Guard False Positive Rate (FPR)
Plain language: The spectral guard records a calibrated multiple-testing policy for per-family singular-value drift. Gaussian-tail FPR math applies to the families whose kappas were calibrated for that model; the low Balanced embed/other caps are operational sentinels, not standalone <=5% FPR guarantees.
Claim
The spectral guard monitors per-family z-scores and records the multiple-testing
policy needed to interpret WARNs under the chosen null-modeling assumptions.
For families whose kappas are calibrated against an approximately Gaussian null,
the two-sided tail probability gives the expected false-positive rate. Families
with intentionally low sentinel caps are still monitored and budgeted by
max_caps, but they must not be cited as <=5% Gaussian-tail guarantees.
Derivation (sketch)
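Per-family spectral monitoring computes z-scores
$$z = \frac{s - \mu(f)}{\sigma(f)}$$
for a spectral statistic s (e.g., top singular value), where μ(f) and σ(f) are the baseline mean and standard deviation for family f. A WARN is issued if |z| > κ(f). Under a modeled null where z ~ N(0, 1), the per-module two-sided tail probability becomes
$$p(f) = 2\,\bigl(1 - \Phi(\kappa(f))\bigr)$$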
Applying Bonferroni across the modules controls the family-wise error rate (FWER); applying Benjamini–Hochberg (BH) controls the expected false-discovery proportion (FDR). Balanced tiers choose BH (α=0.05, m=4); Conservative tiers choose Bonferroni (α=0.000625, m=4). Document the policy alongside κ so auditors can recover the expected per-run WARN rate.
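The tail and multiple-testing arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not the guard's implementation: the function names are hypothetical, the Bonferroni and BH routines are the textbook procedures, and the sketch does not settle whether the tier's recorded α is the pre- or post-division value.

```python
import math

def two_sided_tail(kappa: float) -> float:
    """P(|Z| > kappa) for Z ~ N(0, 1), i.e. 2 * (1 - Phi(kappa))."""
    return math.erfc(kappa / math.sqrt(2.0))

def bonferroni_reject(p_values, alpha):
    """Textbook Bonferroni: reject H_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def bh_reject(p_values, alpha):
    """Textbook Benjamini-Hochberg step-up: reject the k smallest p-values,
    where k is the largest rank with p_(k) <= k * alpha / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected

# Balanced-tier caps quoted in this document.
balanced_caps = {"ffn": 3.849, "attn": 3.018, "embed": 1.05}
tails = {family: two_sided_tail(k) for family, k in balanced_caps.items()}

for family, p in tails.items():
    print(f"{family}: modeled two-sided tail = {p:.3g}")
```

Under the Gaussian null, the ffn and attn caps give small modeled tails, while the embed cap gives a tail far above 5%, which is why the document treats it as a sentinel threshold instead of an FPR-controlled cap.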
Assumptions & Scope
- Baseline runs provide (μ(f), σ(f)) per family f; when these baseline statistics are unavailable we fall back to the tier deadband δ.
- Only 2‑D weight matrices (FFN blocks, attention projections, embeddings) are evaluated; 1‑D LayerNorm parameters are explicitly excluded from spectral monitoring. LayerNorm coverage is provided by invariants (presence checks) and activation‑based RMT (CI/Release); VE captures any aggregate performance shift.
- Balanced tier stores Benjamini-Hochberg metadata (method = "bh", alpha = 0.05, m = 4 families) with per-family caps {ffn: 3.849, attn: 3.018, embed: 1.05, other: 0.0}, sigma_quantile = 0.95, and max_caps = 5. Scope is "all", so FFN, attention, embeddings, and other 2-D weights are all monitored. The Gaussian-tail FPR interpretation is defensible for the calibrated high-kappa families (ffn, attn in the packaged policy). The lower embed and other caps are sentinel thresholds and can exceed a 5% Gaussian tail if interpreted alone.
- Conservative tier applies Bonferroni (method = "bonferroni", α = 0.000625, m = 4) with caps {ffn: 3.849, attn: 2.6, embed: 2.8, other: 2.8}, sigma_quantile = 0.90, and max_caps = 3, keeping WARNs within the calibrated budget. Scope is "ffn" in the included tier policies, so only FFN blocks are actively budgeted under the Conservative spectral guard.
- Deadband δ suppresses flicker around the cap: Balanced records δ = 0.10, Conservative δ = 0.05, surfaced in reports via spectral.summary.deadband.
- Reports expose the policy under spectral.multiple_testing.{method,alpha,m}, spectral.summary.{sigma_quantile,max_caps,deadband}, and spectral.family_caps[*].kappa.
- The FPR story is a calibration assumption under the chosen null model for the calibrated families, not a theorem about arbitrary transformer weights or all sentinel thresholds.
- Empirical histograms of z should be approximately standard normal; heavy tails → raise κ or use a robust (MAD-scaled) σ.
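The MAD-scaled alternative mentioned in the last bullet can be sketched as follows. This is an illustration under assumptions, not the guard's code: `robust_z_scores` is a hypothetical helper, and 1.4826 is the standard factor that makes the median absolute deviation consistent with the standard deviation for Gaussian data.

```python
from statistics import median

MAD_TO_SIGMA = 1.4826  # consistency factor for Gaussian data

def robust_z_scores(values):
    """Center on the median and scale by 1.4826 * MAD instead of mean/std."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust scale is undefined for this sample")
    scale = MAD_TO_SIGMA * mad
    return [(v - center) / scale for v in values]

# Hypothetical sample with one heavy-tailed outlier.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
z_robust = robust_z_scores(sample)
```

Because the median and MAD are insensitive to a few extreme values, the outlier does not inflate the scale estimate the way it would inflate a sample standard deviation.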
The deadband δ is a guardrail against flicker: relative changes within ±δ are treated as neutral, so WARNs only fire when sustained growth exceeds both δ and the family κ cap. Auditors can confirm the chosen δ directly in the report summary.
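One way to read the deadband rule is as a two-stage gate: a change inside ±δ is neutral, and only changes outside it are compared against κ. The sketch below assumes exactly that reading; `should_warn` and its arguments are hypothetical, and it does not model the "sustained" aspect, which would need state across steps.

```python
def should_warn(rel_change: float, z: float, kappa: float, deadband: float) -> bool:
    """Return True only when the change leaves the deadband AND |z| exceeds kappa."""
    if abs(rel_change) <= deadband:
        return False  # within +/- deadband: treated as neutral
    return abs(z) > kappa

# Balanced-tier values quoted in this document: deadband 0.10, ffn kappa 3.849.
inside_deadband = should_warn(rel_change=0.05, z=5.0, kappa=3.849, deadband=0.10)
outside_over_cap = should_warn(rel_change=0.20, z=5.0, kappa=3.849, deadband=0.10)
outside_under_cap = should_warn(rel_change=0.20, z=3.0, kappa=3.849, deadband=0.10)
```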
Runtime Contract (report)
- The report exposes spectral.summary.{sigma_quantile,deadband,modules_checked,max_caps,caps_exceeded}, spectral.family_caps, and spectral.families[family] with {max, mean, count, violations, kappa}. sigma_quantile is the calibrated baseline percentile used to derive the reference target.
- Tier files document multiple-testing metadata and the mapping from κ to modeled Gaussian tails. Sentinel caps should be audited as operational thresholds, not as FPR-controlled family caps.
- Policy metadata records the multiple-testing method (spectral.multiple_testing) and the cap limit (spectral.max_caps).
Observability
- spectral.summary.{sigma_quantile,deadband,modules_checked,max_caps,caps_exceeded}
- spectral.family_caps[*].kappa and spectral.families[*].{kappa,violations}
- spectral.multiple_testing.{method,alpha,m} and spectral.max_caps
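An auditor could pull these fields out of a report with a small helper like the one below. This is a sketch that assumes the report is available as a nested dict keyed by the dotted paths listed above; `get_path`, `summarize_policy`, and the example dict are hypothetical, and the example holds only Balanced-tier values quoted in this document.

```python
def get_path(report, dotted):
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    node = report
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def summarize_policy(report):
    """Collect the policy fields an auditor needs into one flat dict."""
    caps = get_path(report, "spectral.family_caps") or {}
    return {
        "method": get_path(report, "spectral.multiple_testing.method"),
        "alpha": get_path(report, "spectral.multiple_testing.alpha"),
        "m": get_path(report, "spectral.multiple_testing.m"),
        "sigma_quantile": get_path(report, "spectral.summary.sigma_quantile"),
        "deadband": get_path(report, "spectral.summary.deadband"),
        "max_caps": get_path(report, "spectral.summary.max_caps"),
        "kappa": {family: cap.get("kappa") for family, cap in caps.items()},
    }

# Hypothetical report fragment (assumed layout, Balanced-tier values).
example_report = {
    "spectral": {
        "multiple_testing": {"method": "bh", "alpha": 0.05, "m": 4},
        "summary": {"sigma_quantile": 0.95, "deadband": 0.10, "max_caps": 5},
        "family_caps": {"ffn": {"kappa": 3.849}, "attn": {"kappa": 3.018}},
    }
}
policy = summarize_policy(example_report)
```

Returning None for missing paths, instead of raising, lets the same helper run against reports from tiers that omit some fields.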
Worked example (Balanced tier)
- For FFN modules, family_caps.ffn.kappa = 3.849. Suppose a layer reports a z-score whose magnitude |z| exceeds this cap.
- The report records a WARN in spectral.families.ffn.violations += 1; spectral.caps_applied increments.
- Balanced max_caps = 5. After the fifth WARN the guard continues to WARN; the sixth triggers spectral.caps_exceeded=true and the run aborts.
- Multiple-testing metadata shows spectral.multiple_testing = {method: "bh", alpha: 0.05, m: 4} so reviewers can verify the published policy and compute modeled tails for the calibrated caps.
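The counting in this example can be traced with a small sketch. The `CapBudget` class is hypothetical and mirrors only the behavior described above (a WARN when |z| exceeds κ, and caps_exceeded once the WARN count passes max_caps); the z value of 4.0 is an illustrative input, not a measured one.

```python
from dataclasses import dataclass, field

@dataclass
class CapBudget:
    max_caps: int
    caps_applied: int = 0
    caps_exceeded: bool = False
    violations: dict = field(default_factory=dict)

    def record(self, family: str, z: float, kappa: float) -> bool:
        """Record one module's z-score; return True if it produced a WARN."""
        if abs(z) <= kappa:
            return False
        self.violations[family] = self.violations.get(family, 0) + 1
        self.caps_applied += 1
        # The WARN that pushes the count past max_caps flips caps_exceeded.
        if self.caps_applied > self.max_caps:
            self.caps_exceeded = True
        return True

budget = CapBudget(max_caps=5)  # Balanced tier
for _ in range(5):
    budget.record("ffn", z=4.0, kappa=3.849)
exceeded_after_five = budget.caps_exceeded

budget.record("ffn", z=4.0, kappa=3.849)
exceeded_after_six = budget.caps_exceeded
```

With max_caps = 5, the first five WARNs leave caps_exceeded false and the sixth sets it, matching the sequence in the worked example; the abort itself is left to the caller.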
Calibration
Calibration values are derived from null-sweep runs using the order-statistic
and parametric methods described in the tier calibration documentation
(09-tier-v1-calibration.md). The calibrated κ
values are stored in the packaged tiers.yaml
(runtime/tiers.yaml).
To recalibrate, run null baselines (no edit) and collect per-module z-scores. Allocate the WARN budget across families proportionally by module count, then set κ(f) via order-statistic (the B(f)-th largest |z| in that family) or parametric inversion of the tail probability. Add a small safety margin (η ≈ 0.05–0.10) and validate that subsequent null runs stay within the budget.
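The recalibration steps above can be sketched in Python. Several details are assumptions made for illustration, because the text does not fix them: the budget is rounded down with a floor of one WARN per family, the safety margin η is applied additively, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

def allocate_budget(total_budget: int, module_counts: dict) -> dict:
    """Split the WARN budget across families in proportion to module count.
    Assumption: round down, but give every family at least one WARN."""
    total = sum(module_counts.values())
    return {
        family: max(1, math.floor(total_budget * n / total))
        for family, n in module_counts.items()
    }

def kappa_order_statistic(z_scores, budget: int, eta: float = 0.05) -> float:
    """kappa(f) = B(f)-th largest |z| from null runs, plus an additive margin."""
    if not z_scores:
        raise ValueError("need at least one null z-score")
    ranked = sorted((abs(z) for z in z_scores), reverse=True)
    b = min(budget, len(ranked))
    return ranked[b - 1] + eta

def kappa_parametric(tail_prob: float, eta: float = 0.05) -> float:
    """Invert the two-sided Gaussian tail: kappa = Phi^{-1}(1 - p / 2), plus margin."""
    return NormalDist().inv_cdf(1.0 - tail_prob / 2.0) + eta

# Hypothetical inputs for illustration only.
budgets = allocate_budget(4, {"ffn": 6, "attn": 2})
kappa_os = kappa_order_statistic([0.5, -2.0, 1.0, 3.0], budget=2, eta=0.0)
kappa_par = kappa_parametric(0.05, eta=0.0)
```

The order-statistic route makes no distributional assumption beyond the null runs being representative, while the parametric route relies on the Gaussian null; validating on further null runs, as the paragraph recommends, applies to either.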
Basis column in Quality Gates tables: "point" = point estimate gate, "upper" = upper-bound gate, "point & upper" = both point and upper bounds must pass.
References
- Benjamini, Y., & Hochberg, Y. (1995). “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x