From Sweep Outputs to Tier Policy
Calibration becomes operational when sweep artifacts end in reviewable YAML patches that later appear as resolved runtime policy in reports.
Research Note: a calibration story is strongest when it survives as policy
Highlights
- Each calibration sweep emits four outputs: JSON, CSV, Markdown, and a
tiers_patch_*.yamlrecommendation. - Approved patch keys merge into
tiers.yaml; the catalog separates calibration-shaped keys (family caps, VE min-effect) from explicit policy choices (floors, deadbands, caps). - The same key shows up later in the run's
resolved_policy.*, giving the sweep → patch → policy → resolved-policy path a traceable identity.
Many calibration workflows stop at interpretation. They produce a chart, a summary paragraph, or a recommendation in prose, then leave the product surface ambiguous.
InvarLock's public calibration surface is stronger when read the other way around.
The interesting output is the patch. The summary helps a human understand the sweep, but the patch is what connects the result to the runtime tier policy that later governs real evaluations.
Why The Patch Is The Real Output
The calibration CLI reference makes this explicit. Both null sweeps and VE sweeps emit JSON, CSV, Markdown, and a tiers_patch_*.yaml recommendation file.
That output hierarchy matters. JSON and CSV preserve machine-readable evidence. Markdown gives a human summary. But the YAML patch is the artifact that can actually change runtime behavior after review.
If calibration stops before that point, it remains analysis. Once it reaches a patch, it becomes a reviewable policy proposal.
How Tier Policy Absorbs Calibrated Keys
The tier-policy catalog is useful here because it draws an important distinction: some values are calibrated from pilot or null runs, while others remain explicit policy choices such as floors, caps, and deadbands.
That distinction is easiest to see with examples. Spectral family caps and VE minimum-effect settings are calibration-shaped. Deadbands, floors, and some caps remain policy choices around those calibrated values.
That prevents a common confusion. A calibration sweep does not "discover the whole tier." It recommends a narrow set of calibrated values for a policy surface that still includes deliberate design choices.
This is why the patch needs review. It is proposing edits to a policy document, not revealing a timeless truth.
Why resolved_policy.* Matters Downstream
The guards reference and policy catalog both point to the same operational idea: the policy that actually governed a run should be observable later in the report surface.
This is what gives the patch path continuity.
The sweep produces a recommendation. The recommendation lands in tier policy after review. Later, a run exposes the resolved values it actually used. That means calibration is not trapped in an offline notebook. It becomes part of the evidence chain for future evaluation and verification.
Without that downstream visibility, the patch would still be better than prose, but much weaker as an audit surface.
That handoff is easier to review when the patch and later report expose the same policy path, for example:
balanced:
variance_guard:
min_effect_lognll: 0.016
{
"resolved_policy": {
"variance": {
"min_effect_lognll": 0.016
}
}
}
Why Review Still Matters
The calibration docs are careful about this too. They show how to inspect a patch and merge it into tiers.yaml, but they do not present that step as automatic or self-justifying.
That is the right stance. A calibrated recommendation is stronger than intuition, but it is still a recommendation. Operators should be able to inspect the diff, ask whether the sweep assumptions are still local and valid, and decide whether the change deserves adoption.
Review is not friction around calibration. It is part of calibration.
What This Path Still Does Not Prove
The claim here should remain narrow.
A clean patch path does not prove that every calibrated value transfers across families, hardware, or window budgets. It does not prove that every policy knob is empirical. And it does not remove the need to revisit thresholds when guard contracts change.
What it does show is that the public calibration story is operational. It has a stable output, a review surface, and a downstream policy trace.
Claim Map
The practical path is:
- run a policy-tuning sweep
- inspect JSON, CSV, and Markdown outputs
- review the emitted
tiers_patch_*.yaml - merge approved keys into tier policy
- verify later runs against the
resolved_policy.*they actually used
That is a much stronger systems story than a calibration notebook with a conclusion slide.
Limitations
- The patch path is the publication shape; whether a given recommendation is locally valid is a review decision, not a derivation result.
- This post traces continuity from sweep through to report; it does not argue any specific key is calibrated correctly.
- Command syntax, output names, and merge mechanics live in the calibration and tier-policy references — this post connects them rather than restating them.
Sources
More in Research Note
Continue through nearby posts in the same reading thread.
Research Note
Calibration Is the Product Surface, Not a Side Utility
Calibration is not just analysis around the product. It changes how thresholds are derived, when correction paths may turn on, and which policy values later govern reports.
Research Note
Variance Enablement Should Be Evidence-Gated
Variance equalization is stronger when it must earn enablement through predictive evidence, explicit tier knobs, and report-visible provenance.
Research Note
What Belongs in evaluation.report.json
An evaluation report is strongest when it is treated as a stable evidence contract: a small required core, meaningful optional blocks, and a clear boundary around what still lives outside the JSON.