
From Sweep Outputs to Tier Policy

[Figure: sweep artifacts flow into a tier-policy patch, then into runtime policy and report-visible resolved policy.]

Calibration becomes operational when sweep artifacts end in reviewable YAML patches that later appear as resolved runtime policy in reports.

4 min read
InvarLock Team

Research Note: a calibration story is strongest when it survives as policy

Highlights

  • Each calibration sweep emits four outputs: JSON, CSV, Markdown, and a tiers_patch_*.yaml recommendation.
  • Approved patch keys merge into tiers.yaml; the catalog separates calibration-shaped keys (family caps, VE min-effect) from explicit policy choices (floors, deadbands, caps).
  • The same key reappears later in a run's resolved_policy.*, so a single value can be traced along the whole sweep → patch → policy → resolved-policy path.

Many calibration workflows stop at interpretation. They produce a chart, a summary paragraph, or a recommendation in prose, and then leave it unclear what, if anything, should change in the product.

InvarLock's public calibration surface is easier to understand when read the other way around: start from the artifact that changes policy, not from the interpretation.

The interesting output is the patch. The summary helps a human understand the sweep, but the patch is what connects the result to the runtime tier policy that later governs real evaluations.

Why The Patch Is The Real Output

The calibration CLI reference makes this explicit. Both null sweeps and VE sweeps emit JSON, CSV, Markdown, and a tiers_patch_*.yaml recommendation file.

That output hierarchy matters. JSON and CSV preserve machine-readable evidence. Markdown gives a human summary. But the YAML patch is the artifact that can actually change runtime behavior after review.

If calibration stops before that point, it remains analysis. Once it reaches a patch, it becomes a reviewable policy proposal.
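A minimal sketch of that hierarchy, assuming all four outputs of one sweep land in a single directory (a hypothetical layout): it sorts file names into artifact roles and reports which roles are missing. Only the tiers_patch_*.yaml naming comes from the calibration reference; the role names and helper functions are illustrative.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical role names; the glob patterns mirror the four documented outputs.
ROLE_PATTERNS = {
    "evidence_json": "*.json",
    "evidence_csv": "*.csv",
    "summary_md": "*.md",
    "policy_patch": "tiers_patch_*.yaml",
}


def classify_outputs(names: list[str]) -> dict[str, list[str]]:
    """Group output file names by the artifact role whose pattern they match."""
    return {
        role: sorted(n for n in names if fnmatch(n, pattern))
        for role, pattern in ROLE_PATTERNS.items()
    }


def missing_roles(names: list[str]) -> list[str]:
    """Roles with no matching file. Without 'policy_patch', the sweep is still analysis."""
    return [role for role, files in classify_outputs(names).items() if not files]


def missing_roles_in_dir(sweep_dir: Path) -> list[str]:
    """Convenience wrapper for a (hypothetical) per-sweep output directory."""
    return missing_roles([p.name for p in sweep_dir.iterdir() if p.is_file()])
```

Treating a missing policy_patch as the gap that matters follows the argument of this section: the JSON, CSV, and Markdown outputs support review, but only the patch proposes a policy change.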

How Tier Policy Absorbs Calibrated Keys

The tier-policy catalog is useful here because it draws an important distinction: some values are calibrated from pilot or null runs, while others remain explicit policy choices such as floors, caps, and deadbands.

That distinction is easiest to see with examples. Spectral family caps and VE minimum-effect settings are calibration-shaped. Deadbands, floors, and some caps remain policy choices around those calibrated values.

That prevents a common confusion. A calibration sweep does not "discover the whole tier." It recommends a narrow set of calibrated values for a policy surface that still includes deliberate design choices.

This is why the patch needs review. It is proposing edits to a policy document, not revealing a timeless truth.
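One way to keep a patch inside that narrow calibrated surface is to flag, before review, any entry that touches a policy-choice key. In this sketch the set of calibration-shaped keys is an assumption: min_effect_lognll appears in this post's variance example, while family_caps, spectral_guard, and deadband are hypothetical stand-ins for spectral family caps and deadbands, and the numeric values are made up.

```python
# Assumed calibration-shaped key names; anything else is treated as a policy choice.
CALIBRATED_KEYS = {"min_effect_lognll", "family_caps"}


def flatten(tree: dict, prefix: str = "") -> dict[str, object]:
    """Flatten nested mappings into dotted paths such as 'balanced.variance_guard.min_effect_lognll'."""
    flat: dict[str, object] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def out_of_scope_paths(patch: dict) -> list[str]:
    """Patch paths that do not pass through any calibration-shaped key."""
    return [
        path
        for path in flatten(patch)
        if not any(part in CALIBRATED_KEYS for part in path.split("."))
    ]


# Illustrative patch; 0.005 and 2.5 are made-up values.
patch = {
    "balanced": {
        "variance_guard": {"min_effect_lognll": 0.016, "deadband": 0.005},
        "spectral_guard": {"family_caps": {"ffn": 2.5}},
    }
}
print(out_of_scope_paths(patch))  # ['balanced.variance_guard.deadband']
```

A flagged path is not necessarily wrong; it simply signals that the patch is reaching into a deliberate design choice rather than a calibrated value, which deserves a different kind of review.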

Why resolved_policy.* Matters Downstream

The guards reference and policy catalog both point to the same operational idea: the policy that actually governed a run should be observable later in the report surface.

This is what gives the patch path continuity.

The sweep produces a recommendation. The recommendation lands in tier policy after review. Later, a run exposes the resolved values it actually used. That means calibration is not trapped in an offline notebook. It becomes part of the evidence chain for future evaluation and verification.

Without that downstream visibility, the patch would still be better than prose, but much weaker as an audit surface.

That handoff is easier to review when the patch and a later report expose the same key. A patch fragment might read:

```yaml
balanced:
  variance_guard:
    min_effect_lognll: 0.016
```

and the resolved policy in a later report:

```json
{
  "resolved_policy": {
    "variance": {
      "min_effect_lognll": 0.016
    }
  }
}
```
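Checking that handoff can be mechanical. The sketch below compares each key a patch sets for one tier against the report's resolved_policy. It assumes, as the example suggests, that the tier section variance_guard is reported under resolved_policy.variance; that mapping, the report file, and the helper names are assumptions rather than a documented schema.

```python
import json
from pathlib import Path

# Assumed mapping from tier-policy section names to resolved_policy sections,
# inferred from the single variance example; unmapped sections keep their name.
SECTION_TO_REPORT = {"variance_guard": "variance"}


def load_report(path: Path) -> dict:
    """Read a (hypothetical) JSON report file."""
    return json.loads(path.read_text())


def resolved_value(report: dict, section: str, key: str) -> object:
    """Return resolved_policy.<section>.<key>, or None if the report lacks it."""
    report_section = SECTION_TO_REPORT.get(section, section)
    return report.get("resolved_policy", {}).get(report_section, {}).get(key)


def check_resolved(patch: dict, tier: str, report: dict) -> dict[str, bool]:
    """For each key the patch sets in one tier, did the run resolve to that value?"""
    results: dict[str, bool] = {}
    for section, keys in patch.get(tier, {}).items():
        for key, value in keys.items():
            results[f"{section}.{key}"] = resolved_value(report, section, key) == value
    return results


patch = {"balanced": {"variance_guard": {"min_effect_lognll": 0.016}}}
report = json.loads('{"resolved_policy": {"variance": {"min_effect_lognll": 0.016}}}')
print(check_resolved(patch, "balanced", report))  # {'variance_guard.min_effect_lognll': True}
```

Exact equality is a deliberate choice in this sketch: if the run used the merged value, both sides should parse to the same number, and a tolerance could mask a stale or overridden setting.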

Why Review Still Matters

The calibration docs are careful about this too. They show how to inspect a patch and merge it into tiers.yaml, but they do not present that step as automatic or self-justifying.

That is the right stance. A calibrated recommendation is stronger than intuition, but it is still a recommendation. Operators should be able to inspect the diff, ask whether the sweep's assumptions still hold in their local setting, and decide whether the change deserves adoption.

Review is not friction around calibration. It is part of calibration.
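Review is easier when the diff is explicit and the merge applies only what an operator approved. A minimal sketch on plain dictionaries, standing in for a parsed tiers.yaml and a parsed patch; the function names are hypothetical, the dotted paths assume keys contain no dots, and the current-policy values are made up for illustration.

```python
import copy


def flatten(tree: dict, prefix: str = "") -> dict[str, object]:
    """Flatten nested mappings into dotted paths (assumes keys contain no '.')."""
    flat: dict[str, object] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def review_diff(tiers: dict, patch: dict) -> list[tuple[str, object, object]]:
    """List (path, current, proposed) for every patch entry that would change tiers."""
    current = flatten(tiers)
    return [
        (path, current.get(path), proposed)
        for path, proposed in flatten(patch).items()
        if current.get(path) != proposed
    ]


def merge_approved(tiers: dict, patch: dict, approved: set[str]) -> dict:
    """Return a copy of tiers with only the approved patch paths applied."""
    merged = copy.deepcopy(tiers)
    for path, value in flatten(patch).items():
        if path not in approved:
            continue  # unapproved recommendations stay out of policy
        *parents, leaf = path.split(".")
        node = merged
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return merged


# Illustrative values only.
tiers = {"balanced": {"variance_guard": {"min_effect_lognll": 0.02, "deadband": 0.01}}}
patch = {"balanced": {"variance_guard": {"min_effect_lognll": 0.016, "deadband": 0.005}}}
for row in review_diff(tiers, patch):
    print(row)
merged = merge_approved(tiers, patch, {"balanced.variance_guard.min_effect_lognll"})
print(merged["balanced"]["variance_guard"])  # {'min_effect_lognll': 0.016, 'deadband': 0.01}
```

Merging into a copy keeps the original policy untouched until the approved result is written back, so a rejected review leaves nothing to undo.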

What This Path Still Does Not Prove

The claim here should remain narrow.

A clean patch path does not prove that every calibrated value transfers across families, hardware, or window budgets. It does not prove that every policy knob is empirical. And it does not remove the need to revisit thresholds when guard contracts change.

What it does show is that the public calibration story is operational. It has a stable output, a review surface, and a downstream policy trace.

Claim Map

The practical path is:

  • run a policy-tuning sweep
  • inspect JSON, CSV, and Markdown outputs
  • review the emitted tiers_patch_*.yaml
  • merge approved keys into tier policy
  • verify later runs against the resolved_policy.* they actually used

That is a much stronger systems story than a calibration notebook with a conclusion slide.

Limitations

  • The patch path is how a recommendation is published; whether a given recommendation is locally valid is a review decision, not a derivation result.
  • This post traces continuity from sweep through to report; it does not argue any specific key is calibrated correctly.
  • Command syntax, output names, and merge mechanics live in the calibration and tier-policy references — this post connects them rather than restating them.
