What Belongs in evaluation.report.json
An evaluation report is strongest when it is treated as a stable evidence contract: a small required core, meaningful optional blocks, and a clear boundary around what still lives outside the JSON.
Research Note: a report file should behave like a contract, not a dump
Highlights
- evaluation.report.json has a small required core and a larger reviewer-facing optional surface.
- The required core is stable because downstream verification and tooling depend on it.
- The report is a key evidence object, but it is not the whole archive by itself.
It is easy to talk about a report file as if it were just an export format. That framing is too weak for a system that expects other people to review, parse, and re-check the result later.
InvarLock's public reports reference goes further than that.
The docs define evaluation.report.json as a contract surface with a stable required core, a bounded optional surface, and a direct relationship to invarlock verify. That is a very different role from "some JSON we happened to save."
The Stable Required Core
The reports reference is explicit about the minimum contract. A valid v1 report must carry schema_version, run_id, meta, dataset, artifacts, plugins, and primary_metric.
That required core is not arbitrary. It captures the smallest stable surface needed to identify the run, describe the paired data surface, locate artifacts, snapshot the plugin environment, and state the canonical primary-metric result.
This is the part of the report that other tools can rely on with the highest confidence. It is the minimum shape that lets the file behave like a real evidence contract instead of an informal snapshot.
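To make the required core concrete, here is a minimal Python sketch of a presence check over the seven keys named above. It assumes those keys sit at the top level of the JSON object. The helper names `missing_core_keys` and `load_report` are hypothetical and are not part of InvarLock. The sketch checks only that the keys exist; it is not the schema validation or the checks that invarlock verify performs.

```python
import json
from pathlib import Path

# Required keys named in the reports reference for a v1 report.
# Assumption: they are top-level keys of the JSON object.
REQUIRED_CORE = (
    "schema_version",
    "run_id",
    "meta",
    "dataset",
    "artifacts",
    "plugins",
    "primary_metric",
)


def missing_core_keys(report: dict) -> list[str]:
    """Return the required-core keys absent from a parsed report, in order."""
    return [key for key in REQUIRED_CORE if key not in report]


def load_report(path: str) -> dict:
    """Load a report file and fail loudly if the required core is incomplete."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(report, dict):
        raise ValueError("report must be a JSON object")
    missing = missing_core_keys(report)
    if missing:
        raise ValueError(f"report is missing required keys: {missing}")
    return report
```

A tool built this way can refuse to render or compare a file that lacks the core, which is the kind of reliance the stable surface is meant to support.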
Optional Blocks That Still Matter
The optional surface is where readers often get lazy.
validation, policy_digest, resolved_policy, primary_metric_tail, confidence, provenance, and related blocks are not always required by the schema, but they are still reviewer-critical when present. The reading guide makes this obvious: policy configuration, measurement contracts, provenance digests, and confidence labels are the fields that often tell a reviewer why the result should or should not be trusted.
So "optional" here does not mean "decorative." It means the contract is keeping a stable core while allowing evidence-rich extensions to evolve.
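One way a reviewer-facing tool might treat the optional surface is to report which of these blocks a given file carries, rather than silently ignoring the gaps. The sketch below is illustrative only. It assumes the optional blocks appear as top-level keys, which the reference may or may not guarantee, and it covers only the names listed above, not the "related blocks." The function names are hypothetical.

```python
# Optional block names listed in this post. Illustrative, not exhaustive.
# Assumption: each block, when present, is a top-level key of the report.
REVIEWER_BLOCKS = (
    "validation",
    "policy_digest",
    "resolved_policy",
    "primary_metric_tail",
    "confidence",
    "provenance",
)


def reviewer_block_presence(report: dict) -> dict[str, bool]:
    """Map each known optional block to whether the report carries it."""
    return {name: name in report for name in REVIEWER_BLOCKS}


def absent_reviewer_blocks(report: dict) -> list[str]:
    """List the known optional blocks that this report does not carry."""
    presence = reviewer_block_presence(report)
    return [name for name, present in presence.items() if not present]
```

An absent block is not a schema failure under this reading. It is a prompt for the reviewer to ask why that evidence was not recorded.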
Why The Report Is Not The Whole Bundle
The artifact layout docs draw a boundary that is worth making explicit: container-backed outputs emit runtime.manifest.json next to evaluation.report.json, and archives should retain the baseline and subject report.json files as well.
That matters because the report is a derived evaluation object. It is central, but it is not self-sufficient. A reviewer who wants to re-check pairing, provenance, and runtime provenance needs more than the final JSON alone.
This is the right design. The report should be the main summary contract without pretending to be the entire retained record.
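The boundary can be checked mechanically before archiving. The following is a rough sketch, not an InvarLock utility. `bundle_gaps` is a hypothetical helper, and the locations of the baseline and subject report.json files are supplied by the caller because this post does not specify them. The `expect_manifest` flag reflects that runtime.manifest.json is described for container-backed outputs; whether other runs emit one is left as an assumption.

```python
from pathlib import Path


def bundle_gaps(
    report_path: str,
    source_reports: list[str],
    expect_manifest: bool = True,
) -> list[str]:
    """Return human-readable gaps in the evidence bundle around a report.

    source_reports: caller-supplied paths to the baseline and subject
    report.json files (their layout is not assumed here).
    expect_manifest: set to False when no runtime manifest is expected.
    """
    report = Path(report_path)
    gaps = []
    if not report.is_file():
        gaps.append(f"missing report: {report}")
    if expect_manifest:
        # The manifest is expected next to evaluation.report.json.
        manifest = report.parent / "runtime.manifest.json"
        if not manifest.is_file():
            gaps.append(f"missing manifest: {manifest}")
    for src in source_reports:
        if not Path(src).is_file():
            gaps.append(f"missing source report: {src}")
    return gaps
```

An empty list means the files this sketch knows about are in place. It says nothing about whether their contents agree, which remains the job of verification.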
Why A Stable Contract Helps Review And Tooling
The report-to-verify flow is the practical reason this matters.
invarlock verify is not reading the file as a pretty export. It is using it as the surface for schema checks, pairing checks, ratio math, and required runtime provenance. That gives the report contract real operational weight.
The same structure also helps downstream tooling. Parsers, HTML renderers, and review UIs can rely on the required blocks while gracefully taking advantage of richer optional evidence when it exists.
What The Report Still Does Not Contain
The claim should stay narrow.
evaluation.report.json does not contain the whole archive. It does not replace the baseline report. It does not replace runtime.manifest.json. And it does not remove the need to preserve the surrounding evidence layout when the goal is later re-verification.
The right way to value the file is not as a total record. It is as the stable center of a larger evidence bundle.
Claim Map
The practical reading is:
- keep the required core stable
- use optional blocks to expose richer policy and provenance evidence
- verify the report as a contract surface
- archive the report with its adjacent manifest and source reports
That is a much better model than "just save the JSON."
Limitations
- This post explains the public report contract; it does not add a fresh report example.
- Optional blocks remain important even though they are not all required by the schema.