Artifact Layout

Overview

Aspect           Details
Purpose          Explain where run outputs and reports live.
Audience         Operators archiving evidence and CI outputs.
Scope            runs/ scratch outputs and reports/ long-lived evidence.
Source of truth  CLI run/report commands (src/invarlock/cli/commands/run.py).

Quick Start

# Run baseline
invarlock run -c configs/presets/causal_lm/wikitext2_512.yaml --out runs/baseline

# Generate report
invarlock report --run runs/baseline/report.json --format report --output reports/baseline

Concepts

  • runs/ is scratch space: timestamped run directories, each containing report.json and events.jsonl.
  • reports/ is long-lived evidence: copy report.json and any generated reports there for audit.
  • Generated reports reference the baseline report.json; keep them together to preserve pairing.

Command outputs

Command                           Writes                                             What to archive
invarlock run                     runs/<name>/<timestamp>/report.json, events.jsonl  Baseline + subject report.json.
invarlock report --format report  reports/<name>/evaluation.report.json              Report + baseline report.
invarlock report html             reports/<name>/evaluation.html                     Optional (can be rebuilt).

Reference

Run outputs (runs/)

runs/
  baseline/
    20251010_182515/
      report.json
      events.jsonl
  quant8/
    20251010_151826/
      report.json
      events.jsonl
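
Because each run lands in a fresh timestamped directory, scripts usually need to resolve "the latest run" before calling the report command. The following is a minimal sketch, not part of InvarLock itself: latest_run_dir is a hypothetical helper, and it assumes timestamps keep the YYYYMMDD_HHMMSS shape shown above so that sorting by name is also sorting by time.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(runs_root: Path, name: str) -> Optional[Path]:
    """Return the newest timestamped directory under runs_root/<name>/, or None.

    Hypothetical helper: assumes directory names such as 20251010_182515,
    which sort chronologically when compared as plain strings.
    """
    parent = runs_root / name
    if not parent.is_dir():
        return None
    # Only consider directories that actually contain a report.json.
    candidates = sorted(
        (d for d in parent.iterdir() if d.is_dir() and (d / "report.json").is_file()),
        key=lambda d: d.name,
    )
    return candidates[-1] if candidates else None
```

For example, latest_run_dir(Path("runs"), "baseline") would resolve to runs/baseline/20251010_182515 for the tree above, and its report.json can then be passed to invarlock report --run.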

Reports (reports/)

reports/
  baseline/
    report.json
  quant8_balanced/
    evaluation.report.json
    report.json

Archive checklist

  • Copy the baseline and subject report.json files into reports/.
  • Keep evaluation.report.json with the baseline report.
  • Retain events.jsonl only if debugging; HTML exports are optional.
  • Prune timestamped runs/ once evidence is archived.

Artifact                          Why archive                   Required for verify
report.json (baseline + subject)  Metrics, windows, provenance  Yes
evaluation.report.json            Evaluation report snapshot    Yes
events.jsonl                      Debugging timeline            No
evaluation.html                   Human review                  No
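
The archive step can be scripted. The sketch below is one possible approach under stated assumptions: archive_run is a hypothetical helper (InvarLock may or may not ship an equivalent), it copies rather than moves so the run directory stays intact until it is pruned, and it treats events.jsonl as optional, matching the table above.

```python
import shutil
from pathlib import Path
from typing import List


def archive_run(run_dir: Path, dest_dir: Path, keep_events: bool = False) -> List[Path]:
    """Copy report.json (and optionally events.jsonl) from run_dir into dest_dir.

    Hypothetical helper; returns the paths of the copied files.
    """
    report = run_dir / "report.json"
    if not report.is_file():
        raise FileNotFoundError(f"no report.json in {run_dir}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves modification times, which is useful for audit trails.
    copied = [Path(shutil.copy2(report, dest_dir / "report.json"))]
    events = run_dir / "events.jsonl"
    if keep_events and events.is_file():
        copied.append(Path(shutil.copy2(events, dest_dir / "events.jsonl")))
    return copied
```

A call such as archive_run(Path("runs/baseline/20251010_182515"), Path("reports/baseline")) would produce the reports/baseline/report.json entry shown in the layout above.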

Seeds, hashes, and policy digests

  • report.meta.seeds includes Python/NumPy/Torch seeds.
  • report.meta.tokenizer_hash and dataset digests support pairing verification.
  • Reports record policy_digest and resolved tier policy snapshots.
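
These fields can be inspected before archiving to catch an obviously mismatched pair. The sketch below is illustrative only: it assumes report.json is a JSON object with a top-level "meta" key holding seeds and tokenizer_hash (inferred from the dotted names above, not confirmed), and load_meta and pairing_mismatches are hypothetical helpers rather than InvarLock's own verification logic.

```python
import json
from pathlib import Path
from typing import Iterable, List


def load_meta(report_path: Path) -> dict:
    """Return the "meta" object of a report.json (assumed top-level key)."""
    with report_path.open(encoding="utf-8") as fh:
        return json.load(fh).get("meta", {})


def pairing_mismatches(
    baseline_meta: dict,
    subject_meta: dict,
    keys: Iterable[str] = ("tokenizer_hash",),
) -> List[str]:
    """List the meta keys whose values differ between baseline and subject.

    Only tokenizer_hash is compared by default; add dataset digest keys
    once their real names have been confirmed in an actual report.json.
    """
    return [k for k in keys if baseline_meta.get(k) != subject_meta.get(k)]
```

An empty list from pairing_mismatches means the compared fields agree; it does not replace the tool's own verification.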

Cleanup checklist

  1. Copy report.json and evaluation.report.json into reports/ for retention.
  2. Keep baseline reports alongside derived reports for pairing checks.
  3. Remove stale timestamped runs once evidence is archived.
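
Step 3 can be automated once the evidence is safely in reports/. The following is a sketch under assumptions: prune_runs is a hypothetical helper, it relies on the runs/<name>/<timestamp>/ layout shown earlier with name-sortable timestamps, and it defaults to a dry run so nothing is deleted until the list has been reviewed.

```python
import shutil
from pathlib import Path
from typing import List


def prune_runs(runs_root: Path, keep: int = 1, dry_run: bool = True) -> List[Path]:
    """Remove all but the newest `keep` timestamped directories per run name.

    Hypothetical helper. Returns the directories that were (or, in a dry
    run, would be) removed.
    """
    removed: List[Path] = []
    for name_dir in sorted(p for p in runs_root.iterdir() if p.is_dir()):
        stamped = sorted(
            (d for d in name_dir.iterdir() if d.is_dir()),
            key=lambda d: d.name,
        )
        # stamped[:-0] would be empty, so handle keep == 0 explicitly.
        stale = stamped[:-keep] if keep > 0 else stamped
        for d in stale:
            if not dry_run:
                shutil.rmtree(d)
            removed.append(d)
    return removed
```

Calling prune_runs(Path("runs")) first and inspecting the result, then repeating with dry_run=False, keeps the deletion step deliberate.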

Troubleshooting

  • Missing baseline report: a generated report cannot be validated without the baseline report.json; keep the two together.
  • Large run dirs: prune old timestamped runs after archiving reports.

Observability

  • report.json is the canonical source for metrics/guards.
  • events.jsonl provides per-phase logs for debugging.
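
When debugging, events.jsonl can be read one JSON object per line. The sketch below shows one way to do that; iter_events and count_by_phase are hypothetical helpers, and the "phase" field name is an assumption made for illustration, since the event schema is not described here.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def iter_events(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of events.jsonl."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by_phase(path: Path, key: str = "phase") -> Counter:
    """Count events per value of `key`.

    "phase" is a hypothetical field name; inspect a real events.jsonl
    and pass the actual key before relying on the counts.
    """
    return Counter(str(event.get(key, "unknown")) for event in iter_events(path))
```

Events that lack the chosen key are grouped under "unknown", which makes a wrong field name easy to spot.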