One Run Lifecycle

Overview

| Aspect | Details |
| --- | --- |
| Purpose | Map one evaluate -> verify -> report journey to code and artifact owners. |
| Audience | Maintainers, reviewers auditing the assurance boundary, contributors tracing failures. |
| Contract scope | Current strict assurance flow and report v1 artifacts. |
| Source of truth | src/invarlock/cli/commands/evaluate.py, src/invarlock/core/evaluate_plan.py, src/invarlock/core/assurance_contract.py, src/invarlock/reporting/verify_contract.py. |

This page maps one evaluate -> verify -> report journey to the code and artifact surfaces reviewers inspect.

Quick Start

The minimal end-to-end trace for a single comparison:

invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject distilgpt2 \
  --baseline-adapter auto --subject-adapter auto \
  --profile ci \
  --assurance strict \
  --report-out reports/eval

invarlock verify --assurance strict reports/eval/evaluation.report.json
invarlock report html -i reports/eval/evaluation.report.json -o reports/eval/evaluation.html

Each stage emits artifacts that the next stage consumes; reviewers can pause at any stage and inspect the matching surface in the Stage Map table below. By default, the evaluate command runs model-loading work in the runtime container; host execution must be an explicit non-assurance bypass.
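The three commands above can also be driven from a script. The following is a minimal sketch, not part of InvarLock itself: `build_commands` and `run_lifecycle` are hypothetical helper names, and the flags are copied verbatim from the Quick Start. It assumes `invarlock` is on `PATH` and that evaluate writes `evaluation.report.json` into the `--report-out` directory, as the verify command above implies.

```python
import subprocess
from pathlib import PurePosixPath


def build_commands(report_dir: str) -> list:
    """Return the evaluate, verify, and report commands as argv lists.

    Hypothetical helper: the flags mirror the Quick Start commands exactly.
    """
    report_json = str(PurePosixPath(report_dir) / "evaluation.report.json")
    report_html = str(PurePosixPath(report_dir) / "evaluation.html")
    return [
        [
            "invarlock", "evaluate", "--allow-network",
            "--baseline", "gpt2",
            "--subject", "distilgpt2",
            "--baseline-adapter", "auto", "--subject-adapter", "auto",
            "--profile", "ci",
            "--assurance", "strict",
            "--report-out", report_dir,
        ],
        ["invarlock", "verify", "--assurance", "strict", report_json],
        ["invarlock", "report", "html", "-i", report_json, "-o", report_html],
    ]


def run_lifecycle(report_dir: str) -> None:
    """Run each stage in order; check=True stops at the first failing stage."""
    for argv in build_commands(report_dir):
        subprocess.run(argv, check=True)
```

Calling `run_lifecycle("reports/eval")` reproduces the Quick Start sequence. Because `check=True` raises `subprocess.CalledProcessError` on a non-zero exit, a failed evaluate never reaches verify, and a failed verify never produces an HTML report.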

Stage Map

| Stage | Code surface | Artifact surface |
| --- | --- | --- |
| CLI planning | invarlock.cli.commands.evaluate, invarlock.core.evaluate_plan | selected profile, tier, preset, adapter, runtime policy |
| Runtime policy | invarlock.runtime_security, invarlock.cli.evaluate_phases | runtime.manifest.json |
| Config loading | invarlock.core.config_loader | normalized run config, context.assurance |
| Component resolution | invarlock.cli.run_execution, guard/adapter/edit registries | resolved adapter, edit, and guard order |
| Guard execution | invarlock.core.runner, invarlock.guards.* | guard evidence and statuses |
| Metric computation | invarlock.core.bootstrap, runner metric helpers | paired delta log-loss, ratio, CI fields |
| Report assembly | invarlock.reporting.report_make | evaluation.report.json |
| Verification | invarlock.reporting.verify_contract | verifier pass/fail details |
| Human report | invarlock report html | rendered HTML report |
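For scripted lookups, the stage map can be held as plain data. This is an illustrative sketch only: the `Stage` type and `stages_mentioning` function are hypothetical names, not InvarLock APIs, and the entries are transcribed from the table above.

```python
from typing import List, NamedTuple


class Stage(NamedTuple):
    name: str
    code_surface: str
    artifact_surface: str


# Transcribed from the Stage Map table, in lifecycle order.
STAGES = [
    Stage("CLI planning",
          "invarlock.cli.commands.evaluate, invarlock.core.evaluate_plan",
          "selected profile, tier, preset, adapter, runtime policy"),
    Stage("Runtime policy",
          "invarlock.runtime_security, invarlock.cli.evaluate_phases",
          "runtime.manifest.json"),
    Stage("Config loading",
          "invarlock.core.config_loader",
          "normalized run config, context.assurance"),
    Stage("Component resolution",
          "invarlock.cli.run_execution, guard/adapter/edit registries",
          "resolved adapter, edit, and guard order"),
    Stage("Guard execution",
          "invarlock.core.runner, invarlock.guards.*",
          "guard evidence and statuses"),
    Stage("Metric computation",
          "invarlock.core.bootstrap, runner metric helpers",
          "paired delta log-loss, ratio, CI fields"),
    Stage("Report assembly",
          "invarlock.reporting.report_make",
          "evaluation.report.json"),
    Stage("Verification",
          "invarlock.reporting.verify_contract",
          "verifier pass/fail details"),
    Stage("Human report",
          "invarlock report html",
          "rendered HTML report"),
]


def stages_mentioning(term: str) -> List[Stage]:
    """Return stages whose artifact surface mentions `term` (case-insensitive)."""
    needle = term.lower()
    return [s for s in STAGES if needle in s.artifact_surface.lower()]
```

For example, `stages_mentioning("runtime.manifest.json")` returns the single Runtime policy entry, whose `code_surface` names the modules to inspect.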

Assurance Boundary

The strict assurance boundary starts at CLI planning and ends at verifier acceptance. Strict mode is not inferred from profile names alone; it is recorded in assurance.mode and checked by the verifier.
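A reviewer can spot-check the recorded mode before running the verifier. The sketch below rests on assumptions: it treats `assurance.mode` as a top-level `assurance` object with a `mode` string in the report JSON, and it assumes the recorded value for strict runs is `"strict"`. The function names are hypothetical, and this check does not replace `invarlock verify`, which remains the authority on acceptance.

```python
import json
from typing import Optional


def load_report(path: str) -> dict:
    """Read a report JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def recorded_assurance_mode(report: dict) -> Optional[str]:
    """Return the recorded assurance mode, or None if it is absent.

    Assumption: the mode lives at report["assurance"]["mode"].
    """
    assurance = report.get("assurance")
    if not isinstance(assurance, dict):
        return None
    mode = assurance.get("mode")
    return mode if isinstance(mode, str) else None


def is_recorded_strict(report: dict) -> bool:
    """True only when strict mode is explicitly recorded (assumed value "strict")."""
    return recorded_assurance_mode(report) == "strict"
```

Note that a report carrying only a profile name yields `False` here, which mirrors the rule that strict mode is never inferred from the profile.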

Debugging Rule

When a strict report fails verification, fix the earliest source evidence that caused the failure. Do not patch the report artifact by hand. The stage table above lets you trace a failure back to the owning code path and the artifact where the evidence is recorded.
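The "earliest source evidence" rule can be expressed as a small ordering check. This is a sketch under stated assumptions: the stage names come from the Stage Map, while the `earliest_failing_stage` helper and the idea of passing it a set of implicated stage names are hypothetical and do not reflect verifier output.

```python
from typing import Iterable, Optional

# Stage names in lifecycle order, as listed in the Stage Map.
STAGE_ORDER = [
    "CLI planning",
    "Runtime policy",
    "Config loading",
    "Component resolution",
    "Guard execution",
    "Metric computation",
    "Report assembly",
    "Verification",
    "Human report",
]


def earliest_failing_stage(failing: Iterable[str]) -> Optional[str]:
    """Return the earliest lifecycle stage among `failing`, or None if empty."""
    failing_set = set(failing)
    unknown = failing_set - set(STAGE_ORDER)
    if unknown:
        raise ValueError(f"unknown stage names: {sorted(unknown)}")
    for stage in STAGE_ORDER:
        if stage in failing_set:
            return stage
    return None
```

If both Verification and Guard execution are implicated, the helper returns Guard execution, so the fix starts with the guard evidence and the report is regenerated rather than edited.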