Configuration Schema

Overview

Aspect	Details
Purpose	YAML configuration structure for `invarlock evaluate --preset` and advanced/internal preset-driven flows.
Audience	CLI users authoring presets or overrides.
Source of truth	`src/invarlock/core/config_runtime.py`, `src/invarlock/core/config_loader.py`, runtime profiles under `invarlock/_data/runtime`.
Network	Offline by default; use `evaluate --allow-network` when a preset-driven run needs downloads.
Execution	Model-loading commands run in the runtime container by default; host-side `invarlock evaluate` runs use `--execution-mode host`, while advanced/internal flows may use `INVARLOCK_ALLOW_HOST_EXECUTION=1` or `--allow-host-execution`.

Quick Start

model:
  id: gpt2
  adapter: hf_causal
  device: auto

dataset:
  provider: wikitext2
  seq_len: 512
  stride: 512
  preview_n: 240
  final_n: 240

edit:
  name: quant_rtn
  plan: { bitwidth: 8, clamp_ratio: 0.005 }

guards:
  order: ["invariants", "spectral", "rmt", "variance", "invariants"]

output:
  dir: runs/example

Concepts

Profiles and tiers: --profile selects runtime window counts; --tier resolves guard thresholds from tiers.yaml.
Defaults merging: the optional top-level defaults mapping is merged into the config before execution.
Evaluate adapter split: invarlock evaluate accepts --baseline-adapter and --subject-adapter, then materializes separate baseline and subject run configs. Each generated run config still uses the existing single-run model.adapter field.
Programmatic access: load_config() returns an explicit mapping-backed InvarLockConfig. Use cfg["model"]["id"] or cfg.require_section("model")["id"] as the canonical form. Typed attribute access is also available for known sections, for example cfg.model.id, when callers want the typed section-object view.
Unsupported keys: edit.kind, edit.parameters, unknown assurance.* keys, and guards.{spectral,rmt}.mode are rejected to keep the config surface explicit.
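
The defaults merging described above can be pictured with a small sketch. The helper name `merge_defaults` is hypothetical and is not taken from `config_loader.py`. The sketch assumes a recursive merge in which explicit keys win over the `defaults` block, which matches the precedence order given in the next section.

```python
from copy import deepcopy


def _deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge two mappings; values from `override` win."""
    out = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = _deep_merge(out[key], value)
        else:
            out[key] = deepcopy(value)
    return out


def merge_defaults(config: dict) -> dict:
    """Hypothetical helper: fold the top-level `defaults` mapping into the config."""
    base = config.get("defaults") or {}
    explicit = {k: v for k, v in config.items() if k != "defaults"}
    return _deep_merge(base, explicit)


config = {
    "defaults": {
        "model": {"device": "auto"},
        "dataset": {"seq_len": 512, "stride": 512},
    },
    "model": {"id": "gpt2", "adapter": "hf_causal"},
    "dataset": {"provider": "wikitext2", "seq_len": 256},
}
resolved = merge_defaults(config)
```

In this sketch `dataset.seq_len` resolves to the explicit value 256, while `dataset.stride` and `model.device` fall back to the defaults block.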

Precedence (highest → lowest)

Public CLI flags (e.g. --device, --tier); advanced/internal config runners also accept runner-specific flags such as --probes.
Profile selection (--profile ci|release) — window counts + determinism knobs.
YAML config selected through --preset or an internal config runner.
defaults: block in YAML (DRY base).
Packaged runtime defaults (fallback).

Key override matrix

Setting	CLI	Profile	YAML	defaults	Winner rule
`model.device`	`--device`	—	✅	✅	CLI wins.
`dataset.preview_n/final_n`	—	✅	✅	✅	Profile wins.
`auto.tier`	`--tier`	—	✅	✅	CLI wins.
`auto.probes`	internal `--probes`	—	✅	✅	Internal runner flag wins.

Confirm in report.meta.device, report.meta.auto, and report.data.preview_n/final_n.

Worked example: if YAML sets preview_n: 64 and you run --profile ci, the report shows preview_n=240 because the CI profile overrides the YAML counts.
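
As a rough illustration of the precedence order, the layers can be merged from lowest to highest so that later layers overwrite earlier ones. This is a sketch only: the function names and the layer dictionaries are hypothetical, and the real resolver may apply per-key rules rather than a uniform merge. The CI value `preview_n: 240` comes from the worked example above.

```python
from copy import deepcopy


def deep_merge(low: dict, high: dict) -> dict:
    """Recursively merge two mappings; values from `high` win."""
    out = deepcopy(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = deepcopy(value)
    return out


def resolve(*layers_low_to_high: dict) -> dict:
    """Apply layers in order, lowest precedence first."""
    resolved: dict = {}
    for layer in layers_low_to_high:
        resolved = deep_merge(resolved, layer)
    return resolved


# Hypothetical layer contents, for illustration only.
packaged = {"model": {"device": "auto"}}
defaults_block = {"dataset": {"seq_len": 512, "stride": 512}}
yaml_config = {"dataset": {"provider": "wikitext2", "preview_n": 64}}
ci_profile = {"dataset": {"preview_n": 240}}
cli_flags = {"model": {"device": "cpu"}}

resolved = resolve(packaged, defaults_block, yaml_config, ci_profile, cli_flags)
```

Here the profile layer replaces the YAML `preview_n` of 64 with 240, and the CLI layer replaces the packaged `device` with `cpu`.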

Config → Report → report → Verify

Config area	Report fields	report fields	Verify gates
`model.*`	`report.meta.{model_id,adapter,device}`	`report.meta.{model_id,adapter,device}`	Schema only.
`dataset.*`	`report.data.*`, `report.dataset.windows.stats`, `report.provenance.provider_digest`	`report.dataset.*`, `report.provenance.provider_digest`	Pairing + provider digest checks (CI/Release).
`eval.*`	`report.metrics.primary_metric`	`report.primary_metric`, `validation.*`, `primary_metric_tail`	Ratio/counts + drift band (CI/Release).
`guards.*`	`report.guards[]`, `report.guard_overhead`	`report.spectral/rmt/variance`, `resolved_policy.*`, `guard_overhead`	Measurement contracts + overhead (Release).
`auto.*` / `--profile`	`report.meta.auto`, `report.context.profile`	`report.auto`, `report.meta.profile`	Schema only.
`output.*`	`report.artifacts.*`	`report.artifacts.*`	Schema only.

Reference

Model

model:
  id: <hf_id_or_path>
  adapter: auto
  device: auto
  # extra adapter kwargs (passed to load_model)
  dtype: float16
  low_cpu_mem_usage: true
  memory_efficient_load: true
  trust_remote_code: false
  # Optional: v5-native HF quantization config (e.g., bitsandbytes)
  # quantization_config:
  #   quant_method: bitsandbytes
  #   bitwidth: 8

For HF adapters, memory_efficient_load defaults to automatic behavior: accelerated loads receive a hardware-aware dtype when unset, HF loading uses low_cpu_mem_usage=True, and large/MoE model IDs receive device_map: auto on accelerated devices. Set explicit values to override the defaults, or set memory_efficient_load: false to opt out.

Dataset

dataset:
  provider: wikitext2
  split: validation
  seq_len: 512
  stride: 512
  preview_n: 240
  final_n: 240
  seed: 42

Supported providers: wikitext2, synthetic, hf_text, local_jsonl, vision_text, hf_seq2seq, local_jsonl_pairs, seq2seq.
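
To get a feel for how seq_len, stride, preview_n, and final_n interact, the sketch below enumerates token windows. It assumes windows start every `stride` tokens and span `seq_len` tokens, and that preview and final windows are drawn from disjoint spans. Neither assumption is confirmed by this page, and both helper names are hypothetical.

```python
def window_spans(total_tokens: int, seq_len: int, stride: int, limit=None):
    """Return (start, end) token spans for fixed-length windows."""
    spans = []
    start = 0
    while start + seq_len <= total_tokens:
        if limit is not None and len(spans) >= limit:
            break
        spans.append((start, start + seq_len))
        start += stride
    return spans


def tokens_needed(n_windows: int, seq_len: int, stride: int) -> int:
    """Minimum token count that yields `n_windows` windows."""
    if n_windows <= 0:
        return 0
    return (n_windows - 1) * stride + seq_len


# With stride == seq_len the windows do not overlap.
non_overlapping = window_spans(2048, seq_len=512, stride=512)
# With stride < seq_len consecutive windows share tokens.
overlapping = window_spans(2048, seq_len=512, stride=256)
# Tokens required for preview_n + final_n = 480 non-overlapping windows.
budget = tokens_needed(240 + 240, seq_len=512, stride=512)
```

Under these assumptions the example config above needs at least 245,760 tokens from the chosen split.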

Edit (built-in quant_rtn)

quant_rtn is a deterministic RTN quantize/dequantize simulation. It writes floating-point dequantized weights back into the model; use quantized adapters or external subject artifacts for deployable packed quantization backends.

edit:
  name: quant_rtn
  plan:
    bitwidth: 8
    per_channel: true
    clamp_ratio: 0.005
    scope: attn
    max_modules: 12

Only edit.plan is supported for built-in edit configuration.
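
The quantize/dequantize round trip can be sketched in plain Python. This is a generic symmetric round-to-nearest scheme and not the quant_rtn implementation: the function name is hypothetical, `clamp_ratio`, `scope`, and `max_modules` are not modelled, and Python's built-in `round` (which rounds halves to even) stands in for whatever rounding the real edit uses.

```python
def rtn_quant_dequant(rows, bitwidth=8, per_channel=True):
    """Quantize each weight to a signed integer grid, then map it back to float."""
    qmax = 2 ** (bitwidth - 1) - 1  # 127 for 8 bits

    def scale_for(values):
        peak = max((abs(v) for v in values), default=0.0)
        return peak / qmax if peak > 0 else 1.0

    if per_channel:
        # One scale per output row (channel).
        scales = [scale_for(row) for row in rows]
    else:
        # One scale shared by the whole tensor.
        shared = scale_for([v for row in rows for v in row])
        scales = [shared] * len(rows)

    result = []
    for row, scale in zip(rows, scales):
        quantized = [max(-qmax, min(qmax, round(v / scale))) for v in row]
        result.append([q * scale for q in quantized])  # floats written back
    return result


weights = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.005]]
dequantized = rtn_quant_dequant(weights, bitwidth=8, per_channel=True)
```

The output keeps the original shape and dtype family, which mirrors the statement above that floating-point dequantized weights are written back. Each value moves by at most half a quantization step.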

Auto policy hints

auto:
  enabled: true
  tier: balanced
  probes: 0
  target_pm_ratio: 1.0

Primary metric policy hints

primary_metric:
  acceptance_range: {min: 0.95, max: 1.10}
  drift_band: {min: 0.90, max: 1.20}
  overhead_threshold: 0.01
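
A minimal sketch of how the two bands might be applied follows. It assumes acceptance_range bounds the subject/baseline primary-metric ratio and drift_band bounds a preview-to-final drift ratio. Both readings are assumptions, the function names are hypothetical, and overhead_threshold is left out because its exact comparison is not described here.

```python
def within(value: float, band: dict) -> bool:
    """True when value lies inside the inclusive [min, max] band."""
    return band["min"] <= value <= band["max"]


def check_primary_metric(pm_ratio: float, drift_ratio: float, policy: dict) -> dict:
    """Evaluate both bands and report each result separately."""
    return {
        "acceptance": within(pm_ratio, policy["acceptance_range"]),
        "drift": within(drift_ratio, policy["drift_band"]),
    }


policy = {
    "acceptance_range": {"min": 0.95, "max": 1.10},
    "drift_band": {"min": 0.90, "max": 1.20},
}
passing = check_primary_metric(pm_ratio=1.02, drift_ratio=1.00, policy=policy)
failing = check_primary_metric(pm_ratio=1.15, drift_ratio=0.85, policy=policy)
```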

Guards

guards:
  order: ["invariants", "spectral", "rmt", "variance", "invariants"]
  spectral:
    sigma_quantile: 0.95
  rmt:
    epsilon_by_family: { ffn: 0.01, attn: 0.01, embed: 0.01, other: 0.01 }
  variance:
    min_gain: 0.0

Context (snapshot controls)

assurance:
  mode: strict   # strict|off

context:
  assurance:
    mode: strict
  run:
    strict_guard_prepare: true
    strict_eval: true
    skip_overhead_check: false   # release/ci explicit skip marker
    tiny_relax: false            # dev/demo-only relaxed gating
  eval:
    strict_errors: true
    tiny_relax: false
  snapshot:
    mode: auto
    ram_fraction: 0.4        # chunk when model exceeds this RAM fraction
    threshold_mb: 768        # absolute chunked-snapshot threshold
    disk_free_margin_ratio: 1.2
    temp_dir: /tmp

assurance.mode: strict is the current fail-closed assurance path. Strict mode requires a CI or release profile, a balanced or conservative tier, the canonical guard order, complete guard evidence, strict paired metric evidence, and verified runtime provenance. Use assurance.mode: off only for exploratory or dev reports.
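
The config-level part of these requirements can be pre-checked before a run. The sketch below is illustrative only: the function name is hypothetical, the canonical guard order is assumed to be the order used in the examples on this page, and the evidence and provenance requirements are runtime properties that it does not cover.

```python
# Assumption: the canonical order is the one shown in the guards examples.
CANONICAL_ORDER = ["invariants", "spectral", "rmt", "variance", "invariants"]
STRICT_PROFILES = {"ci", "release"}
STRICT_TIERS = {"balanced", "conservative"}


def strict_mode_problems(profile: str, tier: str, guard_order: list) -> list:
    """Return human-readable reasons why strict mode would not be satisfied."""
    problems = []
    if profile not in STRICT_PROFILES:
        problems.append(f"profile {profile!r} is not ci or release")
    if tier not in STRICT_TIERS:
        problems.append(f"tier {tier!r} is not balanced or conservative")
    if list(guard_order) != CANONICAL_ORDER:
        problems.append("guards.order differs from the assumed canonical order")
    return problems


ok = strict_mode_problems("ci", "balanced", CANONICAL_ORDER)
# "dev" and "aggressive" are placeholder values chosen to trigger failures.
bad = strict_mode_problems("dev", "aggressive", ["spectral", "rmt"])
```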

Output

output:
  dir: runs/example
  save_model: false
  model_dir: runs/exports/my_model  # optional
  model_subdir: model               # optional

Metrics

eval:
  max_pm_ratio: 1.5
  metric:
    kind: auto            # resolver value; report kinds include ppl_causal, ppl_mlm, ppl_seq2seq, accuracy, bleu, f1, rouge
    reps: 2000
    ci_level: 0.95

Troubleshooting

Unsupported keys rejected: remove edit.kind, edit.parameters, unknown assurance.* keys, or guards.{spectral,rmt}.mode.
Provider not found: check dataset.provider against the supported providers listed under Dataset, and install the invarlock[eval] extra.
Preset drift: run python scripts/docs/docs_check.py --config-schema-sync after edits.
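
Before running a preset, a quick local scan for the rejected keys can save a round trip. The sketch below is a stand-in and not the loader's own validation: the function name is hypothetical, and it assumes `mode` is the only known assurance key, since that is the only one this page shows.

```python
KNOWN_ASSURANCE_KEYS = {"mode"}  # assumption: only key shown on this page


def unsupported_keys(config: dict) -> list:
    """Return dotted paths of keys this page lists as rejected."""
    found = []
    edit = config.get("edit") or {}
    for key in ("kind", "parameters"):
        if key in edit:
            found.append(f"edit.{key}")
    assurance = config.get("assurance") or {}
    for key in assurance:
        if key not in KNOWN_ASSURANCE_KEYS:
            found.append(f"assurance.{key}")
    guards = config.get("guards") or {}
    for guard in ("spectral", "rmt"):
        if "mode" in (guards.get(guard) or {}):
            found.append(f"guards.{guard}.mode")
    return found


legacy = {
    "edit": {"name": "quant_rtn", "kind": "quant"},
    "assurance": {"mode": "strict", "level": "high"},
    "guards": {"rmt": {"mode": "monitor"}},
}
clean = {
    "edit": {"name": "quant_rtn", "plan": {"bitwidth": 8}},
    "assurance": {"mode": "strict"},
}
problems = unsupported_keys(legacy)
```

The values `quant`, `level`, and `monitor` in the `legacy` mapping are placeholders chosen only to trigger each check.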

Observability

report.meta.config captures the RunConfig applied by the runner.
report.context records profile/auto context used for tier resolution.
Reports include resolved policy snapshots under resolved_policy.*.

CLI Reference
Dataset Providers
Guards
Model Adapters
Tier Policy Catalog
Environment Variables
Config Gallery — Worked YAML examples
Reports Reference — Where config values surface in the report