
Architecture Overview

This document is the entry point to the LLenergyMeasure architecture documentation suite. It introduces the two major subsystems - the invariant miner pipeline and the runtime config-validation pipeline - and shows how they connect to the broader measurement framework.

Start here. Deep-dive docs for each subsystem are linked throughout.


Who this is for


System overview

LLenergyMeasure has two pipelines that work together to give users early, actionable feedback on invalid configs before expensive engine initialisation takes place.

COMPILE-TIME (CI / Renovate-driven library bumps)
─────────────────────────────────────────────────

  Engine library source
  (transformers, vLLM, TRT-LLM)
            │
            ▼
  ┌─────────────────────────┐
  │ Invariant Miner         │  scripts/engine_miners/
  │ Pipeline                │
  │  ┌──────────────┐       │
  │  │ static miner │       │  AST walking of validator methods
  │  └──────────────┘       │
  │  ┌──────────────┐       │
  │  │ dynamic miner│       │  combinatorial probing
  │  └──────────────┘       │
  │  ┌──────────────┐       │
  │  │ lift modules │       │  pydantic / msgspec / dataclass
  │  └──────────────┘       │
  │         │               │
  │   staging files         │
  │         │               │
  │   build_corpus.py       │  merge + dedup + fingerprint
  │         │               │
  │   proposed corpus YAML  │  src/llenergymeasure/engines/
  │         │               │  {e}.proposed.yaml
  │ validate_invariants.py  │  replay against live library
  │         │               │
  │   validated corpus YAML │  src/llenergymeasure/engines/
  │                         │  {e}.validated.yaml
  └─────────────────────────┘

            │  validated YAML ships with package
            ▼

RUNTIME (user submits ExperimentConfig)
───────────────────────────────────────

  User YAML / Python API
            │
            ▼
  ┌─────────────────────────┐
  │ Config Validation       │  src/.../config/engine_invariants/
  │ Pipeline                │  loader.py
  │                         │
  │  ┌───────────────┐      │
  │  │ loader.py     │      │  parse corpus + evaluate predicates
  │  └───────────────┘      │
  │  ┌───────────────┐      │
  │  │ rule match    │      │  try_match() per rule per engine
  │  └───────────────┘      │
  │         │               │
  │   error / warn /        │
  │   dormant annotation    │
  └─────────────────────────┘
            │
            ▼
  User sees rejection BEFORE engine initialisation
  (engine initialisation is expensive; this saves GPU time)

The two pipelines

1. The invariant miner pipeline

What it does: Extracts validation invariants from ML engine library source code and packages them into a versioned corpus of structured rules. Runs in CI whenever a library version bumps (Renovate-driven).

Inputs: Engine library source code (at a pinned version).

Outputs: src/llenergymeasure/engines/{engine}/invariants.proposed.yaml (maintainer-seeded corpus, post-mining) and src/llenergymeasure/engines/{engine}/invariants.validated.yaml (CI-validated observed behaviour, post-validate-replay; both ship with the package).

Three components:

  • Static miner - walks Python AST of validator methods; no constructor calls.
  • Dynamic miner - instantiates config classes with combinatorial probe values; observes raise/no-raise patterns.
  • Lift modules (_pydantic_lift.py, _msgspec_lift.py, _dataclass_lift.py) - extract constraints directly from type-system metadata (Pydantic FieldInfo, msgspec Meta, stdlib Literal[...]).
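The kind of extraction a lift module performs can be sketched in a few lines of stdlib Python. The example below lifts allowed-value constraints from `Literal[...]` annotations on a dataclass; the class and function names are hypothetical illustrations, not the actual `_dataclass_lift.py` API.

```python
from dataclasses import dataclass, fields
from typing import Literal, get_args, get_origin, get_type_hints


@dataclass
class GenCfg:  # hypothetical engine config class, for illustration only
    dtype: Literal["float16", "bfloat16", "float32"]
    num_beams: int


def lift_literal_constraints(cls):
    """Extract allowed-value constraints from Literal[...] annotations.

    A minimal stand-in for what a dataclass lift might emit:
    one {field, allowed} record per Literal-annotated field.
    """
    out = []
    hints = get_type_hints(cls)
    for f in fields(cls):
        ann = hints[f.name]
        if get_origin(ann) is Literal:
            out.append({"field": f.name, "allowed": list(get_args(ann))})
    return out


print(lift_literal_constraints(GenCfg))
# → [{'field': 'dtype', 'allowed': ['float16', 'bfloat16', 'float32']}]
```

Because the constraint lives entirely in type metadata, no constructor is ever called, which is what lets lift modules run safely even for engines whose constructors need a GPU.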

Deep-dive: miner-pipeline.md

2. The parameter-discovery / config-validation pipeline

What it does: At runtime, when a user submits an ExperimentConfig, evaluates each invariant in the validated corpus against the config and rejects invalid combinations before engine initialisation begins.

Inputs: User's ExperimentConfig; validated corpus YAML.

Outputs: Error / warning / dormant annotations surfaced to the user via the CLI or the Python API.

Key components:

  • loader.py - parses the corpus and exposes Rule.try_match().
  • Loader grammar - the predicate DSL (type_is, @field_ref, not_divisible_by, etc.).
  • Gap reporting - flags config combinations for which the corpus has no rule.
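To make the predicate DSL concrete, here is a minimal sketch of rule matching against a config, using the num_beams divisibility example from this document. The function names and the parsed-rule shape are hypothetical; the real loader.py grammar and Rule.try_match() may differ.

```python
def resolve(value, config):
    # A "@field" string references another config field (the @field_ref idea).
    if isinstance(value, str) and value.startswith("@"):
        return config[value[1:]]
    return value


def try_match(rule, config):
    """Return the rule's message if every field predicate holds, else None."""
    for field, pred in rule["match"]["fields"].items():
        actual = config.get(field)
        if "not_divisible_by" in pred:
            divisor = resolve(pred["not_divisible_by"], config)
            if actual % divisor == 0:  # divisible, so this predicate fails
                return None
        if "in" in pred and actual not in pred["in"]:
            return None
    return rule["message"]


# A rule as it might look once parsed from the corpus YAML (illustrative only):
rule = {
    "match": {"fields": {"num_beams": {"not_divisible_by": "@num_beam_groups"}}},
    "message": "num_beams must be divisible by num_beam_groups",
}

print(try_match(rule, {"num_beams": 5, "num_beam_groups": 2}))  # rule fires
print(try_match(rule, {"num_beams": 4, "num_beam_groups": 2}))  # None: config is valid
```

A rule returning its message is what surfaces as the error / warning / dormant annotation; a rule returning None simply does not apply to that config.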

Deep-dive: parameter-discovery.md


Broader framework context

Both pipelines sit inside the larger LLenergyMeasure architecture. The config-validation pipeline plugs into Layer 0 (config/), which the rest of the stack builds on.

Layer 6   cli/       llem run, llem config

Layer 5   api/       run_experiment(), run_study()

Layer 4   study/     StudyRunner, sweep expansion

Layer 3   harness/   MeasurementHarness, energy sampling

Layer 2   engines/   PyTorch, vLLM, TensorRT-LLM plugins

Layer 1   infra/     Docker runner, container entrypoint

Layer 0   config/    ◄──── config validation pipeline lives here
          domain/          (engine_invariants/loader.py)
          device/
          utils/

The invariant miner pipeline lives in scripts/engine_miners/ - it is a build-time tool, not a library module. Its output is the validated corpus that ships with the package.


Data flow: end-to-end

Library version bump (e.g. transformers 4.56.0 → 4.57.0)


Renovate opens PR bumping engine_versions/{engine}.yaml
(Dockerfile ARG default is derived at build time from the SSOT)


Engine-invariants pipeline fires (probe + mine + validate)

  ├──► transformers: engine-pipeline.yml builds the image (cache
  │       export only, no runtime push), then publish-engine-image.yml
  │       fires via workflow_run and pushes runtime tags (canonical for
  │       main/schedule, PR-time tag for PR builds). On its success, the
  │       invariants-transformers job in engine-pipeline.yml fires via
  │       workflow_run on GH-hosted ubuntu-latest.
  │
  ├──► vLLM: engine-pipeline.yml runs inside
  │       llenergymeasure:vllm-${VER} on a self-hosted GPU runner
  │       (Docker isolates from the unified uv.lock; see #437/#464).
  │
  └──► TRT-LLM: engine-pipeline.yml runs inside
          llenergymeasure:tensorrt-${VER} on a self-hosted GPU runner
          (CUDA-aware import).


Per-engine step sequence inside one job:
  1. Probe — scripts._probe checks landmarks; `fail` skips downstream.
  2. Mine — build_corpus.py writes
     src/llenergymeasure/engines/{engine}/invariants.proposed.yaml.
     (Lift modules — pydantic / msgspec / dataclass — run inside
     build_corpus.py; the static miner wins on match.fields, the
     dynamic miner wins on message_template.)
  3. Vendor-replay — validate_invariants.py replays every rule against
     the live library (checks: kwargs_positive raises, message matches
     template, kwargs_negative does NOT raise). Confirmed cases write to
     src/llenergymeasure/engines/{engine}/invariants.validated.yaml;
     divergent rules surface as a non-zero exit when
     --fail-on-divergence is set.
  4. Doc-gen — generate_invariants_doc.py refreshes
     docs/generated/invariants-{engine}.md.
  5. Atomic writeback — one bot commit covers proposed.yaml,
     validated.yaml, the digest doc, and
     engine_versions/{engine}.compat.json.
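The field-level merge precedence in the mine step (static miner wins on match.fields, dynamic miner wins on message_template) can be sketched as follows. The rule shape, the provenance-as-list choice, and the helper name are hypothetical illustrations, not the actual build_corpus.py code.

```python
def merge_candidates(static_rule, dynamic_rule):
    """Merge two candidates for the same invariant (same fingerprint).

    Illustrative field-level precedence: the static miner's match.fields
    wins (source code is authoritative for structure); the dynamic
    miner's message_template wins (observed messages are authoritative
    for wording). Provenance from both candidates is retained.
    """
    merged = dict(static_rule)
    merged["match"] = {"fields": static_rule["match"]["fields"]}
    merged["message_template"] = dynamic_rule.get(
        "message_template", static_rule.get("message_template")
    )
    merged["added_by"] = sorted({static_rule["added_by"], dynamic_rule["added_by"]})
    return merged


static_rule = {
    "match": {"fields": {"num_beams": {"not_divisible_by": "@num_beam_groups"}}},
    "message_template": "num_beams ({x}) should be divisible",  # read from source
    "added_by": "static_miner",
}
dynamic_rule = {
    "match": {"fields": {"num_beams": {}}},  # probing observes less structure
    "message_template": "`num_beams` must be divisible by `num_beam_groups`",
    "added_by": "dynamic_miner",
}

merged = merge_candidates(static_rule, dynamic_rule)
print(merged["added_by"])  # → ['dynamic_miner', 'static_miner']
```

Keeping both provenance entries is what lets the vendor-replay step later report which miner produced a divergent claim.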


CI must be green before merge


Package ships with updated corpus


User submits ExperimentConfig


loader.py evaluates rules against config


Invalid combination caught BEFORE engine initialisation
User sees: "config rejected: num_beams must be divisible by num_beam_groups"

Why validate before engine initialisation?

Engine initialisation is expensive: model weights load from disk, CUDA contexts initialise, and for TensorRT-LLM the engine may need compilation. A rejected config discovered after two minutes of initialisation wastes GPU time and researcher patience.

Pre-construction validation from a corpus catches the most common mistakes at config-parse time - a few milliseconds rather than several minutes.

The corpus complements, rather than replaces, engine-side validation: it captures invariants that fire only in specific combinations (cross-field constraints), silent normalisations (dormant rules), and invariants from methods that run at build time rather than construction time.


Why a versioned corpus instead of live introspection?

Live introspection at runtime would require importing each engine at startup - which on vLLM and TRT-LLM means initialising CUDA contexts. The corpus is pre-computed and ships as a YAML file that loads in a few milliseconds with no GPU dependency.

The trade-off is staleness risk: the corpus must be regenerated when the engine library changes. The Renovate-driven refresh loop and the validation-CI gate together enforce this discipline. See miner-pipeline.md - Renovate refresh loop.


Key concepts

Term                 Meaning
─────────────────────────────────────────────────────────────────────────
Invariant miner      The umbrella for the mining pipeline; extracts
                     constraints from library source
Static miner         The AST-walking component; reads source, no
                     constructor calls
Dynamic miner        The probing component; constructs config objects,
                     observes raises
Lift module          Type-system adapter; extracts constraints from
                     Pydantic / msgspec / dataclass metadata
Corpus               The YAML file of extracted, validation-gate-confirmed
                     invariants for one engine
Validated YAML       The CI-observed version of the corpus that ships
                     with the package
Validation-CI gate   The step that replays every invariant against the
                     live library; divergences fail CI
Fixpoint contract    _fixpoint_test.py - asserts dormant invariants
                     converge to a stable state under repeated application
AddedBy              Provenance field on each invariant: static_miner,
                     dynamic_miner, pydantic_lift, msgspec_lift,
                     dataclass_lift, manual_seed, runtime_warning,
                     observed_collision (full reference in
                     validation-invariant-corpus.md)
MinerSource          The {path, method, line_at_scan} record pointing back
                     to the library source line that produced an invariant
Loader grammar       The predicate DSL used in match.fields: in, not_in,
                     @field_ref, not_divisible_by, type_is, etc.

File and package map

scripts/
└── miners/                          Invariant miner pipeline (build-time)
    ├── _base.py                     Shared infrastructure: RuleCandidate,
    │                                MinerError types, AST primitives,
    │                                pattern detectors
    ├── _pydantic_lift.py            Pydantic v2 sub-library lift
    ├── _msgspec_lift.py             msgspec sub-library lift
    ├── _dataclass_lift.py           stdlib dataclass sub-library lift
    ├── _fixpoint_test.py            Gate-soundness + corpus fixpoint contract
    ├── transformers_miner.py        Transformers orchestration entry
    ├── transformers_static_miner.py
    ├── transformers_dynamic_miner.py
    ├── vllm_static_miner.py         (in flight)
    ├── vllm_dynamic_miner.py        (in flight)
    ├── tensorrt_static_miner.py     (in flight)
    └── build_corpus.py              Merge + dedup + validation-gate
                                     orchestration

scripts/
├── validate_invariants.py           Replay invariants against live library;
│                                    write validated YAML
└── _invariant_validation_common.py  Shared capture + comparison utilities

configs/
└── engine_invariants/
    ├── transformers.proposed.yaml   Authoritative corpus post-mine
    ├── transformers.validated.yaml  Validated observations post-replay
    └── _staging/                    Per-miner staging output (not committed)

src/llenergymeasure/config/
└── engine_invariants/
    ├── loader.py                    Runtime corpus consumer + predicate
    │                                engine
    └── __init__.py

engine_versions/
└── {engine}.yaml                    Per-engine SSOT: library version, miner
                                     pins, artefact paths. Renovate-authored.

See also