
Architecture Overview

This document is the entry point to the LLenergyMeasure architecture documentation suite. It introduces the two major subsystems - the invariant miner pipeline and the runtime config-validation pipeline - and shows how they connect to the broader measurement framework.

Start here. Deep-dive docs for each subsystem are linked throughout.


Who this is for


System overview

LLenergyMeasure has two pipelines that work together to give users early, actionable feedback on invalid configs before expensive engine initialisation takes place.

COMPILE-TIME (CI / Renovate-driven library bumps)
─────────────────────────────────────────────────

  Engine library source
  (transformers, vLLM, TRT-LLM)
            │
            ▼
  ┌─────────────────────────┐
  │ Invariant Miner         │  scripts/engine_miners/
  │ Pipeline                │
  │  ┌──────────────┐       │
  │  │ static miner │       │  AST walking of validator methods
  │  └──────────────┘       │
  │  ┌──────────────┐       │
  │  │ dynamic miner│       │  combinatorial probing
  │  └──────────────┘       │
  │  ┌──────────────┐       │
  │  │ lift modules │       │  pydantic / msgspec / dataclass
  │  └──────────────┘       │
  │         │               │
  │   staging files         │
  │         │               │
  │   build_corpus.py       │  merge + dedup + fingerprint
  │         │               │
  │   proposed corpus YAML  │  src/llenergymeasure/engines/
  │         │               │  {e}.proposed.yaml
  │ validate_invariants.py  │  replay against live library
  │         │               │
  │   validated corpus YAML │  src/llenergymeasure/engines/
  │                         │  {e}.validated.yaml
  └─────────────────────────┘

            │  validated YAML ships with package
            ▼

RUNTIME (user submits ExperimentConfig)
───────────────────────────────────────

  User YAML / Python API
            │
            ▼
  ┌─────────────────────────┐
  │ Config Validation       │  src/.../config/engine_invariants/
  │ Pipeline                │  loader.py
  │                         │
  │  ┌───────────────┐      │
  │  │ loader.py     │      │  parse corpus + evaluate predicates
  │  └───────────────┘      │
  │  ┌───────────────┐      │
  │  │ rule match    │      │  try_match() per rule per engine
  │  └───────────────┘      │
  │         │               │
  │   error / warn /        │
  │   dormant annotation    │
  └─────────────────────────┘
            │
            ▼
  User sees rejection BEFORE engine initialisation
  (engine initialisation is expensive; this saves GPU time)

The two pipelines

1. The invariant miner pipeline

What it does: Extracts validation invariants from ML engine library source code and packages them into a versioned corpus of structured rules. Runs in CI whenever a library version bumps (Renovate-driven).

Inputs: Engine library source code (at a pinned version).

Outputs: src/llenergymeasure/engines/{engine}/invariants.proposed.yaml (maintainer-seeded corpus, post-mining) and src/llenergymeasure/engines/{engine}/invariants.validated.yaml (CI-validated observed behaviour, post-validate-replay; both ship with the package).

Three components:

  • Static miner - walks Python AST of validator methods; no constructor calls.
  • Dynamic miner - instantiates config classes with combinatorial probe values; observes raise/no-raise patterns.
  • Lift modules (_pydantic_lift.py, _msgspec_lift.py, _dataclass_lift.py) - extract constraints directly from type-system metadata (Pydantic FieldInfo, msgspec Meta, stdlib Literal[...]).
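The kind of extraction a lift module performs can be sketched in a few lines of stdlib Python. The example below lifts allowed-value constraints from `Literal[...]` annotations on a dataclass; the class and function names are hypothetical illustrations, not the actual `_dataclass_lift.py` API.

```python
from dataclasses import dataclass, fields
from typing import Literal, get_args, get_origin, get_type_hints


@dataclass
class GenCfg:  # hypothetical engine config class, for illustration only
    dtype: Literal["float16", "bfloat16", "float32"]
    num_beams: int


def lift_literal_constraints(cls):
    """Extract allowed-value constraints from Literal[...] annotations.

    A minimal stand-in for what a dataclass lift might emit:
    one {field, allowed} record per Literal-annotated field.
    """
    out = []
    hints = get_type_hints(cls)
    for f in fields(cls):
        ann = hints[f.name]
        if get_origin(ann) is Literal:
            out.append({"field": f.name, "allowed": list(get_args(ann))})
    return out


print(lift_literal_constraints(GenCfg))
# → [{'field': 'dtype', 'allowed': ['float16', 'bfloat16', 'float32']}]
```

Because the constraint lives entirely in type metadata, no constructor is ever called, which is what lets lift modules run safely even for engines whose constructors need a GPU.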

Deep-dive: miner-pipeline.md

2. The parameter-discovery / config-validation pipeline

What it does: At runtime, when a user submits an ExperimentConfig, evaluates each invariant in the validated corpus against the config and rejects invalid combinations before engine initialisation begins.

Inputs: User's ExperimentConfig; validated corpus YAML.

Outputs: Error / warning / dormant annotations surfaced to the user via the CLI or the Python API.

Key components:

  • loader.py - parses the corpus and exposes Rule.try_match().
  • Loader grammar - the predicate DSL (type_is, @field_ref, not_divisible_by, etc.).
  • Gap reporting - flags config combinations for which the corpus has no rule.
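To make the predicate DSL concrete, here is a minimal sketch of rule matching against a config, using the num_beams divisibility example from this document. The function names and the parsed-rule shape are hypothetical; the real loader.py grammar and Rule.try_match() may differ.

```python
def resolve(value, config):
    # A "@field" string references another config field (the @field_ref idea).
    if isinstance(value, str) and value.startswith("@"):
        return config[value[1:]]
    return value


def try_match(rule, config):
    """Return the rule's message if every field predicate holds, else None."""
    for field, pred in rule["match"]["fields"].items():
        actual = config.get(field)
        if "not_divisible_by" in pred:
            divisor = resolve(pred["not_divisible_by"], config)
            if actual % divisor == 0:  # divisible, so this predicate fails
                return None
        if "in" in pred and actual not in pred["in"]:
            return None
    return rule["message"]


# A rule as it might look once parsed from the corpus YAML (illustrative only):
rule = {
    "match": {"fields": {"num_beams": {"not_divisible_by": "@num_beam_groups"}}},
    "message": "num_beams must be divisible by num_beam_groups",
}

print(try_match(rule, {"num_beams": 5, "num_beam_groups": 2}))  # rule fires
print(try_match(rule, {"num_beams": 4, "num_beam_groups": 2}))  # None: config is valid
```

A rule returning its message is what surfaces as the error / warning / dormant annotation; a rule returning None simply does not apply to that config.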

Deep-dive: parameter-discovery.md


Broader framework context

Both pipelines sit inside the larger LLenergyMeasure architecture. The config-validation pipeline plugs into Layer 0 (config/), which the rest of the stack builds on.

Layer 6   cli/       llem run, llem config

Layer 5   api/       run_experiment(), run_study()

Layer 4   study/     StudyRunner, sweep expansion

Layer 3   harness/   MeasurementHarness, energy sampling

Layer 2   engines/   PyTorch, vLLM, TensorRT-LLM plugins

Layer 1   infra/     Docker runner, container entrypoint

Layer 0   config/    ◄──── config validation pipeline lives here
          domain/          (engine_invariants/loader.py)
          device/
          utils/

The invariant miner pipeline lives in scripts/engine_miners/ - it is a build-time tool, not a library module. Its output is the validated corpus that ships with the package.


Data flow: end-to-end

Library version bump (e.g. transformers 4.56.0 → 4.57.0)


Renovate opens PR bumping engine_versions/{engine}.yaml
(Dockerfile ARG default is derived at build time from the SSOT)


Engine-invariants pipeline fires (probe + mine + validate)

  ├──► transformers: engine-pipeline.yml builds the image (cache
  │       export only, no runtime push), then publish-engine-image.yml
  │       fires via workflow_run and pushes runtime tags (canonical for
  │       main/schedule, PR-time tag for PR builds). On its success, the
  │       invariants-transformers job in engine-pipeline.yml fires via
  │       workflow_run on GH-hosted ubuntu-latest.
  │
  ├──► vLLM: engine-pipeline.yml runs inside
  │       llenergymeasure:vllm-${VER} on a self-hosted GPU runner
  │       (Docker isolates from the unified uv.lock; see #437/#464).
  │
  └──► TRT-LLM: engine-pipeline.yml runs inside
          llenergymeasure:tensorrt-${VER} on a self-hosted GPU runner
          (CUDA-aware import).


Per-engine step sequence inside one job:
  1. Probe — scripts._probe checks landmarks; `fail` skips downstream.
  2. Mine — build_corpus.py writes
     src/llenergymeasure/engines/{engine}/invariants.proposed.yaml.
     (Lift modules — pydantic / msgspec / dataclass — run inside
     build_corpus.py; the static miner wins on match.fields, the
     dynamic miner wins on message_template.)
  3. Vendor-replay — validate_invariants.py replays every rule against
     the live library (checks: kwargs_positive raises, message matches
     template, kwargs_negative does NOT raise). Confirmed cases write to
     src/llenergymeasure/engines/{engine}/invariants.validated.yaml;
     divergent rules surface as a non-zero exit when
     --fail-on-divergence is set.
  4. Doc-gen — generate_invariants_doc.py refreshes
     docs/generated/invariants-{engine}.md.
  5. Atomic writeback — one bot commit covers proposed.yaml,
     validated.yaml, the digest doc, and
     engine_versions/{engine}.compat.json.
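The field-level merge precedence in the mine step (static miner wins on match.fields, dynamic miner wins on message_template) can be sketched as follows. The rule shape, the provenance-as-list choice, and the helper name are hypothetical illustrations, not the actual build_corpus.py code.

```python
def merge_candidates(static_rule, dynamic_rule):
    """Merge two candidates for the same invariant (same fingerprint).

    Illustrative field-level precedence: the static miner's match.fields
    wins (source code is authoritative for structure); the dynamic
    miner's message_template wins (observed messages are authoritative
    for wording). Provenance from both candidates is retained.
    """
    merged = dict(static_rule)
    merged["match"] = {"fields": static_rule["match"]["fields"]}
    merged["message_template"] = dynamic_rule.get(
        "message_template", static_rule.get("message_template")
    )
    merged["added_by"] = sorted({static_rule["added_by"], dynamic_rule["added_by"]})
    return merged


static_rule = {
    "match": {"fields": {"num_beams": {"not_divisible_by": "@num_beam_groups"}}},
    "message_template": "num_beams ({x}) should be divisible",  # read from source
    "added_by": "static_miner",
}
dynamic_rule = {
    "match": {"fields": {"num_beams": {}}},  # probing observes less structure
    "message_template": "`num_beams` must be divisible by `num_beam_groups`",
    "added_by": "dynamic_miner",
}

merged = merge_candidates(static_rule, dynamic_rule)
print(merged["added_by"])  # → ['dynamic_miner', 'static_miner']
```

Keeping both provenance entries is what lets the vendor-replay step later report which miner produced a divergent claim.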


CI must be green before merge


Package ships with updated corpus


User submits ExperimentConfig


loader.py evaluates rules against config


Invalid combination caught BEFORE engine initialisation
User sees: "config rejected: num_beams must be divisible by num_beam_groups"

Why validate before engine initialisation?

Engine initialisation is expensive: model weights load from disk, CUDA contexts initialise, and for TensorRT-LLM the engine may need compilation. A rejected config discovered after two minutes of initialisation wastes GPU time and researcher patience.

Pre-construction validation from a corpus catches the most common mistakes at config-parse time - a few milliseconds rather than several minutes.

The corpus complements, rather than replaces, engine-side validation: it captures invariants that fire only in specific combinations (cross-field constraints), silent normalisations (dormant rules), and invariants from methods that run at build time rather than construction time.


Why a versioned corpus instead of live introspection?

Live introspection at runtime would require importing each engine at startup - which on vLLM and TRT-LLM means initialising CUDA contexts. The corpus is pre-computed and ships as a YAML file that loads in a few milliseconds with no GPU dependency.

The trade-off is staleness risk: the corpus must be regenerated when the engine library changes. The Renovate-driven refresh loop and the validation-CI gate together enforce this discipline. See miner-pipeline.md - Renovate refresh loop.


Key concepts

Term                 Meaning
─────────────────────────────────────────────────────────────────────────
Invariant miner      The umbrella for the mining pipeline; extracts
                     constraints from library source
Static miner         The AST-walking component; reads source, no
                     constructor calls
Dynamic miner        The probing component; constructs config objects,
                     observes raises
Lift module          Type-system adapter; extracts constraints from
                     Pydantic / msgspec / dataclass metadata
Corpus               The YAML file of extracted, validation-gate-confirmed
                     invariants for one engine
Validated YAML       The CI-observed version of the corpus that ships
                     with the package
Validation-CI gate   The step that replays every invariant against the
                     live library; divergences fail CI
Fixpoint contract    _fixpoint_test.py - asserts dormant invariants
                     converge to a stable state under repeated application
AddedBy              Provenance field on each invariant: static_miner,
                     dynamic_miner, pydantic_lift, msgspec_lift,
                     dataclass_lift, manual_seed, runtime_warning,
                     observed_collision (full reference in
                     validation-invariant-corpus.md)
MinerSource          The {path, method, line_at_scan} record pointing back
                     to the library source line that produced an invariant
Loader grammar       The predicate DSL used in match.fields: in, not_in,
                     @field_ref, not_divisible_by, type_is, etc.

File and package map

scripts/
└── miners/                          Invariant miner pipeline (build-time)
    ├── _base.py                     Shared infrastructure: RuleCandidate,
    │                                MinerError types, AST primitives,
    │                                pattern detectors
    ├── _pydantic_lift.py            Pydantic v2 sub-library lift
    ├── _msgspec_lift.py             msgspec sub-library lift
    ├── _dataclass_lift.py           stdlib dataclass sub-library lift
    ├── _fixpoint_test.py            Gate-soundness + corpus fixpoint contract
    ├── transformers_miner.py        Transformers orchestration entry
    ├── transformers_static_miner.py
    ├── transformers_dynamic_miner.py
    ├── vllm_static_miner.py         (in flight)
    ├── vllm_dynamic_miner.py        (in flight)
    ├── tensorrt_static_miner.py     (in flight)
    └── build_corpus.py              Merge + dedup + validation-gate
                                     orchestration

scripts/
├── validate_invariants.py           Replay invariants against live library;
│                                    write validated YAML
└── _invariant_validation_common.py  Shared capture + comparison utilities

configs/
└── engine_invariants/
    ├── transformers.proposed.yaml   Authoritative corpus post-mine
    ├── transformers.validated.yaml  Validated observations post-replay
    └── _staging/                    Per-miner staging output (not committed)

src/llenergymeasure/config/
└── engine_invariants/
    ├── loader.py                    Runtime corpus consumer + predicate
    │                                engine
    └── __init__.py

engine_versions/
└── {engine}.yaml                    Per-engine SSOT: library version, miner
                                     pins, artefact paths. Renovate-authored.

See also