Extending the Invariant Miner: Adding a New Engine
This document is the practitioner's guide to adding invariant miner support for a new engine. It uses the transformers miner as the gold-standard reference throughout.
Audience: engine extenders. Assumes familiarity with the miner-pipeline.md concepts.
Before you start
- Read miner-pipeline.md to understand the static miner / dynamic miner / lift module split.
- Read the corpus format reference to understand what rules look like.
- Review `scripts/engine_miners/transformers_static_miner.py` and `scripts/engine_miners/transformers_dynamic_miner.py` as the gold standard. The comments in those files contain important design decisions.
Step 0: Research the engine's validation surface
Before writing any code, answer these questions:
- Which config classes does the engine validate? Where does validation happen - in `__init__`, in a separate `validate()` method, in validators decorated with `@model_validator`?
- Which Python type system does each class use? (`pydantic.BaseModel` / `pydantic.dataclasses.dataclass`, `msgspec.Struct`, `@dataclasses.dataclass`, or something else?)
- Does the engine constructor raise on invalid inputs, or silently normalise them? (Transformers and vLLM raise; TRT-LLM constructors are more permissive, so TRT-LLM has no dynamic miner.)
- What is the CUDA / import dependency? Engines have no host install path - they are imported only inside their per-engine Docker images (see development.md). Within the engine container, can the miner `import enginelib` on the CPU phase of the build, or does the import require a live CUDA runtime? (vLLM: importable inside `llenergymeasure:vllm-${VER}` without GPU at probe time; TRT-LLM: requires CUDA-aware import even inside the NGC container.)
- What is a realistic post-validation-CI invariant count? (Transformers: 46; vLLM: 80-110; TRT-LLM: 20-28.) This helps plan the scope.
Step 1: Add the fail-loud import contract
Create scripts/engine_miners/{engine}_miner.py (the orchestration entry point). The very first thing it must do:
```python
import importlib.metadata

from scripts.engine_miners._base import check_installed_version, MinerLandmarkMissingError
from scripts.engine_miners._ssot import load_miner_pin

# Pin source-of-truth lives in ``engine_versions/{engine}.yaml`` under
# ``miner_pins.{static|dynamic|discovery}``. Pick the producer matching this
# miner's role; ``load_miner_pin`` returns a ``packaging.SpecifierSet``.
# Keep the upper bound tight (e.g. <4.60 not <99.0) so
# MinerVersionMismatchError fires on a library bump.
_envelope = load_miner_pin("myengine", "static")
_installed = importlib.metadata.version("your-engine-library")
check_installed_version("your-engine-library", _installed, _envelope)
# Raises MinerVersionMismatchError if installed version is outside the range.
# This is CI-fatal: the miner will not emit partial output.
```
Then declare landmark checks for every class or method the miner will walk:
```python
import ast
import inspect

from scripts.engine_miners._base import find_class, find_method
from enginelib.config import SomeConfigClass

_source = inspect.getsource(SomeConfigClass)
_module = ast.parse(_source)
_cls = find_class(_module, "SomeConfigClass")
if _cls is None:
    raise MinerLandmarkMissingError(
        "SomeConfigClass",
        "expected in enginelib.config - check if the class was renamed",
    )
```
Why this matters: the Haiku-era TRT-LLM extractor imported LlmConfig - a class that does not exist in TRT-LLM 0.21.0. It caught the ImportError and silently returned []. The fail-loud contract makes silent coverage loss impossible.
Step 2: Apply the relevant lift module(s)
Based on your Step 0 research, apply one or more lift modules to extract constraints directly from type metadata.
All three lift modules expose a single function named `lift` with the same signature: `lift(target_type, *, namespace, today, source_path) -> list[InvariantCandidate]`. The engine/library is derived automatically from `target_type.__module__`. Import each lift under an alias to keep call sites readable.
If the engine uses Pydantic v2
```python
from datetime import date

from scripts.engine_miners._pydantic_lift import lift as lift_pydantic
from enginelib.config import CacheConfig, SchedulerConfig

TODAY = date.today().isoformat()

def mine_pydantic_invariants():
    invariants = []
    for cls in [CacheConfig, SchedulerConfig]:
        invariants.extend(lift_pydantic(
            cls,
            namespace="myengine.config",
            today=TODAY,
            source_path="enginelib/config.py",
        ))
    return invariants
```
The lift emits one invariant per Gt, Ge, Lt, Le, MultipleOf, MinLen, MaxLen constraint and per Literal[...] allowlist found on any field.
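To make the mechanism concrete, here is a stdlib-only sketch of how an annotated-metadata lift works. The `Gt` / `Le` marker classes and `CacheConfigSketch` below are hypothetical stand-ins for the annotated-types metadata that pydantic v2 stores on `FieldInfo`; the real `_pydantic_lift` reads the genuine markers and emits `InvariantCandidate` objects, not plain dicts.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints

# Hypothetical stand-ins for the annotated-types markers (Gt, Le, ...).
@dataclass(frozen=True)
class Gt:
    value: float

@dataclass(frozen=True)
class Le:
    value: float

class CacheConfigSketch:
    # Illustrative fields, not a real engine config class.
    gpu_memory_utilization: Annotated[float, Gt(0.0), Le(1.0)]
    block_size: Annotated[int, Gt(0)]

def sketch_lift(cls, namespace):
    """Emit one candidate dict per bound marker found on any field."""
    candidates = []
    for field_name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is not Annotated:
            continue
        for marker in get_args(hint)[1:]:  # skip the underlying type
            op = {"Gt": "gt", "Le": "le"}.get(type(marker).__name__)
            if op is not None:
                candidates.append({
                    "rule": f"{namespace}.{field_name}.{op}",
                    "field": field_name,
                    "op": op,
                    "bound": marker.value,
                })
    return candidates

# Three markers on two fields -> three candidates.
print(sketch_lift(CacheConfigSketch, "myengine.config"))
```

The real lift does the same walk over every field, one invariant per marker, which is why a class with rich `Annotated` metadata yields invariants "for free".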
If the engine uses msgspec
```python
from scripts.engine_miners._msgspec_lift import lift as lift_msgspec
from enginelib.config import SamplingParams

def mine_msgspec_invariants():
    return lift_msgspec(
        SamplingParams,
        namespace="myengine.sampling",
        today=TODAY,
        source_path="enginelib/sampling.py",
    )
```
Note: if the class ships zero Meta(ge=...) annotations (common for msgspec classes), the lift returns [] - that is expected and not an error.
If the engine uses stdlib dataclasses
```python
from scripts.engine_miners._dataclass_lift import lift as lift_dataclass
from enginelib.config import EngineArgs

def mine_dataclass_invariants():
    return lift_dataclass(
        EngineArgs,
        namespace="myengine.args",
        today=TODAY,
        source_path="enginelib/args.py",
    )
```
The dataclass lift is limited to Literal[...] value-allowlist invariants (no numeric bounds; stdlib dataclasses carry no bound metadata by default).
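A stdlib sketch of what that `Literal[...]` extraction amounts to. `EngineArgsSketch` is a hypothetical toy class, not the real `EngineArgs`; the real `_dataclass_lift` emits `InvariantCandidate` objects rather than dicts.

```python
import dataclasses
from typing import Literal, get_args, get_origin, get_type_hints

@dataclasses.dataclass
class EngineArgsSketch:
    # Illustrative fields only.
    dtype: Literal["auto", "float16", "bfloat16"] = "auto"
    max_batch_size: int = 8  # no Literal -> this lift emits nothing for it

def sketch_dataclass_lift(cls, namespace):
    """Emit one value-allowlist candidate per Literal-typed field."""
    candidates = []
    hints = get_type_hints(cls)
    for f in dataclasses.fields(cls):
        hint = hints[f.name]
        if get_origin(hint) is Literal:
            candidates.append({
                "rule": f"{namespace}.{f.name}.allowlist",
                "field": f.name,
                "allowed": list(get_args(hint)),
            })
    return candidates

print(sketch_dataclass_lift(EngineArgsSketch, "myengine.args"))
```

Numeric bounds on `max_batch_size` would have to come from the static or dynamic miner, which is exactly why stdlib-dataclass engines lean harder on Steps 3 and 4.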
Step 3: Write the static miner
Create scripts/engine_miners/{engine}_static_miner.py. The static miner walks the AST of validator methods and emits rules for conditional raises, warnings, and silent normalisations.
Pattern: walking a validator method
```python
import ast
import inspect

from scripts.engine_miners._base import (
    find_class, find_method, extract_condition_fields,
    filter_condition_references_self,
    ConditionalRaiseDetector, ConditionalSelfAssignDetector,
    ConditionalWarningsWarnDetector, ConditionalLoggerWarningDetector,
    InvariantCandidate, MinerLandmarkMissingError, MinerSource,
)

def walk_validate_method(cls_source: str, cls_name: str) -> list[InvariantCandidate]:
    module = ast.parse(cls_source)
    cls_node = find_class(module, cls_name)
    if cls_node is None:
        raise MinerLandmarkMissingError(cls_name)
    validate = find_method(cls_node, "validate")
    if validate is None:
        raise MinerLandmarkMissingError(f"{cls_name}.validate")

    public_fields = frozenset(
        # derive from the class's dataclasses.fields() or __annotations__
    )

    detectors = (
        ConditionalRaiseDetector(),
        ConditionalSelfAssignDetector(),
        ConditionalWarningsWarnDetector(),
        ConditionalLoggerWarningDetector(),
    )

    candidates = []
    for node in ast.walk(validate):
        if not isinstance(node, ast.If):
            continue
        if not filter_condition_references_self(node.test, public_fields):
            continue
        for stmt in node.body:
            for detector in detectors:
                pattern = detector.detect(stmt)
                if pattern is not None:
                    # build InvariantCandidate from pattern + condition
                    candidate = _build_candidate(node.test, pattern, ...)
                    candidates.append(candidate)
    return candidates
```
Per-engine detector customisation
The five default detectors cover the most common patterns. For engine-specific patterns, write a custom detector:
```python
# Example: engine uses self.errors.append(...) for error collection
class ErrorsAppendDetector:
    def detect(self, stmt: ast.stmt) -> DetectedPattern | None:
        if not isinstance(stmt, ast.Expr) or not isinstance(stmt.value, ast.Call):
            return None
        path = call_func_path(stmt.value)
        if path != ["self", "errors", "append"]:
            return None
        return DetectedPattern(
            severity="error",
            emission_channel="none",
            affected_field=None,
            message_template=first_string_arg(stmt.value),
            detail="self.errors.append",
        )
```
Important: the "revisit" comment
Per the transformers static miner's header, per-engine miners currently define their own _detect_* functions rather than using _base.py's detector classes directly. This is because the DetectedPattern shape from _base.py doesn't carry the structured FieldPredicate data needed for cross-field corpus rules (operators like not_divisible_by and @field_ref). Once two or more engine miners exist and we can see whether the parallel detector logic is genuinely divergent or accidentally so, harmonise in a _base.py refactor.
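To make the gap concrete, here is a hypothetical sketch of the structured predicate shape such a refactor might introduce. The class name, field names, and operator spellings below are illustrative only; they are not the actual `_base.py` API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldPredicate:
    """Hypothetical structured predicate for cross-field corpus rules."""
    field: str       # LHS field name
    operator: str    # e.g. "not_divisible_by", "gt", "eq"
    operand: object  # a literal value, or "@field_ref:<name>" for a
                     # reference to another field on the same config

# "max_num_batched_tokens must be divisible by block_size" as a
# cross-field rule: the operand points at another field, not a literal.
pred = FieldPredicate(
    field="max_num_batched_tokens",
    operator="not_divisible_by",
    operand="@field_ref:block_size",
)
```

A flat `DetectedPattern` with a string message template cannot carry this; whichever shape wins in the harmonisation refactor needs to round-trip the operator and the field reference losslessly.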
Step 4: Write the dynamic miner (if applicable)
Create scripts/engine_miners/{engine}_dynamic_miner.py if the engine's constructors raise on invalid inputs.
Skip this step if: probing the engine's constructors yields zero raises. This is the case for TRT-LLM, where TrtLlmArgs(**kwargs) is extremely permissive at construction time; constraints are enforced in validator methods (covered by the static miner) or at build time.
Cluster definition
Clusters group related fields for Cartesian probing:
```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class _Cluster:
    name: str
    fields: list[str]
    values: dict[str, list[Any]]
    constructor: type                   # e.g. SamplingParams
    validate_method: str | None = None  # e.g. "_verify_args"

CLUSTERS = [
    _Cluster(
        name="sampling_temperature",
        fields=["temperature", "top_p", "top_k"],
        values={
            "temperature": [0.0, 0.5, 1.0, 2.0, -0.1],
            "top_p": [0.0, 0.5, 1.0, 1.1],
            "top_k": [0, 1, 50, -1],
        },
        constructor=SamplingParams,
    ),
]
```
Cluster size rule: if `product(len(values[f]) for f in fields) > 200`, use a Hypothesis supplement instead of the full Cartesian product:
```python
import itertools

import hypothesis.strategies as st
from hypothesis import given, settings

def probe_cluster(cluster: _Cluster) -> list[tuple[dict, str | None]]:
    size = 1
    for vs in cluster.values.values():
        size *= len(vs)
    if size <= 200:
        # Cartesian probe
        rows = []
        for combo in itertools.product(*[cluster.values[f] for f in cluster.fields]):
            kwargs = dict(zip(cluster.fields, combo))
            rows.append(_run_probe(kwargs, cluster))
        return rows
    else:
        # Hypothesis supplement (deterministic, fixed seed)
        return _hypothesis_probe(cluster)
```
Important: Hypothesis is used here as a deterministic value generator with a fixed seed - not as a property-based test runner. The pipeline must be deterministic: the same library version + miner code must produce the same corpus.
Predicate inference
After probing, group error rows by message class and infer predicates:
```python
def infer_predicates(rows: list[tuple[dict, str | None]]) -> list[InvariantCandidate]:
    # Group by error message
    by_message: dict[str, list[dict]] = {}
    for kwargs, error in rows:
        if error is not None:
            by_message.setdefault(error, []).append(kwargs)

    candidates = []
    for message, trigger_kwargs in by_message.items():
        # Try templates in order of preference:
        #   1. cross-field divisibility: a % b != 0
        #   2. cross-field comparison: a > b
        #   3. type allowlist
        #   4. single-field range
        #   5. single-field equality
        #   6. value allowlist
        # Emit ALL plausible candidates (recall-first; validation CI prunes
        # false positives).
        ...
    return candidates
```
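As one worked example of the template ordering, here is a sketch of template 4 (single-field range). The function name and the dict shape are illustrative, not the actual miner API; the idea is simply to compare the values that triggered a given error message against the values that constructed cleanly.

```python
def infer_single_field_range(field, error_rows, ok_rows):
    """Template 4 sketch: if every error value for `field` sits strictly
    below every OK value, propose a `ge` lower bound at the smallest OK
    value. (A symmetric check would propose upper bounds.)"""
    bad = [kw[field] for kw in error_rows]
    good = [kw[field] for kw in ok_rows]
    if bad and good and max(bad) < min(good):
        return {"field": field, "op": "ge", "bound": min(good)}
    return None

# Probe rows for temperature: the negative value raised, the rest passed.
error_rows = [{"temperature": -0.1}]
ok_rows = [{"temperature": 0.0}, {"temperature": 0.5}, {"temperature": 2.0}]
print(infer_single_field_range("temperature", error_rows, ok_rows))
# -> {'field': 'temperature', 'op': 'ge', 'bound': 0.0}
```

Note the recall-first spirit: the inferred bound is only as tight as the probe grid, which is fine because the validation-CI gate replays it against the live library.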
Step 5: Write the corpus orchestration entry
scripts/engine_miners/{engine}_miner.py is the main entry point:
```python
def mine() -> list[InvariantCandidate]:
    candidates = []
    candidates.extend(mine_pydantic_invariants())
    candidates.extend(mine_dataclass_invariants())

    # Static miner
    from scripts.engine_miners.myengine_static_miner import mine as static_mine
    candidates.extend(static_mine())

    # Dynamic miner (if applicable)
    from scripts.engine_miners.myengine_dynamic_miner import mine as dynamic_mine
    candidates.extend(dynamic_mine())

    return candidates


if __name__ == "__main__":
    from pathlib import Path  # needed for output_path below

    import yaml

    from scripts.engine_miners._base import candidate_to_dict

    ENGINE = "myengine"  # engine identifier recorded in the staging header

    results = mine()
    staging = {
        "schema_version": "1.0.0",
        "engine": ENGINE,
        "invariants": [candidate_to_dict(c) for c in results],
    }
    output_path = Path("src/llenergymeasure/engines/_staging/myengine_miner.yaml")
    output_path.write_text(yaml.dump(staging, allow_unicode=True))
    print(f"Wrote {len(results)} candidates to {output_path}")
```
Step 6: Write fixpoint regression tests
Each per-engine miner ships with parametrised tests:
```python
# tests/unit/scripts/engine_miners/test_myengine_miner.py
import pytest

from scripts.engine_miners.myengine_miner import CLUSTERS, probe_cluster

@pytest.mark.parametrize("cluster", CLUSTERS, ids=lambda c: c.name)
def test_cluster_probes_without_crashing(cluster):
    """Each cluster must complete probing without an unhandled exception."""
    rows = probe_cluster(cluster)
    assert isinstance(rows, list)

def test_version_envelope_resolves():
    """The miner pin loaded from the engine SSOT must be a non-empty SpecifierSet."""
    from packaging.specifiers import SpecifierSet
    from scripts.engine_miners._ssot import load_miner_pin

    envelope = load_miner_pin("myengine", "static")
    assert isinstance(envelope, SpecifierSet)
    assert str(envelope) != ""

def test_landmark_checks_raise_on_missing():
    """find_class returning None must raise MinerLandmarkMissingError."""
    import ast

    from scripts.engine_miners._base import find_class, MinerLandmarkMissingError

    module = ast.parse("class Unrelated: pass")
    cls = find_class(module, "SomeConfigClass")
    assert cls is None
    # Confirm caller raises (contract test)
    with pytest.raises(MinerLandmarkMissingError):
        if cls is None:
            raise MinerLandmarkMissingError("SomeConfigClass")
```
Step 7: Add to CI
- Decide the engine's CI shape:
  - Upstream image consumer (vllm / tensorrt pattern): add a per-engine job to `engine-pipeline.yml` mirroring the existing `invariants-vllm` / `schemas-vllm` (or `*-tensorrt`) jobs. The job pulls the upstream canonical image, then runs probe → mine → validate → doc-gen → atomic-writeback inline.
  - First-party image (transformers pattern): split the build out into a pair of workflows modelled on `engine-pipeline.yml` (build + cache export, no runtime push) and `publish-engine-image.yml` (`workflow_run`-triggered, pulls cache + pushes runtime tags per parent event). The transformers cells in `engine-pipeline.yml` then chain off Publish engine image success via `workflow_run`. The build/push split exists so push failures don't cost a full rebuild of the heavy from-source compile (e.g. ~30 min FA3 compile) on retry.
- Set the runner: every engine miner runs inside its own Docker image (no host extras exist - see development.md). For the upstream-image pattern, mirror `invariants-vllm` as the template for engines whose miners need a GPU only for import-time reasons; use `invariants-tensorrt` as the template for engines whose Python source layout shifts across image releases, that bundle source in non-introspectable ways (NGC-derived bases), or that require CUDA-aware imports: the cell downloads the upstream release tarball on the runner host and bind-mounts it into the container at a stable path, decoupling source resolution from the image's internals. For the first-party-image pattern, mirror `engine-pipeline.yml` + `publish-engine-image.yml` + the `workflow_run`-gated cell pair in `engine-pipeline.yml`.
- The validate step runs inside the engine's container in the same job as the miner - there is no separate validation workflow to update.
- Add a Renovate `packageRule` so library bumps trigger the appropriate workflow via the `engine_versions/{engine}.yaml` path filter (or, for the first-party-image pattern, via `engine-pipeline.yml`'s filter - downstream `publish-engine-image.yml` and the `workflow_run`-gated cells fire automatically on its success).
Step 8: Generate and review the corpus
Run the miner locally (inside the engine's Docker container if CUDA is required):
```shell
python scripts/engine_miners/myengine_miner.py
# Writes src/llenergymeasure/engines/_staging/myengine_miner.yaml

python scripts/engine_miners/build_corpus.py --engine myengine
# Merges staging files, runs validation-CI gate, writes corpus

python scripts/validate_invariants.py \
    --engine myengine \
    --corpus src/llenergymeasure/engines/myengine.proposed.yaml \
    --out src/llenergymeasure/engines/myengine.validated.yaml
# Validates all rules against live library
```
Review the corpus manually:
- Do the `kwargs_positive` examples look right?
- Are there rules that fire too broadly (false positives)?
- Are there obvious constraints the miner missed (coverage gaps)?
If coverage gaps exist, extend the miner. Only add manual_seed rules as a last resort, with a justification comment.
Transformers as the gold standard: key patterns
The transformers miner is the reference implementation. Key patterns to follow:
The find_class / find_method / MinerLandmarkMissingError contract
Every class and method the miner walks must be guarded:
```python
cls_node = find_class(module, "GenerationConfig")
if cls_node is None:
    raise MinerLandmarkMissingError("GenerationConfig")

method_node = find_method(cls_node, "validate")
if method_node is None:
    raise MinerLandmarkMissingError("GenerationConfig.validate")
```
The public_fields filter
Derive public fields from the class's dataclass fields or __annotations__, and use filter_condition_references_self to drop predicates that don't reference a public field:
```python
public_fields = frozenset(
    f.name for f in dataclasses.fields(GenerationConfig)
    if not f.name.startswith("_")
)
```
Unparseable sub-clauses: log, don't drop
When the static miner encounters a condition sub-clause it cannot translate (e.g. an opaque function call), it logs the clause and emits the surrounding invariant with the parseable parts. The invariant is still useful; the validation-CI gate will confirm whether it fires correctly:
```python
# transformers_static_miner.py pattern:
if unparseable_clause:
    logger.debug(
        "static_miner: dropped sub-clause in %s.%s:%d: %s",
        cls_name, method_name, node.lineno, ast.unparse(sub_clause),
    )
# Continue emitting the rule without the sub-clause
```
Recall-first: emit all plausible candidates
Both static and dynamic miners err toward recall. The validation-CI gate is the prune step. Do not add extra filters "just in case" - if an invariant candidate is wrong, the validation-CI gate will quarantine it.
Failure modes when libraries evolve
When Renovate bumps an engine library, the miner pipeline must catch behavioural drift before stale invariants ship. Failures fall into three categories: loud failures caught by the miner pipeline at mining time, loud failures caught by the validation gate at validation time, and one silent failure mode the YAML/JSON split was specifically designed to make visible.
Loud failures caught by the miner pipeline
The miner pipeline's import-time contract (Step 1 above) raises hard CI errors when the library has drifted out of the envelope the miner was written against:
- `MinerVersionMismatchError` - the installed library version is outside the miner's pinned envelope (read from `engine_versions/{engine}.yaml` `miner_pins.{producer}` via `load_miner_pin`). Forces the maintainer to read the release notes and either widen the envelope or update the miner to match the new validator semantics.
  - Example: vLLM 0.7.3 against an SSOT pin of `>=0.17,<0.18` raises `MinerVersionMismatchError` at import. Observed empirically on PR #459's `mine-vllm` job.
- `MinerLandmarkMissingError` - an expected class or method symbol is no longer present in the library source. Catches refactors where a class was renamed, moved to a different module, or an API was deprecated and removed.
  - Example: a hypothetical vLLM release dropping `vllm.sampling_params.StructuredOutputsParams` would raise `MinerLandmarkMissingError` at the landmark-check step, before any AST walking begins.
- `ImportError` / `AttributeError` - propagated raw if the miner uses a library symbol that has been refactored without a landmark guard. The fail-loud principle requires letting these propagate; never wrap landmark imports in a `try/except` that returns `[]`. The Haiku-era TRT-LLM extractor was reverted in #423 specifically because it caught `ImportError` and silently degraded.
These three errors all surface as red CI on the Renovate PR, blocking merge until the miner is updated.
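The version check behind the first of those errors amounts to very little code. The sketch below shows the shape, assuming `packaging` (which `load_miner_pin` already depends on for its `SpecifierSet` return type); it is illustrative, not `_base.py`'s actual implementation, and the error-class definition here is a local stand-in.

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

class MinerVersionMismatchError(RuntimeError):
    """CI-fatal: installed library is outside the miner's pinned envelope."""

def check_installed_version_sketch(package: str, installed: str,
                                   envelope: SpecifierSet) -> None:
    # SpecifierSet membership applies every clause (>=0.17 AND <0.18).
    if Version(installed) not in envelope:
        raise MinerVersionMismatchError(
            f"{package}=={installed} is outside the miner envelope {envelope}; "
            "read the release notes, then widen the pin or update the miner."
        )

# Inside the envelope: passes silently.
check_installed_version_sketch("vllm", "0.17.1", SpecifierSet(">=0.17,<0.18"))
```

Against the same pin, an installed `0.7.3` raises, because `0.7.3 < 0.17` under PEP 440 ordering - exactly the PR #459 scenario above.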
Loud failures caught by the validation gate
After mining completes and a YAML corpus is written, `validate_invariants.py --fail-on-divergence` replays each invariant's `kwargs_positive` and `kwargs_negative` against the live library inside the engine's Docker container.

`--fail-on-divergence` flips the validation gate to a non-zero exit when an existing invariant's declared `expected_outcome` no longer matches the library's actual behaviour. This catches three distinct kinds of behavioural drift:

- The library changed its validation behaviour for an existing rule (e.g. relaxed a numeric bound, changed an error type).
- The library dropped a rule entirely (the constraint no longer fires).
- The library added a new constraint path that the existing rule's `kwargs_negative` example now happens to trip.

All three engines (transformers, vLLM, TRT-LLM) have `--fail-on-divergence` operational as of PR #445. Gate-breaking divergences are P0 incidents - they block the Renovate PR from merging.
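Stripped to its essence, the replay is a loop over the two example lists. The toy class and the `replay_invariant` helper below are hypothetical stand-ins; the real `validate_invariants.py` checks the declared `expected_outcome` (error type, message class) rather than a bare raised/not-raised bit.

```python
class ToyConfig:  # stands in for the live engine library
    def __init__(self, top_p: float = 1.0):
        if not 0.0 < top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")

def replay_invariant(constructor, inv: dict) -> list[str]:
    """Return divergence messages for one invariant's example kwargs."""
    divergences = []
    for kwargs in inv["kwargs_positive"]:   # must construct cleanly
        try:
            constructor(**kwargs)
        except Exception as exc:
            divergences.append(f"positive example {kwargs} raised: {exc}")
    for kwargs in inv["kwargs_negative"]:   # must raise
        try:
            constructor(**kwargs)
            divergences.append(f"negative example {kwargs} did not raise")
        except Exception:
            pass
    return divergences

inv = {
    "kwargs_positive": [{"top_p": 0.5}, {"top_p": 1.0}],
    "kwargs_negative": [{"top_p": 0.0}, {"top_p": 1.1}],
}
print(replay_invariant(ToyConfig, inv))  # -> [] : no divergence
```

Any non-empty result here is what `--fail-on-divergence` turns into a non-zero exit.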
Silent failure: recall regression
The validation gate above validates the invariants that exist in the corpus. It cannot tell you about invariants that should exist but no longer do, because the miner regressed and stopped finding them.
Concrete scenario: a refactor in _pydantic_lift.py changes how it walks FieldInfo.metadata, and the lift now finds 12 invariants where it previously found 30. The 18 lost invariants silently disappear from the corpus.
- The validation gate runs only on the 12 surviving invariants - every one of them passes.
- CI is green.
- The Renovate PR merges with a corpus that has 60% of the recall it had before.
- Users hitting the lost validations get no constraint check at runtime.
Mitigation: the proposed-vs-validated YAML pair (the trust seam).
The engine-invariants pipeline (`engine-pipeline.yml`, with per-job `if:` gating selecting the right cell for each trigger source: `pull_request: paths` for vllm + tensorrt, `workflow_run` after Build engine image for transformers) mines the proposed corpus into `src/llenergymeasure/engines/{engine}/invariants.proposed.yaml` and then validates it into `src/llenergymeasure/engines/{engine}/invariants.validated.yaml` in the same job. Both YAMLs land in one atomic commit-back to the PR branch, and the per-pipeline diff comment includes both diffs.
Because the proposed-corpus diff is emitted alongside the validated diff, a miner refactor that silently drops 18 invariants shows up as 18 deletions in the proposed-corpus diff - a maintainer reading the PR notices the regression even when the validation gate's verdict on the surviving invariants is green.
The historical Stage-1 / Stage-2 split between auto-mine.yml and invariant-miner.yml forced the same property by serialising two workflows; the merger preserves the property by emitting two diffs from one workflow. Cross-reference: #450 (trust seam architecture decision), #465 (writeback contract).
Tooling for diagnosis
The fail-loud envelope and the YAML diff together cover the failure modes that trip on a routine library bump. Three planned tools extend this for harder cases:
- Pre-mining envelope check (#469). Verifies the installed library version is inside the SSOT-pinned envelope before CI invests effort in mining. Today the check happens at miner import time, which is fine but late - if mining takes 5 minutes and the version is wrong, the maintainer waits 5 minutes to find out.
- Compat-matrix sweep (#470). Runs the miner against every library version in a declared support range and reports per-version `(rule_count, divergences, errors)`. Surfaces "this miner mostly works on the new version but loses 3 rules" before a Renovate PR ever opens.
- Coordinated bump command `llem bump-engine` (#471). A single CLI entry point that updates the Dockerfile ARG, regenerates the corpus, runs the validation gate, and reports the diff in one local invocation - used by maintainers handling library bumps that need manual intervention (e.g. `MinerVersionMismatchError` resolution).
Common mistakes
| Mistake | Consequence | Fix |
|---|---|---|
| Not pinning the miner envelope in `engine_versions/{engine}.yaml` | Miner runs silently against the wrong library version | Set `miner_pins.{static\|dynamic\|discovery}` in the SSOT and call `check_installed_version(load_miner_pin(...))` at import |
| Catching `ImportError` on landmark imports | Silent degradation (returns `[]` on failure) | Let `ImportError` propagate, or raise `MinerLandmarkMissingError` explicitly |
| Cartesian-only probing with large clusters | Exponential probe count; CI timeouts | Add a Hypothesis supplement for clusters with > 200 combinations |
| Adding `manual_seed` rules for automatable constraints | Pipeline-failure debt | Extend the miner instead |
| Using Hypothesis as a property-based test runner (not a value generator) | Non-deterministic corpus | Use `hypothesis.strategies.from_type` with a fixed seed; never `@given` |
| Not calling `find_method` before walking | `AttributeError` on `None` if the method was renamed | Always guard: `if method is None: raise MinerLandmarkMissingError(...)` |
See also
- miner-pipeline.md - pipeline architecture reference
- validation-invariant-corpus.md - corpus format
- parameter-discovery.md - runtime validation
- architecture-overview.md - system overview
- `scripts/engine_miners/transformers_static_miner.py` - gold-standard static miner
- `scripts/engine_miners/transformers_dynamic_miner.py` - gold-standard dynamic miner
- `scripts/engine_miners/_base.py` - shared infrastructure