Python API reference
⚠️ Pre-1.0 — Python API surface is unstable. LLenergyMeasure is currently pre-1.0. The Python library API documented here is not yet a stable public surface — class signatures, function names, and module paths may change between minor versions without notice.
The supported user-facing interfaces are the CLI (`llem run`, `llem config`) and the YAML study config. See CLI reference and Study config for stable contracts. The library API will stabilise at v1.0.0.
Auto-generated by `scripts/generate_api_docs.py` from docstrings in `src/llenergymeasure/`. Do not edit manually; edits are overwritten on the next build.
`__version__`
Current version: 0.9.0
class ExperimentConfig
v2.0 experiment configuration.
Central configuration object controlling all aspects of a single LLM inference efficiency measurement. Organised into semantic groups:
- task: What to measure (model, dataset, token limits, seed)
- measurement: How to measure (warmup, baseline, energy sampler)
- Engine sections (`transformers:`, `vllm:`, `tensorrt:`): How to execute
The engine section must match the `engine` field. Providing a `transformers:` section when `engine=vllm` is a configuration error (a construction sketch follows the field table below).
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| task | `llenergymeasure.config.models.TaskConfig` | (required) | Task configuration: model, dataset, workload shape |
| engine | `Engine` | `<Engine.TRANSFORMERS: 'transformers'>` | Inference engine |
| measurement | `llenergymeasure.config.models.MeasurementConfig` | (required) | Measurement methodology: warmup, baseline, energy sampling |
| sampling_preset | `Optional[Literal['deterministic', 'standard', 'creative', 'factual']]` | None | Sampling preset. When set, preset values are merged into the active engine's sampling section at parse time; explicit YAML values take precedence over preset values. |
| transformers | `llenergymeasure.config.engine_configs.TransformersConfig \| None` | None | |
| vllm | `llenergymeasure.config.engine_configs.VLLMConfig \| None` | None | |
| tensorrt | `llenergymeasure.config.engine_configs.TensorRTConfig \| None` | None | |
| lora | `llenergymeasure.config.models.LoRAConfig \| None` | None | |
| passthrough_kwargs | `dict[str, Any] \| None` | None | |
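For orientation, a minimal construction sketch using only the field names documented above. The nested `task` and `measurement` payloads are placeholders (their fields are not part of this reference), and `model_validate` is the standard Pydantic v2 entry point rather than a confirmed part of this library's surface:

```python
from llenergymeasure.config.models import ExperimentConfig  # module path as listed above

# Placeholder payloads: TaskConfig/MeasurementConfig fields are not documented
# in this reference, so the nested dicts below are assumptions.
config_dict = {
    "task": {"model": "gpt2"},           # hypothetical TaskConfig payload
    "engine": "vllm",
    "measurement": {},                   # hypothetical MeasurementConfig payload
    "sampling_preset": "deterministic",  # merged into vllm's sampling section at parse time
    "vllm": {},                          # engine section must match engine="vllm"
}

# Standard Pydantic v2 validation entry point (assumes the library targets Pydantic v2).
config = ExperimentConfig.model_validate(config_dict)
```

Adding a `transformers` key alongside `engine="vllm"` would be rejected under the engine/section matching rule above; explicit sampling values in the `vllm` section would take precedence over the `deterministic` preset.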
class ExperimentResult
Experiment result — the user-visible output of a measurement run.
Combines raw results from all processes into a single result with proper aggregation (sum energy, average throughput). For single-GPU experiments, process_results has exactly one item.
v2.0 schema: all fields ship together (decision #50).
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| schema_version | `str` | '3.0' | Result schema version |
| experiment_id | `str` | (required) | Unique experiment identifier |
| measurement_config_hash | `str` | (required) | SHA-256[:16] of ExperimentConfig (environment excluded) |
| llenergymeasure_version | `str \| None` | None | |
| engine | `str` | 'transformers' | Inference engine used |
| engine_version | `str \| None` | None | |
| model_name | `str` | 'unknown' | Model name/path used |
| measurement_methodology | `Literal['total', 'steady_state', 'windowed']` | (required) | What was measured: total run, steady-state window, or explicit window |
| steady_state_window | `tuple[float, float] \| None` | None | |
| total_tokens | `int` | (required) | Total tokens across all processes |
| total_energy_j | `float` | (required) | Total energy (sum across processes) |
| total_inference_time_sec | `float` | (required) | Total inference time |
| avg_tokens_per_second | `float` | (required) | Average throughput |
| avg_energy_per_token_j | `float` | (required) | Average energy per token |
| mj_per_tok_adjusted | `float \| None` | None | |
| mj_per_tok_total | `float \| None` | None | |
| total_flops | `float` | (required) | Total FLOPs (reference metadata) |
| flops_per_output_token | `float \| None` | None | |
| flops_per_input_token | `float \| None` | None | |
| flops_per_second | `float \| None` | None | |
| baseline_power_w | `float \| None` | None | |
| energy_adjusted_j | `float \| None` | None | |
| energy_per_device_j | `list[float] \| None` | None | |
| energy_breakdown | `llenergymeasure.domain.metrics.EnergyBreakdown \| None` | None | |
| multi_gpu | `llenergymeasure.domain.metrics.MultiGPUMetrics \| None` | None | |
| measurement_warnings | `list[str]` | (required) | Measurement quality warnings (e.g., short duration, thermal drift) |
| warmup_excluded_samples | `int \| None` | None | |
| reproducibility_notes | `str` | 'Energy measured via NVML polling. Accuracy +/-5%. Results may vary with thermal state and system load.' | Fixed disclaimer about measurement accuracy |
| timeseries | `str \| None` | None | |
| start_time | `datetime.datetime` | (required) | Earliest process start time |
| end_time | `datetime.datetime` | (required) | Latest process end time |
| process_results | `list[llenergymeasure.domain.experiment.RawProcessResult]` | (required) | Original per-process results |
| aggregation | `llenergymeasure.domain.experiment.AggregationMetadata \| None` | None | |
| thermal_throttle | `llenergymeasure.domain.metrics.ThermalThrottleInfo \| None` | None | |
| warmup_result | `llenergymeasure.domain.metrics.WarmupResult \| None` | None | |
| latency_stats | `llenergymeasure.domain.metrics.LatencyStatistics \| None` | None | |
| extended_metrics | `llenergymeasure.domain.metrics.ExtendedEfficiencyMetrics \| None` | None | |
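A short sketch of reading the documented result fields after a run. The top-level import path is an assumption; only fields listed in the table above are accessed:

```python
from llenergymeasure import run_experiment  # top-level import path assumed

result = run_experiment(model="gpt2", n_prompts=10)

print(f"{result.total_tokens} tokens, {result.total_energy_j:.1f} J total")
print(f"{result.avg_tokens_per_second:.1f} tok/s, "
      f"{result.avg_energy_per_token_j * 1000:.3f} mJ/tok")

# Optional fields are None when the corresponding feature was not active.
if result.baseline_power_w is not None:
    print(f"baseline power: {result.baseline_power_w:.1f} W")
for warning in result.measurement_warnings:
    print("warning:", warning)
```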
class StudyConfig
Thin resolved container for a study (list of experiments + execution config).
Populated by the study loader after sweep expansion. The experiments list contains fully-validated ExperimentConfig objects ready for execution. skipped_configs records any grid points that failed Pydantic validation so they can be displayed to the researcher in pre-flight output.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| experiments | `list[llenergymeasure.config.models.ExperimentConfig]` | (required) | Resolved list of experiments to run |
| study_name | `str \| None` | None | |
| output | `llenergymeasure.config.models.OutputConfig` | (required) | Study-level output configuration (results_dir, format, save_timeseries) |
| study_execution | `llenergymeasure.config.models.ExecutionConfig` | (required) | Cycle repetition and ordering controls |
| runners | `dict[str, str] \| None` | None | |
| images | `dict[str, str] \| None` | None | |
| study_design_hash | `str \| None` | None | |
| skipped_configs | `list[dict[str, Any]]` | (required) | Grid points that failed Pydantic validation during expansion. Persisted for post-hoc review and pre-flight display. |
| dedup_mode | `Literal['resolved', 'off']` | 'resolved' | Library-resolution dedup mode. 'resolved' applies dormant-invariant library resolution at expansion and collapses resolved-config-hash-equivalent configs to a single run; 'off' runs every declared config regardless of equivalence. Set via ExecutionConfig.deduplicate_equivalent / --no-dedup. |
| pre_run_equivalence_groups | `list[dict[str, Any]]` | (required) | Pre-run equivalence groups computed at sweep-expansion time. Each group records the resolved_config_hash, canonical excerpt, and member declared-indices. Written to 'equivalence_groups.json' alongside the results bundle. See sweep-dedup.md §6 and the reading sketch after this table. |
| declared_resolved_config_hashes | `list[str]` | (required) | Per-declared-config resolved_config_hashes (parallel to the pre-resolved sweep input). The harness consults this to tag each experiment with its equivalence group at sidecar-write time. |
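Because pre_run_equivalence_groups is also persisted to `equivalence_groups.json`, the groups can be inspected post hoc. A sketch, assuming a conventional results layout; the key names inside each group are guesses based on the field description above:

```python
import json
from pathlib import Path

# The directory layout here is illustrative; the file is documented as being
# written alongside the results bundle.
groups = json.loads(Path("results/my_study/equivalence_groups.json").read_text())
for group in groups:
    # Key names assumed from the description (resolved_config_hash, canonical
    # excerpt, member declared-indices); adjust to the actual file contents.
    print(group.get("resolved_config_hash"), group.get("members"))
```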
class StudyResult
Final return value of a study run.
Distinct from StudyManifest (the in-progress checkpoint). StudyResult is assembled once after all experiments complete (or after interrupt) and returned to the caller.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| experiments | `list[llenergymeasure.domain.experiment.ExperimentResult]` | (required) | Results for each experiment in the study |
| study_name | `str \| None` | None | |
| study_design_hash | `str \| None` | None | |
| measurement_protocol | `dict[str, Any]` | (required) | Flat dict from ExecutionConfig: n_cycles, experiment_order, experiment_gap_seconds, cycle_gap_seconds, shuffle_seed, experiment_timeout_seconds |
| result_files | `list[str]` | (required) | Paths to per-experiment result.json files (paths, not embedded) |
| summary | `llenergymeasure.domain.experiment.StudySummary` | (required) | Computed aggregate statistics (counts, totals, warnings) |
| skipped_experiments | `list[dict[str, Any]]` | (required) | Grid points skipped due to validation errors (raw_config + reason + errors) |
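A sketch of consuming a StudyResult using only the fields listed above (top-level import path assumed):

```python
from llenergymeasure import run_study  # top-level import path assumed

study_result = run_study("study.yaml")

print(study_result.study_name, study_result.measurement_protocol.get("n_cycles"))
for path in study_result.result_files:    # paths to result.json files, not embedded results
    print("result file:", path)
for skipped in study_result.skipped_experiments:
    print("skipped:", skipped["reason"])  # documented keys: raw_config, reason, errors
```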
run_experiment
```python
run_experiment(
    config: str | Path | ExperimentConfig | None = None,
    *,
    model: str | None = None,
    engine: str | None = None,
    n_prompts: int = 100,
    dataset: str = 'aienergyscore',
    skip_preflight: bool = False,
    progress: ProgressCallback | None = None,
    output_dir: str | Path | None = None,
    **kwargs: Any,
) -> ExperimentResult
```
Run a single LLM inference efficiency experiment.
Three call forms:

```python
run_experiment("config.yaml")                # YAML path
run_experiment(ExperimentConfig(...))        # config object
run_experiment(model="gpt2", engine="vllm")  # kwargs convenience
```
Args:
- config: YAML file path, ExperimentConfig object, or None (use kwargs).
- model: Model name/path (kwargs form only).
- engine: Inference engine (kwargs form only; defaults to the ExperimentConfig default).
- n_prompts: Number of prompts (kwargs form only; default 100).
- dataset: Dataset source name (kwargs form only; default "aienergyscore").
- skip_preflight: Skip Docker pre-flight checks (GPU visibility, CUDA/driver compat).
- progress: Optional callback for step-by-step progress reporting.
- output_dir: Base directory for results. When provided, overrides the default ./results directory. A timestamped study subdirectory is created within this path.
- `**kwargs`: Additional ExperimentConfig fields (kwargs form only).
Returns:
- ExperimentResult: Experiment measurements and metadata.

Raises:
- ConfigError: Invalid config path, or missing model in kwargs form.
- pydantic.ValidationError: Invalid field values (passes through unchanged).
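Putting the kwargs form together with the documented error behaviour. The exception import paths are not listed in this reference, so the sketch catches broadly and names the documented types in a comment:

```python
from llenergymeasure import run_experiment  # top-level import path assumed

try:
    result = run_experiment(
        model="gpt2",
        engine="transformers",
        n_prompts=50,
        output_dir="results/",  # a timestamped study subdirectory is created inside
    )
except Exception as err:  # ConfigError or pydantic.ValidationError per Raises above
    print("experiment failed:", err)
    raise
```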
run_study
```python
run_study(
    config: str | Path | StudyConfig,
    *,
    skip_preflight: bool = False,
    progress: ProgressCallback | None = None,
    resume_dir: Path | None = None,
    resume: bool = False,
    output_dir: Path | None = None,
    skip_set: set[tuple[str, int]] | None = None,
    no_lock: bool = False,
    config_path: Path | None = None,
    cli_overrides: dict[str, Any] | None = None,
) -> StudyResult
```
Run a multi-experiment study.
Always writes manifest.json to disk (documented side-effect).
Args:
- config: YAML file path or resolved StudyConfig.
- skip_preflight: Skip Docker pre-flight checks (GPU visibility, CUDA/driver compat). The CLI --skip-preflight flag and YAML execution.skip_preflight: true also bypass them.
- progress: Optional StudyProgressCallback for live per-experiment display. When provided, the study runner emits begin/end experiment events and forwards per-step progress from worker subprocesses.
- resume_dir: Explicit study directory to resume. Overrides resume.
- resume: When True and resume_dir is None, auto-detect the most recent resumable study in output_dir (default results/).
- output_dir: Base output directory used by auto-detect resume. Ignored when resume_dir is given explicitly.
- skip_set: Set of (config_hash, cycle) pairs to skip (already completed in a previous run). Populated automatically when resuming; callers rarely need to set this directly.
- no_lock: Skip GPU advisory lock acquisition. Use with the --no-lock CLI flag.
- config_path: Original YAML config file path for copying to study artefacts. When config is a StudyConfig object, callers should pass the original path separately so the YAML is preserved for reproducibility.
- cli_overrides: Flat dict of CLI flag overrides (e.g. {"model": "gpt2"}). Used to build per-experiment _resolution.json sidecars showing which fields were overridden by CLI flags vs YAML vs sweep.
Returns:
- StudyResult with experiments, result_files, measurement_protocol, and inline summary fields.

Raises:
- ConfigError: Invalid config path or parse error.
- PreFlightError: Multi-engine study without Docker.
- StudyError: No resumable study found (when resume=True).
- StudyError: Config drift detected (study_design_hash changed).
- pydantic.ValidationError: Invalid field values (passes through unchanged).
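A resume-oriented sketch using the documented parameters (top-level import path assumed; the explicit directory name is illustrative):

```python
from pathlib import Path
from llenergymeasure import run_study  # top-level import path assumed

# Auto-detect and resume the most recent resumable study under results/.
result = run_study("study.yaml", resume=True, output_dir=Path("results"))

# Or resume an explicit study directory (resume_dir overrides the resume flag):
# result = run_study("study.yaml", resume_dir=Path("results/20240101_study"))
```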