Python API reference
⚠️ Pre-1.0 — Python API surface is unstable. LLenergyMeasure is currently pre-1.0. The Python library API documented here is not yet a stable public surface — class signatures, function names, and module paths may change between minor versions without notice.
The supported user-facing interfaces are the CLI (`llem run`, `llem config`) and the YAML study config. See CLI reference and Study config for stable contracts. The library API will stabilise at v1.0.0.
Auto-generated by `scripts/generate_api_docs.py` from docstrings in `src/llenergymeasure/`. Do not edit manually; edits are overwritten on the next build.
`__version__`
Current version: 0.9.0
class ExperimentConfig
v2.0 experiment configuration.
Central configuration object controlling all aspects of a single LLM inference efficiency measurement. Organised into semantic groups:
- task: What to measure (model, dataset, token limits, seed)
- measurement: How to measure (warmup, baseline, energy sampler)
- Engine sections (`transformers:`, `vllm:`, `tensorrt:`): How to execute
The engine section must match the `engine` field. Providing a `transformers:` section when `engine=vllm` is a configuration error (a construction sketch follows the field table below).
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| task | `llenergymeasure.config.models.TaskConfig` | (required) | Task configuration: model, dataset, workload shape |
| engine | `Engine` | `<Engine.TRANSFORMERS: 'transformers'>` | Inference engine |
| measurement | `llenergymeasure.config.models.MeasurementConfig` | (required) | Measurement methodology: warmup, baseline, energy sampling |
| sampling_preset | `Optional[Literal['deterministic', 'standard', 'creative', 'factual']]` | None | Sampling preset. When set, preset values are merged into the active engine's sampling section at parse time; explicit YAML values take precedence over preset values. |
| transformers | `llenergymeasure.config.engine_configs.TransformersConfig \| None` | None | |
| vllm | `llenergymeasure.config.engine_configs.VLLMConfig \| None` | None | |
| tensorrt | `llenergymeasure.config.engine_configs.TensorRTConfig \| None` | None | |
| lora | `llenergymeasure.config.models.LoRAConfig \| None` | None | |
| passthrough_kwargs | `dict[str, Any] \| None` | None | |
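For orientation, a minimal construction sketch using only the field names documented above. The nested `task` and `measurement` payloads are placeholders (their fields are not part of this reference), and `model_validate` is the standard Pydantic v2 entry point rather than a confirmed part of this library's surface:

```python
from llenergymeasure.config.models import ExperimentConfig  # module path as listed above

# Placeholder payloads: TaskConfig/MeasurementConfig fields are not documented
# in this reference, so the nested dicts below are assumptions.
config_dict = {
    "task": {"model": "gpt2"},           # hypothetical TaskConfig payload
    "engine": "vllm",
    "measurement": {},                   # hypothetical MeasurementConfig payload
    "sampling_preset": "deterministic",  # merged into vllm's sampling section at parse time
    "vllm": {},                          # engine section must match engine="vllm"
}

# Standard Pydantic v2 validation entry point (assumes the library targets Pydantic v2).
config = ExperimentConfig.model_validate(config_dict)
```

Adding a `transformers` key alongside `engine="vllm"` would be rejected under the engine/section matching rule above; explicit sampling values in the `vllm` section would take precedence over the `deterministic` preset.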
class ExperimentResult
Experiment result — the user-visible output of a measurement run.
Combines raw results from all processes into a single result with proper aggregation (sum energy, average throughput). For single-GPU experiments, process_results has exactly one item.
v2.0 schema: all fields ship together (decision #50).
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| schema_version | `str` | '3.0' | Result schema version |
| experiment_id | `str` | (required) | Unique experiment identifier |
| measurement_config_hash | `str` | (required) | SHA-256[:16] of ExperimentConfig (environment excluded) |
| llenergymeasure_version | `str \| None` | None | |
| engine | `str` | 'transformers' | Inference engine used |
| engine_version | `str \| None` | None | |
| model_name | `str` | 'unknown' | Model name/path used |
| measurement_methodology | `Literal['total', 'steady_state', 'windowed']` | (required) | What was measured: total run, steady-state window, or explicit window |
| steady_state_window | `tuple[float, float] \| None` | None | |
| total_tokens | `int` | (required) | Total tokens across all processes |
| total_energy_j | `float` | (required) | Total energy (sum across processes) |
| total_inference_time_sec | `float` | (required) | Total inference time |
| avg_tokens_per_second | `float` | (required) | Average throughput |
| avg_energy_per_token_j | `float` | (required) | Average energy per token |
| mj_per_tok_adjusted | `float \| None` | None | |
| mj_per_tok_total | `float \| None` | None | |
| total_flops | `float` | (required) | Total FLOPs (reference metadata) |
| flops_per_output_token | `float \| None` | None | |
| flops_per_input_token | `float \| None` | None | |
| flops_per_second | `float \| None` | None | |
| baseline_power_w | `float \| None` | None | |
| energy_adjusted_j | `float \| None` | None | |
| energy_per_device_j | `list[float] \| None` | None | |
| energy_breakdown | `llenergymeasure.domain.metrics.EnergyBreakdown \| None` | None | |
| multi_gpu | `llenergymeasure.domain.metrics.MultiGPUMetrics \| None` | None | |
| measurement_warnings | `list[str]` | (required) | Measurement quality warnings (e.g., short duration, thermal drift) |
| warmup_excluded_samples | `int \| None` | None | |
| reproducibility_notes | `str` | 'Energy measured via NVML polling. Accuracy +/-5%. Results may vary with thermal state and system load.' | Fixed disclaimer about measurement accuracy |
| timeseries | `str \| None` | None | |
| start_time | `datetime.datetime` | (required) | Earliest process start time |
| end_time | `datetime.datetime` | (required) | Latest process end time |
| process_results | `list[llenergymeasure.domain.experiment.RawProcessResult]` | (required) | Original per-process results |
| aggregation | `llenergymeasure.domain.experiment.AggregationMetadata \| None` | None | |
| thermal_throttle | `llenergymeasure.domain.metrics.ThermalThrottleInfo \| None` | None | |
| warmup_result | `llenergymeasure.domain.metrics.WarmupResult \| None` | None | |
| latency_stats | `llenergymeasure.domain.metrics.LatencyStatistics \| None` | None | |
| extended_metrics | `llenergymeasure.domain.metrics.ExtendedEfficiencyMetrics \| None` | None | |
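A short sketch of reading the documented result fields after a run. The top-level import path is an assumption; only fields listed in the table above are accessed:

```python
from llenergymeasure import run_experiment  # top-level import path assumed

result = run_experiment(model="gpt2", n_prompts=10)

print(f"{result.total_tokens} tokens, {result.total_energy_j:.1f} J total")
print(f"{result.avg_tokens_per_second:.1f} tok/s, "
      f"{result.avg_energy_per_token_j * 1000:.3f} mJ/tok")

# Optional fields are None when the corresponding feature was not active.
if result.baseline_power_w is not None:
    print(f"baseline power: {result.baseline_power_w:.1f} W")
for warning in result.measurement_warnings:
    print("warning:", warning)
```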
class StudyConfig
Thin resolved container for a study (list of experiments + execution config).
Populated by the study loader after sweep expansion. The experiments list contains fully-validated ExperimentConfig objects ready for execution. skipped_configs records any grid points that failed Pydantic validation so they can be displayed to the researcher in pre-flight output.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| experiments | `list[llenergymeasure.config.models.ExperimentConfig]` | (required) | Resolved list of experiments to run |
| study_name | `str \| None` | None | |
| output | `llenergymeasure.config.models.OutputConfig` | (required) | Study-level output configuration (results_dir, format, save_timeseries) |
| study_execution | `llenergymeasure.config.models.ExecutionConfig` | (required) | Cycle repetition and ordering controls |
| runners | `dict[str, str] \| None` | None | |
| images | `dict[str, str] \| None` | None | |
| study_design_hash | `str \| None` | None | |
| skipped_configs | `list[dict[str, Any]]` | (required) | Grid points that failed Pydantic validation during expansion. Persisted for post-hoc review and pre-flight display. |
| dedup_mode | `Literal['resolved', 'off']` | 'resolved' | Library-resolution dedup mode. 'resolved' applies dormant-invariant library resolution at expansion and collapses resolved-config-hash-equivalent configs to a single run; 'off' runs every declared config regardless of equivalence. Set via ExecutionConfig.deduplicate_equivalent / --no-dedup. |
| pre_run_equivalence_groups | `list[dict[str, Any]]` | (required) | Pre-run equivalence groups computed at sweep-expansion time. Each group records the resolved_config_hash, canonical excerpt, and member declared-indices. Written to 'equivalence_groups.json' alongside the results bundle. See sweep-dedup.md §6 and the reading sketch after this table. |
| declared_resolved_config_hashes | `list[str]` | (required) | Per-declared-config resolved_config_hashes (parallel to the pre-resolved sweep input). The harness consults this to tag each experiment with its equivalence group at sidecar-write time. |
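Because pre_run_equivalence_groups is also persisted to `equivalence_groups.json`, the groups can be inspected post hoc. A sketch, assuming a conventional results layout; the key names inside each group are guesses based on the field description above:

```python
import json
from pathlib import Path

# The directory layout here is illustrative; the file is documented as being
# written alongside the results bundle.
groups = json.loads(Path("results/my_study/equivalence_groups.json").read_text())
for group in groups:
    # Key names assumed from the description (resolved_config_hash, canonical
    # excerpt, member declared-indices); adjust to the actual file contents.
    print(group.get("resolved_config_hash"), group.get("members"))
```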
class StudyResult
Final return value of a study run.
Distinct from StudyManifest (the in-progress checkpoint). StudyResult is assembled once after all experiments complete (or after interrupt) and returned to the caller.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| experiments | `list[llenergymeasure.domain.experiment.ExperimentResult]` | (required) | Results for each experiment in the study |
| study_name | `str \| None` | None | |
| study_design_hash | `str \| None` | None | |
| measurement_protocol | `dict[str, Any]` | (required) | Flat dict from ExecutionConfig: n_cycles, experiment_order, experiment_gap_seconds, cycle_gap_seconds, shuffle_seed, experiment_timeout_seconds |
| result_files | `list[str]` | (required) | Paths to per-experiment result.json files (paths, not embedded) |
| summary | `llenergymeasure.domain.experiment.StudySummary` | (required) | Computed aggregate statistics (counts, totals, warnings) |
| skipped_experiments | `list[dict[str, Any]]` | (required) | Grid points skipped due to validation errors (raw_config + reason + errors) |
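A sketch of consuming a StudyResult using only the fields listed above (top-level import path assumed):

```python
from llenergymeasure import run_study  # top-level import path assumed

study_result = run_study("study.yaml")

print(study_result.study_name, study_result.measurement_protocol.get("n_cycles"))
for path in study_result.result_files:    # paths to result.json files, not embedded results
    print("result file:", path)
for skipped in study_result.skipped_experiments:
    print("skipped:", skipped["reason"])  # documented keys: raw_config, reason, errors
```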
run_experiment
```python
run_experiment(
    config: str | Path | ExperimentConfig | None = None,
    *,
    model: str | None = None,
    engine: str | None = None,
    n_prompts: int = 100,
    dataset: str = 'aienergyscore',
    skip_preflight: bool = False,
    progress: ProgressCallback | None = None,
    output_dir: str | Path | None = None,
    **kwargs: Any,
) -> ExperimentResult
```
Run a single LLM inference efficiency experiment.
Three call forms:

```python
run_experiment("config.yaml")                # YAML path
run_experiment(ExperimentConfig(...))        # config object
run_experiment(model="gpt2", engine="vllm")  # kwargs convenience
```
Args:
- config: YAML file path, ExperimentConfig object, or None (use kwargs).
- model: Model name/path (kwargs form only).
- engine: Inference engine (kwargs form only; defaults to the ExperimentConfig default).
- n_prompts: Number of prompts (kwargs form only; default 100).
- dataset: Dataset source name (kwargs form only; default "aienergyscore").
- skip_preflight: Skip Docker pre-flight checks (GPU visibility, CUDA/driver compat).
- progress: Optional callback for step-by-step progress reporting.
- output_dir: Base directory for results. When provided, overrides the default ./results directory. A timestamped study subdirectory is created within this path.
- `**kwargs`: Additional ExperimentConfig fields (kwargs form only).
Returns:
- ExperimentResult: Experiment measurements and metadata.

Raises:
- ConfigError: Invalid config path, or missing model in kwargs form.
- pydantic.ValidationError: Invalid field values (passes through unchanged).
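Putting the kwargs form together with the documented error behaviour. The exception import paths are not listed in this reference, so the sketch catches broadly and names the documented types in a comment:

```python
from llenergymeasure import run_experiment  # top-level import path assumed

try:
    result = run_experiment(
        model="gpt2",
        engine="transformers",
        n_prompts=50,
        output_dir="results/",  # a timestamped study subdirectory is created inside
    )
except Exception as err:  # ConfigError or pydantic.ValidationError per Raises above
    print("experiment failed:", err)
    raise
```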
run_study
```python
run_study(
    config: str | Path | StudyConfig,
    *,
    skip_preflight: bool = False,
    progress: ProgressCallback | None = None,
    resume_dir: Path | None = None,
    resume: bool = False,
    output_dir: Path | None = None,
    skip_set: set[tuple[str, int]] | None = None,
    no_lock: bool = False,
    config_path: Path | None = None,
    cli_overrides: dict[str, Any] | None = None,
) -> StudyResult
```
Run a multi-experiment study.
Always writes manifest.json to disk (documented side-effect).
Args:
- config: YAML file path or resolved StudyConfig.
- skip_preflight: Skip Docker pre-flight checks (GPU visibility, CUDA/driver compat). The CLI --skip-preflight flag and YAML execution.skip_preflight: true also bypass them.
- progress: Optional StudyProgressCallback for live per-experiment display. When provided, the study runner emits begin/end experiment events and forwards per-step progress from worker subprocesses.
- resume_dir: Explicit study directory to resume. Overrides resume.
- resume: When True and resume_dir is None, auto-detect the most recent resumable study in output_dir (default results/).
- output_dir: Base output directory used by auto-detect resume. Ignored when resume_dir is given explicitly.
- skip_set: Set of (config_hash, cycle) pairs to skip (already completed in a previous run). Populated automatically when resuming; callers rarely need to set this directly.
- no_lock: Skip GPU advisory lock acquisition. Use with the --no-lock CLI flag.
- config_path: Original YAML config file path for copying to study artefacts. When config is a StudyConfig object, callers should pass the original path separately so the YAML is preserved for reproducibility.
- cli_overrides: Flat dict of CLI flag overrides (e.g. {"model": "gpt2"}). Used to build per-experiment _resolution.json sidecars showing which fields were overridden by CLI flags vs YAML vs sweep.
Returns:
- StudyResult with experiments, result_files, measurement_protocol, and inline summary fields.

Raises:
- ConfigError: Invalid config path or parse error.
- PreFlightError: Multi-engine study without Docker.
- StudyError: No resumable study found (when resume=True).
- StudyError: Config drift detected (study_design_hash changed).
- pydantic.ValidationError: Invalid field values (passes through unchanged).
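A resume-oriented sketch using the documented parameters (top-level import path assumed; the explicit directory name is illustrative):

```python
from pathlib import Path
from llenergymeasure import run_study  # top-level import path assumed

# Auto-detect and resume the most recent resumable study under results/.
result = run_study("study.yaml", resume=True, output_dir=Path("results"))

# Or resume an explicit study directory (resume_dir overrides the resume flag):
# result = run_study("study.yaml", resume_dir=Path("results/20240101_study"))
```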