
Python API reference

⚠️ Pre-1.0 — Python API surface is unstable. LLenergyMeasure is currently pre-1.0. The Python library API documented here is not yet a stable public surface — class signatures, function names, and module paths may change between minor versions without notice.

The supported user-facing interfaces are the CLI (llem run, llem config) and the YAML study config. See CLI reference and Study config for stable contracts.

The library API will stabilise at v1.0.0.

Auto-generated by scripts/generate_api_docs.py from docstrings in src/llenergymeasure/. Do not edit manually — edits are overwritten on the next build.

__version__

Current version: 0.9.0


class ExperimentConfig

v2.0 experiment configuration.

Central configuration object controlling all aspects of a single LLM inference efficiency measurement. Organised into semantic groups:

  • task: What to measure (model, dataset, token limits, seed)
  • measurement: How to measure (warmup, baseline, energy sampler)
  • Engine sections (transformers:, vllm:, tensorrt:): How to execute

The engine section must match the engine field. Providing a transformers: section when engine=vllm is a configuration error.
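This match rule can be sketched as a standalone check. The helper below is hypothetical, written only to illustrate the documented constraint; it is not the library's actual validator.

```python
# Hypothetical sketch of the rule above: a config may only carry the
# engine section named by its `engine` field.
ENGINE_SECTIONS = {"transformers", "vllm", "tensorrt"}

def check_engine_section(config: dict) -> None:
    engine = config.get("engine", "transformers")  # documented default
    mismatched = {s for s in ENGINE_SECTIONS if s in config and s != engine}
    if mismatched:
        raise ValueError(
            f"engine={engine!r} but config declares section(s) {sorted(mismatched)}"
        )

check_engine_section({"engine": "vllm", "vllm": {}})            # valid
# check_engine_section({"engine": "vllm", "transformers": {}})  # would raise
```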

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | `llenergymeasure.config.models.TaskConfig` | (required) | Task configuration: model, dataset, workload shape |
| `engine` | `Engine` | `Engine.TRANSFORMERS` | Inference engine |
| `measurement` | `llenergymeasure.config.models.MeasurementConfig` | (required) | Measurement methodology: warmup, baseline, energy sampling |
| `sampling_preset` | `Optional[Literal['deterministic', 'standard', 'creative', 'factual']]` | `None` | Sampling preset. When set, preset values are merged into the active engine's sampling section at parse time; explicit YAML values take precedence over preset values. |
| `transformers` | `llenergymeasure.config.engine_configs.TransformersConfig \| None` | `None` | |
| `vllm` | `llenergymeasure.config.engine_configs.VLLMConfig \| None` | `None` | |
| `tensorrt` | `llenergymeasure.config.engine_configs.TensorRTConfig \| None` | `None` | |
| `lora` | `llenergymeasure.config.models.LoRAConfig \| None` | `None` | |
| `passthrough_kwargs` | `dict[str, Any] \| None` | `None` | |
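The `sampling_preset` precedence rule (preset values merged in, explicit values winning) behaves like a shallow dictionary merge. The preset contents below are invented for the example; only the precedence rule itself comes from the documentation.

```python
# Invented example presets -- only the precedence rule is from the docs:
# preset values are merged in, explicit values take precedence.
PRESETS = {
    "deterministic": {"temperature": 0.0, "top_p": 1.0},
    "creative": {"temperature": 1.0, "top_p": 0.95},
}

def merge_sampling(preset, explicit):
    merged = dict(PRESETS.get(preset, {})) if preset else {}
    merged.update(explicit)  # explicit YAML values win over the preset
    return merged

print(merge_sampling("deterministic", {"temperature": 0.7}))
# -> {'temperature': 0.7, 'top_p': 1.0}
```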

class ExperimentResult

Experiment result — the user-visible output of a measurement run.

Combines raw results from all processes into a single result with proper aggregation (sum energy, average throughput). For single-GPU experiments, process_results has exactly one item.

v2.0 schema: all fields ship together (decision #50).

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `schema_version` | `str` | `'3.0'` | Result schema version |
| `experiment_id` | `str` | (required) | Unique experiment identifier |
| `measurement_config_hash` | `str` | (required) | SHA-256[:16] of ExperimentConfig (environment excluded) |
| `llenergymeasure_version` | `str \| None` | `None` | |
| `engine` | `str` | `'transformers'` | Inference engine used |
| `engine_version` | `str \| None` | `None` | |
| `model_name` | `str` | `'unknown'` | Model name/path used |
| `measurement_methodology` | `Literal['total', 'steady_state', 'windowed']` | (required) | What was measured — total run, steady-state window, or explicit window |
| `steady_state_window` | `tuple[float, float] \| None` | `None` | |
| `total_tokens` | `int` | (required) | Total tokens across all processes |
| `total_energy_j` | `float` | (required) | Total energy (sum across processes) |
| `total_inference_time_sec` | `float` | (required) | Total inference time |
| `avg_tokens_per_second` | `float` | (required) | Average throughput |
| `avg_energy_per_token_j` | `float` | (required) | Average energy per token |
| `mj_per_tok_adjusted` | `float \| None` | `None` | |
| `mj_per_tok_total` | `float \| None` | `None` | |
| `total_flops` | `float` | (required) | Total FLOPs (reference metadata) |
| `flops_per_output_token` | `float \| None` | `None` | |
| `flops_per_input_token` | `float \| None` | `None` | |
| `flops_per_second` | `float \| None` | `None` | |
| `baseline_power_w` | `float \| None` | `None` | |
| `energy_adjusted_j` | `float \| None` | `None` | |
| `energy_per_device_j` | `list[float] \| None` | `None` | |
| `energy_breakdown` | `llenergymeasure.domain.metrics.EnergyBreakdown \| None` | `None` | |
| `multi_gpu` | `llenergymeasure.domain.metrics.MultiGPUMetrics \| None` | `None` | |
| `measurement_warnings` | `list[str]` | (required) | Measurement quality warnings (e.g., short duration, thermal drift) |
| `warmup_excluded_samples` | `int \| None` | `None` | |
| `reproducibility_notes` | `str` | `'Energy measured via NVML polling. Accuracy +/-5%. Results may vary with thermal state and system load.'` | Fixed disclaimer about measurement accuracy |
| `timeseries` | `str \| None` | `None` | |
| `start_time` | `datetime.datetime` | (required) | Earliest process start time |
| `end_time` | `datetime.datetime` | (required) | Latest process end time |
| `process_results` | `list[llenergymeasure.domain.experiment.RawProcessResult]` | (required) | Original per-process results |
| `aggregation` | `llenergymeasure.domain.experiment.AggregationMetadata \| None` | `None` | |
| `thermal_throttle` | `llenergymeasure.domain.metrics.ThermalThrottleInfo \| None` | `None` | |
| `warmup_result` | `llenergymeasure.domain.metrics.WarmupResult \| None` | `None` | |
| `latency_stats` | `llenergymeasure.domain.metrics.LatencyStatistics \| None` | `None` | |
| `extended_metrics` | `llenergymeasure.domain.metrics.ExtendedEfficiencyMetrics \| None` | `None` | |
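The aggregate fields are related by simple arithmetic: per-token energy is total energy over total tokens, and the millijoule-per-token variants are the same quantity scaled by 1000. A pure-Python sketch with invented numbers (not library code; the aggregation rule shown is the documented "sum energy" behaviour):

```python
# Invented per-process numbers; names mirror ExperimentResult fields.
process_tokens = [4096, 4096]
process_energy_j = [128.0, 128.0]

total_tokens = sum(process_tokens)      # summed across processes
total_energy_j = sum(process_energy_j)  # documented "sum energy" aggregation

avg_energy_per_token_j = total_energy_j / total_tokens
mj_per_tok = avg_energy_per_token_j * 1000  # J/token -> mJ/token

print(total_tokens, total_energy_j, avg_energy_per_token_j, mj_per_tok)
# -> 8192 256.0 0.03125 31.25
```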

class StudyConfig

Thin resolved container for a study (list of experiments + execution config).

Populated by the study loader after sweep expansion. The experiments list contains fully-validated ExperimentConfig objects ready for execution. skipped_configs records any grid points that failed Pydantic validation so they can be displayed to the researcher in pre-flight output.

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `experiments` | `list[llenergymeasure.config.models.ExperimentConfig]` | (required) | Resolved list of experiments to run |
| `study_name` | `str \| None` | `None` | |
| `output` | `llenergymeasure.config.models.OutputConfig` | (required) | Study-level output configuration (results_dir, format, save_timeseries) |
| `study_execution` | `llenergymeasure.config.models.ExecutionConfig` | (required) | Cycle repetition and ordering controls |
| `runners` | `dict[str, str] \| None` | `None` | |
| `images` | `dict[str, str] \| None` | `None` | |
| `study_design_hash` | `str \| None` | `None` | |
| `skipped_configs` | `list[dict[str, Any]]` | (required) | Grid points that failed Pydantic validation during expansion. Persisted for post-hoc review and pre-flight display. |
| `dedup_mode` | `Literal['resolved', 'off']` | `'resolved'` | Library-resolution mechanism dedup mode. 'resolved' applies dormant-invariant library resolution at expansion and collapses resolved-config-hash-equivalent configs to a single run. 'off' runs every declared config regardless of equivalence. Set via ExecutionConfig.deduplicate_equivalent / --no-dedup. |
| `pre_run_equivalence_groups` | `list[dict[str, Any]]` | (required) | Pre-run equivalence groups computed at sweep-expansion time. Each group records the resolved_config_hash, canonical excerpt, and member declared-indices. Written to 'equivalence_groups.json' alongside the results bundle. See sweep-dedup.md §6. |
| `declared_resolved_config_hashes` | `list[str]` | (required) | Per-declared-config resolved_config_hashes (parallel to the pre-resolved sweep input). Harness consults this to tag each experiment with its equivalence group at sidecar-write time. |
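The `dedup_mode='resolved'` behaviour amounts to grouping declared configs by their resolved hash and running one representative per group. A hypothetical sketch of that grouping (invented hashes; not the library's implementation):

```python
from collections import defaultdict

# Hypothetical sketch of dedup_mode='resolved': declared configs sharing a
# resolved_config_hash collapse to a single run; 'off' would run all three.
declared_resolved_config_hashes = ["aaa111", "aaa111", "bbb222"]

groups = defaultdict(list)
for index, config_hash in enumerate(declared_resolved_config_hashes):
    groups[config_hash].append(index)  # member declared-indices per group

runs = [members[0] for members in groups.values()]  # one representative each
print(dict(groups), runs)
# -> {'aaa111': [0, 1], 'bbb222': [2]} [0, 2]
```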

class StudyResult

Final return value of a study run.

Distinct from StudyManifest (the in-progress checkpoint). StudyResult is assembled once after all experiments complete (or after interrupt) and returned to the caller.

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `experiments` | `list[llenergymeasure.domain.experiment.ExperimentResult]` | (required) | Results for each experiment in the study |
| `study_name` | `str \| None` | `None` | |
| `study_design_hash` | `str \| None` | `None` | |
| `measurement_protocol` | `dict[str, Any]` | (required) | Flat dict from ExecutionConfig: n_cycles, experiment_order, experiment_gap_seconds, cycle_gap_seconds, shuffle_seed, experiment_timeout_seconds |
| `result_files` | `list[str]` | (required) | Paths to per-experiment result.json files (paths, not embedded) |
| `summary` | `llenergymeasure.domain.experiment.StudySummary` | (required) | Computed aggregate statistics (counts, totals, warnings) |
| `skipped_experiments` | `list[dict[str, Any]]` | (required) | Grid points skipped due to validation errors (raw_config + reason + errors) |

run_experiment

run_experiment(config: 'str | Path | ExperimentConfig | None' = None, *, model: 'str | None' = None, engine: 'str | None' = None, n_prompts: 'int' = 100, dataset: 'str' = 'aienergyscore', skip_preflight: 'bool' = False, progress: 'ProgressCallback | None' = None, output_dir: 'str | Path | None' = None, **kwargs: 'Any') -> 'ExperimentResult'

Run a single LLM inference efficiency experiment.

Three call forms:

```python
run_experiment("config.yaml")             # YAML path
run_experiment(ExperimentConfig(...))     # config object
run_experiment(model="gpt2", engine="Y")  # kwargs convenience
```

Args:

  • config: YAML file path, ExperimentConfig object, or None (use kwargs).
  • model: Model name/path (kwargs form only).
  • engine: Inference engine (kwargs form only, defaults to ExperimentConfig default).
  • n_prompts: Number of prompts (kwargs form only, default 100).
  • dataset: Dataset source name (kwargs form only, default "aienergyscore").
  • skip_preflight: Skip Docker pre-flight checks (GPU visibility, CUDA/driver compat).
  • progress: Optional callback for step-by-step progress reporting.
  • output_dir: Base directory for results. When provided, overrides the default ./results directory. A timestamped study subdirectory is created within this path.
  • **kwargs: Additional ExperimentConfig fields (kwargs form only).

Returns:

  ExperimentResult: Experiment measurements and metadata.

Raises:

  • ConfigError: Invalid config path, or missing model in kwargs form.
  • pydantic.ValidationError: Invalid field values (passes through unchanged).


run_study

run_study(config: 'str | Path | StudyConfig', *, skip_preflight: 'bool' = False, progress: 'ProgressCallback | None' = None, resume_dir: 'Path | None' = None, resume: 'bool' = False, output_dir: 'Path | None' = None, skip_set: 'set[tuple[str, int]] | None' = None, no_lock: 'bool' = False, config_path: 'Path | None' = None, cli_overrides: 'dict[str, Any] | None' = None) -> 'StudyResult'

Run a multi-experiment study.

Always writes manifest.json to disk (documented side-effect).

Args:

  • config: YAML file path or resolved StudyConfig.
  • skip_preflight: Skip Docker pre-flight checks (GPU visibility, CUDA/driver compat). The CLI --skip-preflight flag and YAML execution.skip_preflight: true also bypass them.
  • progress: Optional StudyProgressCallback for live per-experiment display. When provided, the study runner emits begin/end experiment events and forwards per-step progress from worker subprocesses.
  • resume_dir: Explicit study directory to resume. Overrides resume.
  • resume: When True and resume_dir is None, auto-detect the most recent resumable study in output_dir (default results/).
  • output_dir: Base output directory used by auto-detect resume. Ignored when resume_dir is given explicitly.
  • skip_set: Set of (config_hash, cycle) pairs to skip (already completed in a previous run). Populated automatically when resuming; callers rarely need to set this directly.
  • no_lock: Skip GPU advisory lock acquisition. Use with the --no-lock CLI flag.
  • config_path: Original YAML config file path for copying to study artefacts. When config is a StudyConfig object, callers should pass the original path separately so the YAML is preserved for reproducibility.
  • cli_overrides: Flat dict of CLI flag overrides (e.g. {"model": "gpt2"}). Used to build per-experiment _resolution.json sidecars showing which fields were overridden by CLI flags vs YAML vs sweep.

Returns:

  StudyResult with experiments, result_files, measurement_protocol, and inline summary fields.

Raises:

  • ConfigError: Invalid config path or parse error.
  • PreFlightError: Multi-engine study without Docker.
  • StudyError: No resumable study found (when resume=True).
  • StudyError: Config drift detected (study_design_hash changed).
  • pydantic.ValidationError: Invalid field values (passes through unchanged).
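A hedged usage sketch of the entry point above. The top-level import path is an assumption based on this page's module names, and the call is wrapped in a function so nothing executes at import time; a GPU-equipped environment with llenergymeasure installed is assumed when the function is actually called.

```python
# Illustrative only -- not a tested recipe. Wrapping the call in a
# function keeps this module importable even where llenergymeasure is
# not installed; the `from llenergymeasure import ...` path is assumed.
def run_resumable_study(resume: bool = False):
    from llenergymeasure import run_study

    # resume=True auto-detects the most recent resumable study in the
    # default results/ directory, per the documented semantics.
    result = run_study("study.yaml", resume=resume, skip_preflight=False)
    print(result.summary)  # computed aggregate statistics
    return result
```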