Study and Experiment Configuration
llenergymeasure uses YAML files to define experiments and studies. llem run auto-detects
whether a YAML file is an experiment (single run) or a study (sweep or multi-experiment
run) by inspecting its top-level keys. Files with a sweep: or experiments: key are loaded
as studies; all others are loaded as single experiments.
Single Experiment
The minimal experiment YAML requires only model:
model: gpt2
engine: transformers
A fuller example with sub-configs:
model: gpt2
engine: transformers
dtype: bfloat16
max_input_tokens: 256
max_output_tokens: 256
dataset:
  source: aienergyscore
  n_prompts: 100
  order: interleaved
decoder:
  preset: deterministic  # greedy decoding
pytorch:
  batch_size: 4
  attn_implementation: sdpa
warmup:
  enabled: true
  n_warmup: 5
baseline:
  enabled: true
  duration_seconds: 30
energy_sampler: auto
gpu_telemetry: true
A vLLM experiment (requires Docker runner):
model: gpt2
engine: vllm
dataset:
  source: aienergyscore
  n_prompts: 100
runners:
  vllm: docker
vllm:
  engine:
    enforce_eager: false
    gpu_memory_utilization: 0.9
    kv_cache_dtype: auto
  sampling:
    max_tokens: 128
Run either with:
llem run experiment.yaml
Study / Sweep
A study runs multiple experiments with different configurations. There are two ways to define the experiment set:
- sweep: — defines a grid of parameter values. Supports two entry types:
  - Independent axes (list of scalars) — Cartesian product across all axes.
  - Dependent groups (list of dicts) — alternatives within a group, crossed with other groups and axes. Use for parameters with mutual exclusivity or co-dependencies.
- experiments: — explicit list of experiment configs. Each entry is merged with any top-level shared fields.
Both can be combined: sweep configs are produced first, then explicit entries are appended.
The study YAML also accepts a base: key pointing to a base experiment config file, and a
study_execution: block controlling how many cycles to run and in what order.
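A compact skeleton showing how these study-level keys fit together (names and values are illustrative):
# study.yaml — illustrative skeleton
name: example-study
base: base-experiment.yaml   # optional: inherit fields from a normal experiment YAML
sweep:                       # grid of parameter values
  dtype: [float16, bfloat16]
experiments:                 # explicit configs, appended after sweep-generated ones
  - model: gpt2
    engine: transformers
study_execution:             # cycle count and ordering
  n_cycles: 3
  experiment_order: shuffle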
Sweep Grammar
Single dimension sweep
# 2 configs: dtype=float16 and dtype=bfloat16
name: dtype-sweep
model: gpt2
engine: transformers
dataset:
  n_prompts: 100
sweep:
  dtype: [float16, bfloat16]
study_execution:
  n_cycles: 3
  experiment_order: shuffle
Run with llem run dtype-sweep.yaml. Produces 2 configs × 3 cycles = 6 runs.
Multi-dimension sweep (Cartesian product)
# 4 configs: fp16+50, fp16+100, bf16+50, bf16+100
name: dtype-n-sweep
model: gpt2
engine: transformers
sweep:
  dtype: [float16, bfloat16]
  dataset.n_prompts: [50, 100]
study_execution:
  n_cycles: 3
  experiment_order: shuffle
Produces 4 configs × 3 cycles = 12 runs.
Engine-scoped sweep (2-segment path)
Use dotted paths (engine.param) to sweep an engine-specific parameter:
# 4 configs: batch_size 1, 2, 4, 8 — all with engine=transformers
name: batch-size-sweep
model: gpt2
engine: transformers
sweep:
  pytorch.batch_size: [1, 2, 4, 8]
study_execution:
  n_cycles: 3
  experiment_order: shuffle
Produces 4 configs × 3 cycles = 12 runs. The pytorch.batch_size path expands to a
pytorch: { batch_size: N } section in each generated experiment config.
Engine-scoped sweep (3-segment path)
For nested engine configs (e.g. vLLM's engine sub-section):
# 6 configs: 3 block_sizes × 2 kv_cache_dtypes
name: kv-cache-sweep
model: gpt2
engine: vllm
sweep:
  vllm.engine.block_size: [8, 16, 32]
  vllm.engine.kv_cache_dtype: [auto, fp8]
runners:
  vllm: docker
study_execution:
  n_cycles: 3
Produces 6 configs × 3 cycles = 18 runs. The path vllm.engine.block_size expands to
vllm: { engine: { block_size: N } }.
Nested config sweep (dotted path)
Use dotted paths for nested config fields like dataset.n_prompts or dataset.source:
name: dataset-size-sweep
model: gpt2
engine: transformers
sweep:
  dataset.n_prompts: [50, 100, 200]
study_execution:
  n_cycles: 3
Produces 3 configs × 3 cycles = 9 runs. The dotted path dataset.n_prompts expands to
dataset: { n_prompts: N } in each generated experiment config.
Note: Dotted paths starting with an engine name (e.g. pytorch.batch_size,
vllm.engine.max_num_seqs) are treated as engine-scoped parameters. All other dotted paths (e.g. dataset.n_prompts, dataset.order) are treated as nested config fields.
Explicit experiments list
Use experiments: when the configurations are not a regular grid:
# 2 explicit configs
name: compare-engines
dataset:
  n_prompts: 50
experiments:
  - model: gpt2
    engine: transformers
    dtype: bfloat16
  - model: gpt2
    engine: vllm
runners:
  vllm: docker
study_execution:
  n_cycles: 3
  experiment_order: interleave
Each entry is merged with any top-level shared fields (dataset.n_prompts: 50 here).
Base inheritance
Use base: to load a base experiment config file and sweep on top of it:
# base-experiment.yaml is a normal experiment YAML
name: dtype-sweep
base: base-experiment.yaml
sweep:
  dtype: [float32, float16, bfloat16]
study_execution:
  n_cycles: 3
The base file is loaded, study-only keys (sweep, experiments, study_execution, base, name,
runners) are stripped, and the remaining fields become the starting point for all generated
configs. Inline fields in the study YAML override base fields. The base path is resolved
relative to the study YAML file's directory.
Mixed sweep and explicit
Both sweep: and experiments: can appear in the same study. Sweep-generated configs come
first, then explicit entries are appended:
# 2 sweep configs + 1 explicit config = 3 total
name: mixed-study
model: gpt2
sweep:
  dtype: [float16, bfloat16]
experiments:
  - model: gpt2
    engine: transformers
    dtype: float32
    pytorch:
      load_in_4bit: true
study_execution:
  n_cycles: 3
  experiment_order: shuffle
Dependent groups (sweep groups)
Some parameters have constraints: torch_compile_mode only makes sense when
torch_compile: true, quantisation sub-params like bnb_4bit_quant_type require
load_in_4bit: true, and beam search requires do_sample: false. A plain Cartesian
sweep would produce invalid combinations. Dependent groups solve this by bundling
constrained parameters into named groups of alternative variants.
Type-based disambiguation: a list of scalars is an independent axis; a list of dicts is a dependent group. Groups are crossed with each other and with independent axes, but entries within a group are alternatives (unioned, not crossed).
# 6 configs: 2 dtype × 3 compilation variants
name: compile-sweep
model: gpt2
engine: transformers
sweep:
  dtype: [float16, bfloat16]  # independent axis (2 values)
  pytorch.compilation:        # dependent group (3 variants)
    - pytorch.torch_compile: false
    - pytorch.torch_compile: true
      pytorch.torch_compile_mode: default
    - pytorch.torch_compile: true
      pytorch.torch_compile_mode: max-autotune
The group name (pytorch.compilation) is an abstract label - it doesn't map to a config
field. Keys within each variant dict are fully-qualified dotted paths, routed the same way
as independent axis keys.
Baseline variant ({})
Use an empty dict {} as a group entry to include a "no override" baseline:
sweep:
  pytorch.quantization:
    - {}  # baseline: no quantisation
    - pytorch.load_in_8bit: true
    - pytorch.load_in_4bit: true
      pytorch.bnb_4bit_quant_type: nf4
Produces 3 variants: unquantised baseline, 8-bit, and 4-bit.
Mini-grids within group entries
A group entry can contain list-valued fields (list of scalars), which expand as a mini Cartesian product within that entry:
sweep:
  pytorch.caching:
    - {}  # baseline
    - pytorch.use_cache: true
      pytorch.cache_implementation: [static, offloaded_static, sliding_window]
Produces 4 variants: 1 baseline + 3 cache implementations (all with use_cache: true).
Cross-section overrides
Group entries can override fields outside their engine section, such as decoder settings:
sweep:
  pytorch.decoding:
    - {}  # baseline: use shared decoder settings
    - decoder.do_sample: false
      decoder.temperature: 0.0
      pytorch.num_beams: 4
      pytorch.early_stopping: true
YAML anchors for repetition reduction
Use YAML anchors (&name) and merge keys (<<: *name) to avoid repeating shared fields
across group variants:
sweep:
  tensorrt.quant_config:
    - {}  # baseline: no quantisation
    - &trt_int8
      tensorrt.quant_config.quant_algo: INT8
    - <<: *trt_int8
      tensorrt.quant_config.kv_cache_quant_algo: INT8
    - tensorrt.quant_config.quant_algo: W4A16_AWQ
Multiple groups crossed
When a sweep has multiple groups, they are crossed with each other (and with independent axes):
# 2 dtype × 2 compilation × 3 quantisation = 12 configs
sweep:
  dtype: [float16, bfloat16]
  pytorch.compilation:
    - pytorch.torch_compile: false
    - pytorch.torch_compile: true
      pytorch.torch_compile_mode: default
  pytorch.quantization:
    - {}
    - pytorch.load_in_8bit: true
    - pytorch.load_in_4bit: true
      pytorch.bnb_4bit_quant_type: nf4
Engine-scoped groups
Groups with an engine-prefixed name (e.g. pytorch.compilation, vllm.decoding) only
apply to that engine's experiments. Universal groups (no engine prefix) apply to all
engines.
Collision rule: A group name must not match an independent axis key. Use abstract names like
pytorch.compilation (not pytorch.torch_compile) to avoid collisions.
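A sketch contrasting the two kinds of group (the group names and variant values are illustrative):
sweep:
  pytorch.compilation:   # engine-scoped group: applies only to transformers experiments
    - pytorch.torch_compile: false
    - pytorch.torch_compile: true
  decoding:              # universal group (no engine prefix): applies to all engines
    - {}
    - decoder.do_sample: false
      decoder.temperature: 0.0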
Execution Configuration
The study_execution: section controls cycle repetition and ordering:
study_execution:
  n_cycles: 3
  experiment_order: shuffle  # sequential | interleave | shuffle | reverse | latin_square
n_cycles — how many times the full experiment list is repeated. Repeated execution
reduces measurement variance.
experiment_order — controls execution order across cycles. For experiments A and B with 3
cycles each:
| Order | Sequence | When to use |
|---|---|---|
sequential | A, A, A, B, B, B | Thermal isolation between experiments |
interleave | A, B, A, B, A, B | Reduces temporal bias; fair comparison |
shuffle | random per-cycle, seeded | Publication-quality; eliminates ordering bias |
reverse | A, B, B, A, A, B | Detects ordering effects via counterbalancing |
latin_square | Williams design rows | Balances first-order carryover effects |
shuffle order is seeded from the study design hash, so the same study always shuffles
identically — reruns are reproducible. Override with an explicit shuffle_seed:
study_execution:
  experiment_order: shuffle
  shuffle_seed: 123  # null = derived from study_design_hash
Note: shuffle_seed (study-level scheduling) and random_seed (per-experiment inference/dataset RNG) are independent by design. Changing one does not affect the other. See Methodology — Seeding model for details.
CLI effective defaults when running llem run study.yaml (if not set in YAML):
- n_cycles = 3
- experiment_order = shuffle
Override with llem run study.yaml --cycles 5 --order interleave.
Robustness Controls
The study_execution: section also supports circuit breaker and timeout fields:
study_execution:
  max_consecutive_failures: 10  # 0 = disabled, 1 = fail-fast
  circuit_breaker_cooldown_seconds: 60.0
  wall_clock_timeout_hours: 24  # null = no limit
max_consecutive_failures - circuit breaker threshold. After N consecutive experiment
failures, the study aborts and marks remaining experiments as skipped. The circuit breaker
follows a 3-state pattern (closed/open/half-open): after tripping, it pauses for the
cooldown period, then runs one probe experiment. If the probe succeeds, normal execution
resumes; if it fails, the study aborts.
- 0 disables the circuit breaker entirely (equivalent to --no-circuit-breaker)
- 1 aborts on the first failure with no cooldown (equivalent to --fail-fast)
- Default: 10
circuit_breaker_cooldown_seconds - pause duration before the half-open probe
experiment. Allows transient issues (GPU thermal throttling, OOM recovery) to resolve
before retrying. Default: 60.0.
wall_clock_timeout_hours - hard time limit for the entire study. When the timeout
expires, remaining experiments are marked as skipped and the study status is set to
timed_out. The manifest preserves all completed results. Default: null (no limit).
CLI flags --fail-fast, --no-circuit-breaker, and --timeout override these settings.
GPU Lock Files
Studies acquire advisory file locks on GPU devices to prevent concurrent studies from
competing for the same GPUs. Locks are acquired atomically (all-or-none) and released
automatically on process exit, including after SIGKILL. Disable with --no-lock if your
environment handles GPU scheduling externally.
Study Resume
Interrupted studies (Ctrl-C, timeout, circuit breaker) can be resumed:
llem run study.yaml --resume # auto-detect most recent
llem run study.yaml --resume-dir PATH # specific study directory
Resume skips completed experiments and re-runs failed, skipped, pending, and interrupted ones. Config drift (changed sweep axes or model) raises a hard error to prevent mixing results from different configurations.
Runner Configuration
The runners: section determines how each engine executes:
runners:
  transformers: local  # run on host
  vllm: docker         # use default image
  # or pin an explicit image:
  # vllm: "docker:ghcr.io/custom/vllm:latest"
| Value | Behaviour |
|---|---|
local | Run directly on the host (all dependencies must be installed) |
docker | Run in a container using the default image for that engine |
docker:<image> | Run in a container using the specified image |
When docker is used without an explicit image tag, the image is resolved from the installed
package version using the template ghcr.io/henrycgbaker/llenergymeasure/{engine}:v{version}.
For example, with llenergymeasure==0.9.0 and engine=vllm, the image
ghcr.io/henrycgbaker/llenergymeasure/vllm:v0.9.0 is pulled automatically.
See Docker Setup for image pull behaviour and pre-fetching.
Configuration Reference
Full reference for all ExperimentConfig fields.
All fields except model are optional and have sensible defaults.
Sections:
- Top-Level Fields
- Dataset (dataset:)
- Decoder / Sampling (decoder:)
- Warmup (warmup:)
- Baseline (baseline:)
- Energy Sampler (energy_sampler:)
- GPU Telemetry (gpu_telemetry:)
- PyTorch Engine (pytorch:)
- vLLM Engine (vllm.engine:)
- vLLM Sampling (vllm.sampling:)
- vLLM Beam Search (vllm.beam_search:)
- vLLM Attention (vllm.engine.attention:)
- TensorRT-LLM Engine (tensorrt:)
Top-Level Fields
| Field | Type | Default | Description |
|---|---|---|---|
model | string | (required) | HuggingFace model ID or local path |
engine | 'pytorch' \| 'vllm' \| 'tensorrt' | | |
dataset | DatasetConfig | (see below) | Dataset configuration (nested sub-object) |
dtype | 'float32' \| 'float16' \| 'bfloat16' | | |
random_seed | integer | 42 | Per-experiment seed: inference RNG and dataset ordering |
max_input_tokens | integer \| None | 256 | |
max_output_tokens | integer \| None | 256 | |
decoder | DecoderConfig | (see section) | Universal decoder/generation configuration |
warmup | WarmupConfig | (see section) | Warmup phase configuration |
baseline | BaselineConfig | (see section) | Baseline power measurement configuration |
energy_sampler | 'auto' \| 'nvml' \| 'zeus' \| 'codecarbon' \| None | auto | Energy measurement backend (see Energy Sampler section) |
gpu_telemetry | boolean | true | Persist GPU power/thermal/memory timeseries to Parquet sidecar. NVML always runs for throttle detection; this controls disk output. |
pytorch | TransformersConfig | None | null |
vllm | VLLMConfig | None | null |
tensorrt | TensorRTConfig | None | null |
lora | LoRAConfig | None | null |
passthrough_kwargs | dict | None | null |
output_dir | string | None | null |
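For illustration, the top-level scalars are set directly in the experiment YAML (the values and the output path below are illustrative, not defaults):
model: gpt2
engine: vllm
dtype: bfloat16
random_seed: 42
max_output_tokens: 256
gpu_telemetry: false
output_dir: ./results/example-run  # illustrative path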
Dataset (dataset:)
The dataset: section configures which prompts to use and how they are loaded.
| Field | Type | Default | Description |
|---|---|---|---|
source | string | aienergyscore | Dataset source: built-in alias (e.g. aienergyscore) or .jsonl file path |
n_prompts | integer | 100 | Number of prompts to load |
order | 'interleaved' \| 'grouped' \| 'shuffled' | | |
Examples:
# Built-in dataset (default)
dataset:
  source: aienergyscore
  n_prompts: 100
# Custom JSONL file
dataset:
  source: ./my-prompts.jsonl
  n_prompts: 500
  order: shuffled
Decoder / Sampling (decoder:)
| Field | Type | Default | Description |
|---|---|---|---|
temperature | number | 1.0 | Sampling temperature (0=greedy) |
do_sample | boolean | true | Enable sampling (ignored if temp=0) |
top_k | integer | 50 | Top-k sampling (0=disabled) |
top_p | number | 1.0 | Top-p nucleus sampling (1.0=disabled) |
repetition_penalty | number | 1.0 | Repetition penalty (1.0=no penalty) |
min_p | number | None | null |
min_new_tokens | integer | None | null |
preset | 'deterministic' \| 'standard' \| 'creative' | | |
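For example, greedy decoding can be requested via the preset, or sampling tuned explicitly (values are illustrative):
# Greedy decoding via preset
decoder:
  preset: deterministic
# Explicit sampling settings
decoder:
  do_sample: true
  temperature: 0.7
  top_p: 0.9
  top_k: 50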
Warmup (warmup:)
Two modes: fixed (default) runs exactly n_warmup prompts; CV convergence (opt-in via convergence_detection: true) runs until the latency CV drops below a threshold. CV mode replaces n_warmup - they are alternative modes, not additive. After warmup, a thermal_floor_seconds wait lets the GPU temperature plateau before measurement begins.
| Field | Type | Default | Description |
|---|---|---|---|
enabled | boolean | true | Enable warmup phase |
n_warmup | integer | 5 | Number of warmup prompts in fixed mode (ignored when convergence_detection=true) |
thermal_floor_seconds | number | 60.0 | Post-warmup thermal stabilisation wait in seconds. Minimum 30s enforced. |
convergence_detection | boolean | false | Enable CV-based adaptive warmup (replaces fixed n_warmup) |
cv_threshold | number | 0.05 | CV target for convergence (stop when CV < this value) |
max_prompts | integer | 20 | Safety cap on warmup prompts in CV mode |
window_size | integer | 3 | Sliding window size for CV calculation |
min_prompts | integer | 5 | Minimum prompts before checking convergence |
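As a sketch, CV-based adaptive warmup might be configured like this (the thresholds shown repeat the documented defaults for illustration):
warmup:
  enabled: true
  convergence_detection: true  # replaces fixed n_warmup
  cv_threshold: 0.05           # stop once latency CV drops below this value
  max_prompts: 20              # safety cap on warmup prompts
  thermal_floor_seconds: 60    # post-warmup stabilisation wait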
Baseline (baseline:)
| Field | Type | Default | Description |
|---|---|---|---|
enabled | boolean | true | Enable baseline power measurement |
duration_seconds | number | 30.0 | Baseline measurement duration in seconds (5-120s) |
strategy | string | "validated" | Caching strategy: validated (cached with periodic spot-check, default), cached (disk-persisted TTL), fresh (measure every experiment) |
cache_ttl_seconds | number | 7200.0 | How long a cached baseline remains valid before re-measurement, in seconds. Min 60s. Used with cached/validated strategies. |
validation_interval | integer | 5 | Re-validate baseline every N experiments. Used with validated strategy only. |
drift_threshold | number | 0.10 | Power drift fraction (0.01-0.50) to trigger re-measurement. Used with validated strategy only. |
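For instance, a validated baseline with a tighter drift tolerance (values are illustrative):
baseline:
  enabled: true
  duration_seconds: 30
  strategy: validated     # cached, spot-checked every N experiments
  validation_interval: 5
  drift_threshold: 0.05   # re-measure if baseline power drifts by more than 5%
  cache_ttl_seconds: 7200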
Energy Sampler (energy_sampler:)
energy_sampler is a flat top-level field (not a nested section). See Energy Measurement for full details on backends, accuracy, and what the harness resolves internally.
| Value | Description |
|---|---|
auto (default) | Best available: Zeus > NVML > CodeCarbon |
nvml | NVML power polling at 100ms intervals |
zeus | Hardware energy counters (Volta+ GPUs). Most accurate. Install: pip install "llenergymeasure[zeus]" |
codecarbon | System-level (GPU+CPU+RAM). Install: pip install "llenergymeasure[codecarbon]" |
null | Disable energy measurement (throughput-only mode) |
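For example, to force a specific backend or disable energy measurement entirely:
energy_sampler: zeus  # hardware energy counters; requires the zeus extra
# throughput-only mode:
energy_sampler: null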
GPU Telemetry (gpu_telemetry:)
gpu_telemetry controls whether the NVML power/thermal/memory timeseries is persisted to a
Parquet sidecar file alongside the result JSON.
gpu_telemetry: true # default: write timeseries.parquet
gpu_telemetry: false # skip parquet output (saves disk for large studies)
What it controls: Whether timeseries.parquet is written to the output directory.
What it does not control: NVML telemetry is always collected during inference for
throttle detection and measurement quality warnings, regardless of this setting. Setting
gpu_telemetry: false only suppresses disk output.
The Parquet sidecar contains 1Hz downsampled data with 8 columns: timestamp_s,
gpu_index, power_w, temperature_c, memory_used_mb, memory_total_mb,
sm_utilisation_pct, throttle_reasons. File sizes are typically < 5KB per minute of
inference per GPU.
See Energy Measurement for details on how NVML telemetry relates to energy measurement.
Transformers Engine (pytorch:)
| Field | Type | Default | Description |
|---|---|---|---|
batch_size | integer | None | null |
attn_implementation | 'sdpa' \| 'flash_attention_2' \| 'flash_attention_3' | | |
torch_compile | boolean | None | null |
torch_compile_mode | string | None | null |
torch_compile_backend | string | None | null |
load_in_4bit | boolean | None | null |
load_in_8bit | boolean | None | null |
bnb_4bit_compute_dtype | 'float16' \| 'bfloat16' \| 'float32' | | |
bnb_4bit_quant_type | 'nf4' \| 'fp4' \| None | | |
bnb_4bit_use_double_quant | boolean | None | null |
use_cache | boolean | None | null |
cache_implementation | 'static' \| 'offloaded_static' \| 'sliding_window' | | |
num_beams | integer | None | null |
early_stopping | boolean | None | null |
length_penalty | number | None | null |
no_repeat_ngram_size | integer | None | null |
prompt_lookup_num_tokens | integer | None | null |
device_map | string | None | null |
max_memory | dict | None | null |
low_cpu_mem_usage | boolean | None | null |
allow_tf32 | boolean | None | null |
autocast_enabled | boolean | None | null |
autocast_dtype | 'float16' \| 'bfloat16' \| None | | |
tp_plan | string | None | null |
tp_size | integer | None | null |
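An illustrative pytorch: block combining several of these fields (the combination is for illustration, not a recommendation):
pytorch:
  batch_size: 4
  attn_implementation: sdpa
  torch_compile: true
  torch_compile_mode: max-autotune
  load_in_4bit: true
  bnb_4bit_quant_type: nf4
  bnb_4bit_compute_dtype: bfloat16
  use_cache: true
  cache_implementation: static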
vLLM Engine (vllm.engine:)
| Field | Type | Default | Description |
|---|---|---|---|
gpu_memory_utilization | number | None | null |
swap_space | number | None | null |
cpu_offload_gb | number | None | null |
block_size | 8 \| 16 \| 32 | | |
kv_cache_dtype | 'auto' \| 'fp8' \| 'fp8_e5m2' | | |
enforce_eager | boolean | None | null |
enable_chunked_prefill | boolean | None | null |
max_num_seqs | integer | None | null |
max_num_batched_tokens | integer | None | null |
max_model_len | integer | None | null |
tensor_parallel_size | integer | None | null |
pipeline_parallel_size | integer | None | null |
enable_prefix_caching | boolean | None | null |
quantization | 'awq' \| 'gptq' \| 'fp8' | | |
num_scheduler_steps | integer | None | null |
max_seq_len_to_capture | integer | None | null |
distributed_executor_backend | 'mp' \| 'ray' \| None | | |
speculative_config | VLLMSpeculativeConfig | None | null |
offload_group_size | integer | None | null |
offload_num_in_group | integer | None | null |
offload_prefetch_step | integer | None | null |
offload_params | list[string] | None | null |
disable_custom_all_reduce | boolean | None | null |
kv_cache_memory_bytes | integer | None | null |
compilation_config | dict | None | null |
attention | VLLMAttentionConfig | None | null |
vLLM Sampling (vllm.sampling:)
| Field | Type | Default | Description |
|---|---|---|---|
max_tokens | integer | None | null |
min_tokens | integer | None | null |
presence_penalty | number | None | null |
frequency_penalty | number | None | null |
ignore_eos | boolean | None | null |
n | integer | None | null |
vLLM Beam Search (vllm.beam_search:)
| Field | Type | Default | Description |
|---|---|---|---|
beam_width | integer | None | null |
length_penalty | number | None | null |
early_stopping | boolean | None | null |
max_tokens | integer | None | null |
vLLM Attention (vllm.engine.attention:)
| Field | Type | Default | Description |
|---|---|---|---|
engine | string | None | null |
flash_attn_version | integer | None | null |
flash_attn_max_num_splits_for_cuda_graph | integer | None | null |
use_prefill_decode_attention | boolean | None | null |
use_prefill_query_quantization | boolean | None | null |
use_cudnn_prefill | boolean | None | null |
disable_flashinfer_prefill | boolean | None | null |
disable_flashinfer_q_quantization | boolean | None | null |
use_trtllm_attention | boolean | None | null |
use_trtllm_ragged_deepseek_prefill | boolean | None | null |
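Putting the vLLM sub-sections together, a nested vllm: block might look like this (values are illustrative):
vllm:
  engine:
    gpu_memory_utilization: 0.9
    max_num_seqs: 64
    enable_prefix_caching: true
    kv_cache_dtype: fp8
  sampling:
    max_tokens: 128
    ignore_eos: true
  # beam_search: and engine.attention: are further optional sub-sections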
TensorRT-LLM Engine (tensorrt:)
| Field | Type | Default | Description |
|---|---|---|---|
max_batch_size | integer | None | null |
tensor_parallel_size | integer | None | null |
pipeline_parallel_size | integer | None | null |
max_input_len | integer | None | null |
max_seq_len | integer | None | null |
max_num_tokens | integer | None | null |
dtype | 'float16' \| 'bfloat16' \| None | | |
fast_build | boolean | None | null |
engine | string | None | null |
engine_path | string | None | null |
quant | TensorRTQuantConfig | None | null |
kv_cache | TensorRTKvCacheConfig | None | null |
scheduler | TensorRTSchedulerConfig | None | null |
sampling | TensorRTSamplingConfig | None | null |
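An illustrative tensorrt: block using fields from the table above (values are illustrative):
tensorrt:
  max_batch_size: 8
  dtype: float16
  max_input_len: 1024
  max_seq_len: 2048
  fast_build: true
  # quant:, kv_cache:, scheduler:, and sampling: are further nested sub-configs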
User Config File
llenergymeasure reads per-user defaults from ~/.config/llenergymeasure/config.yaml
(XDG base directory, detected via platformdirs). This file is optional — all settings
have sensible defaults.
# ~/.config/llenergymeasure/config.yaml
output:
  results_dir: ./results
  model_cache_dir: ~/.cache/huggingface
runners:
  transformers: local
  vllm: docker      # always use Docker for vLLM
  tensorrt: docker  # TensorRT-LLM requires Docker
measurement:
  datacenter_pue: 1.0
  carbon_intensity_gco2_kwh: 0.233
Run llem config to display the current effective configuration and check which engines
are installed. Use llem config --verbose for detailed environment information.