vLLM Engine Schema

Engine version: 0.7.3
Discovered at: 2026-05-06T22:57:22+02:00
Discovery method: dataclasses.fields(EngineArgs) + msgspec.json.schema(SamplingParams)
Schema version: 1.0.0

Summary: 104 engine parameters, 31 sampling parameters.

Discovery limitations

  • sampling_params: constraints (e.g. temperature >= 0, top_p in (0, 1]) live in the imperative _verify_args() method and are not introspectable from field metadata
  • engine_params: per-field descriptions are unavailable (vLLM's EngineArgs carries only a class docstring)

Engine Parameters

Each entry reads field: type = default; entries without "= …" have no recorded default. Per-field descriptions are unavailable (see Discovery limitations).

  • model: str = facebook/opt-125m
  • served_model_name: str | list[str] | None
  • tokenizer: str | None
  • task: Literal['auto', 'generate', 'embedding', 'embed', 'classify', 'score', 'reward', 'transcription'] = auto
  • skip_tokenizer_init: bool = false
  • tokenizer_mode: str = auto
  • trust_remote_code: bool = false
  • allowed_local_media_path: str = ""
  • download_dir: str | None
  • load_format: str = auto
  • config_format: ConfigFormat = auto
  • dtype: str = auto
  • kv_cache_dtype: str = auto
  • seed: int = 0
  • max_model_len: int | None
  • distributed_executor_backend: str | type[ExecutorBase] | None
  • pipeline_parallel_size: int = 1
  • tensor_parallel_size: int = 1
  • max_parallel_loading_workers: int | None
  • block_size: int | None
  • enable_prefix_caching: bool | None
  • disable_sliding_window: bool = false
  • use_v2_block_manager: bool = true
  • swap_space: float = 4
  • cpu_offload_gb: float = 0
  • gpu_memory_utilization: float = 0.9
  • max_num_batched_tokens: int | None
  • max_num_partial_prefills: int | None = 1
  • max_long_partial_prefills: int | None = 1
  • long_prefill_token_threshold: int | None = 0
  • max_num_seqs: int | None
  • max_logprobs: int = 20
  • disable_log_stats: bool = false
  • revision: str | None
  • code_revision: str | None
  • rope_scaling: dict[str, Any] | None
  • rope_theta: float | None
  • hf_overrides: dict[str, Any] | Callable[[PretrainedConfig], PretrainedConfig] | None
  • tokenizer_revision: str | None
  • quantization: str | None
  • enforce_eager: bool | None
  • max_seq_len_to_capture: int = 8192
  • disable_custom_all_reduce: bool = false
  • tokenizer_pool_size: int = 0
  • tokenizer_pool_type: str | type[BaseTokenizerGroup] = ray
  • tokenizer_pool_extra_config: dict[str, Any] | None
  • limit_mm_per_prompt: Mapping[str, int] | None
  • mm_processor_kwargs: dict[str, Any] | None
  • disable_mm_preprocessor_cache: bool = false
  • enable_lora: bool = false
  • enable_lora_bias: bool = false
  • max_loras: int = 1
  • max_lora_rank: int = 16
  • enable_prompt_adapter: bool = false
  • max_prompt_adapters: int = 1
  • max_prompt_adapter_token: int = 0
  • fully_sharded_loras: bool = false
  • lora_extra_vocab_size: int = 256
  • long_lora_scaling_factors: tuple[float] | None
  • lora_dtype: str | dtype | None
  • max_cpu_loras: int | None
  • device: str = auto
  • num_scheduler_steps: int = 1
  • multi_step_stream_outputs: bool = true
  • ray_workers_use_nsight: bool = false
  • num_gpu_blocks_override: int | None
  • num_lookahead_slots: int = 0
  • model_loader_extra_config: dict | None
  • ignore_patterns: str | list[str] | None
  • preemption_mode: str | None
  • scheduler_delay_factor: float = 0.0
  • enable_chunked_prefill: bool | None
  • guided_decoding_backend: str = xgrammar
  • logits_processor_pattern: str | None
  • speculative_model: str | None
  • speculative_model_quantization: str | None
  • speculative_draft_tensor_parallel_size: int | None
  • num_speculative_tokens: int | None
  • speculative_disable_mqa_scorer: bool | None = false
  • speculative_max_model_len: int | None
  • speculative_disable_by_batch_size: int | None
  • ngram_prompt_lookup_max: int | None
  • ngram_prompt_lookup_min: int | None
  • spec_decoding_acceptance_method: str = rejection_sampler
  • typical_acceptance_sampler_posterior_threshold: float | None
  • typical_acceptance_sampler_posterior_alpha: float | None
  • qlora_adapter_name_or_path: str | None
  • disable_logprobs_during_spec_decoding: bool | None
  • otlp_traces_endpoint: str | None
  • collect_detailed_traces: str | None
  • disable_async_output_proc: bool = false
  • scheduling_policy: Literal['fcfs', 'priority'] = fcfs
  • scheduler_cls: str | type[object] = vllm.core.scheduler.Scheduler
  • override_neuron_config: dict[str, Any] | None
  • override_pooler_config: PoolerConfig | None
  • compilation_config: CompilationConfig | None
  • worker_cls: str = auto
  • kv_transfer_config: KVTransferConfig | None
  • generation_config: str | None
  • override_generation_config: dict[str, Any] | None
  • enable_sleep_mode: bool = false
  • model_impl: str = auto
  • calculate_kv_scales: bool | None
  • additional_config: dict[str, Any] | None
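
These dataclass fields surface on the command line as argparse flags. Assuming vLLM 0.7.x follows the standard argparse convention (underscores become hyphens; this mapping rule is an inference, not read from the schema itself), the flag for any field above can be derived mechanically:

```python
def to_cli_flag(field_name: str) -> str:
    # EngineArgs field names map onto CLI flags by swapping
    # underscores for hyphens, per the usual argparse convention.
    return "--" + field_name.replace("_", "-")


flags = [
    to_cli_flag(name)
    for name in ("tensor_parallel_size", "gpu_memory_utilization", "max_model_len")
]
```

So tensor_parallel_size corresponds to --tensor-parallel-size, and so on down the table.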

Sampling Parameters

Each entry reads field: type = default (JSON-schema types from msgspec); entries without "= …" have no recorded default, and "unknown" marks fields whose type could not be resolved from the schema.

  • n: integer = 1
  • best_of: unknown
  • _real_n: unknown
  • presence_penalty: number = 0.0
  • frequency_penalty: number = 0.0
  • repetition_penalty: number = 1.0
  • temperature: number = 1.0
  • top_p: number = 1.0
  • top_k: integer = -1
  • min_p: number = 0.0
  • seed: unknown
  • stop: unknown
  • stop_token_ids: unknown
  • bad_words: unknown
  • ignore_eos: boolean = false
  • max_tokens: unknown = 16
  • min_tokens: integer = 0
  • logprobs: unknown
  • prompt_logprobs: unknown
  • detokenize: boolean = true
  • skip_special_tokens: boolean = true
  • spaces_between_special_tokens: boolean = true
  • logits_processors: unknown
  • include_stop_str_in_output: boolean = false
  • truncate_prompt_tokens: unknown
  • output_kind: unknown = 0
  • output_text_buffer_length: integer = 0
  • _all_stop_token_ids: array = []
  • guided_decoding: unknown
  • logit_bias: unknown
  • allowed_token_ids: unknown
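
Because the sampling constraints live in imperative code rather than field metadata (see Discovery limitations), they only surface at construction time. The sketch below re-implements two of the checks named in that note (temperature >= 0, top_p in (0, 1]); it is illustrative of the _verify_args() style, not vLLM's actual code:

```python
def verify_sampling(temperature: float, top_p: float) -> None:
    # Imperative range checks of the kind SamplingParams._verify_args
    # performs; the constraint values come from the limitations note.
    if temperature < 0:
        raise ValueError(f"temperature must be non-negative, got {temperature}.")
    if not 0.0 < top_p <= 1.0:
        raise ValueError(f"top_p must be in (0, 1], got {top_p}.")


verify_sampling(temperature=1.0, top_p=1.0)  # passes silently
```

A schema consumer therefore cannot rely on the table above for validation; it must either replicate checks like these or round-trip candidate values through SamplingParams itself.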