# TensorRT-LLM Engine Schema
- Engine version: 0.21.0
- Discovered at: 2026-05-06T20:20:57+02:00
- Discovery method: `TrtLlmArgs.model_json_schema()` + `dataclasses.fields(SamplingParams)`
- Schema version: 1.0.0
- Summary: 60 engine parameters, 47 sampling parameters.
## Discovery limitations

- `engine_params`: `BuildConfig` is not a Pydantic model and appears as `Optional[object]` in the schema. Affected fields: `build_config`.
- `sampling_params`: `SamplingParams` is a dataclass, so no per-field descriptions are available.
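The dataclass half of the discovery method can be sketched as follows. `SamplingParamsStub` is a hypothetical stand-in for `tensorrt_llm.sampling_params.SamplingParams` (the real class requires a TensorRT-LLM install); it mirrors only three fields, but shows why dataclass introspection yields names, types, and defaults yet no descriptions.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for tensorrt_llm's SamplingParams dataclass;
# the real class has 47 fields. Being a plain dataclass, it carries
# field names, types, and defaults, but no per-field descriptions,
# hence the empty description cells in the sampling table below.
@dataclass
class SamplingParamsStub:
    max_tokens: int = 32
    temperature: Optional[float] = None
    top_p: Optional[float] = None

# dataclasses.fields() is the discovery call named above; it returns
# one Field object per parameter.
defaults = {f.name: f.default for f in dataclasses.fields(SamplingParamsStub)}
print(defaults)
```

The Pydantic half (`TrtLlmArgs.model_json_schema()`) works analogously, but its JSON Schema output does include per-field descriptions, which is where the engine table's description column comes from.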
## Engine Parameters
| Field | Type | Default | Description |
|---|---|---|---|
| model | `string` | — | The path to the model checkpoint or the model name from the Hugging Face Hub. |
| tokenizer | `string \| None` | — | — |
| tokenizer_mode | `Literal['auto', 'slow']` | `auto` | The mode to initialize the tokenizer. |
| skip_tokenizer_init | `boolean` | `false` | Whether to skip the tokenizer initialization. |
| trust_remote_code | `boolean` | `false` | Whether to trust the remote code. |
| tensor_parallel_size | `integer` | `1` | The tensor parallel size. |
| dtype | `string` | `auto` | The data type to use for the model. |
| revision | `string \| None` | — | — |
| tokenizer_revision | `string \| None` | — | — |
| pipeline_parallel_size | `integer` | `1` | The pipeline parallel size. |
| context_parallel_size | `integer` | `1` | The context parallel size. |
| gpus_per_node | `integer \| None` | — | — |
| moe_cluster_parallel_size | `integer \| None` | — | — |
| moe_tensor_parallel_size | `integer \| None` | — | — |
| moe_expert_parallel_size | `integer \| None` | — | — |
| enable_attention_dp | `boolean` | `false` | Enable attention data parallelism. |
| cp_config | `object \| None` | — | — |
| load_format | `Literal['auto', 'dummy']` | `auto` | The format to load the model. |
| enable_lora | `boolean` | `false` | Enable LoRA. |
| ⚠️ max_lora_rank | `integer \| None` | — | — |
| ⚠️ max_loras | `integer` | `4` | The maximum number of LoRA adapters. |
| ⚠️ max_cpu_loras | `integer` | `4` | The maximum number of LoRA adapters cached on CPU. |
| lora_config | `LoraConfig \| None` | — | — |
| enable_prompt_adapter | `boolean` | `false` | Enable prompt adapter. |
| max_prompt_adapter_token | `integer` | `0` | The maximum number of prompt adapter tokens. |
| quant_config | `QuantConfig \| None` | — | — |
| kv_cache_config | `KvCacheConfig` | — | KV cache config. |
| enable_chunked_prefill | `boolean` | `false` | Enable chunked prefill. |
| guided_decoding_backend | `string \| None` | — | — |
| batched_logits_processor | `tensorrt_llm.sampling_params.BatchedLogitsProcessor \| None` | — | Batched logits processor. |
| iter_stats_max_iterations | `integer \| None` | — | — |
| request_stats_max_iterations | `integer \| None` | — | — |
| peft_cache_config | `PeftCacheConfig \| None` | — | — |
| scheduler_config | `SchedulerConfig` | — | Scheduler config. |
| cache_transceiver_config | `CacheTransceiverConfig \| None` | — | — |
| speculative_config | `LookaheadDecodingConfig \| MedusaDecodingConfig \| EagleDecodingConfig \| …` | — | — |
| batching_type | `BatchingType \| None` | — | — |
| normalize_log_probs | `boolean` | `false` | Normalize log probabilities. |
| max_batch_size | `integer \| None` | — | — |
| max_input_len | `integer \| None` | — | — |
| max_seq_len | `integer \| None` | — | — |
| max_beam_width | `integer \| None` | — | — |
| max_num_tokens | `integer \| None` | — | — |
| gather_generation_logits | `boolean` | `false` | Gather generation logits. |
| num_postprocess_workers | `integer` | `0` | The number of processes used for postprocessing the generated tokens, including detokenization. |
| postprocess_tokenizer_dir | `string \| None` | — | — |
| reasoning_parser | `string \| None` | — | — |
| garbage_collection_gen0_threshold | `integer` | `20000` | Threshold for Python garbage collection of generation-0 objects. Lower values trigger more frequent garbage collection. |
| ⚠️ decoding_config | `DecodingConfig \| None` | — | The decoding config. |
| backend | `string \| None` | — | — |
| ⚠️ auto_parallel | `boolean` | `false` | Enable auto parallel mode. |
| ⚠️ auto_parallel_world_size | `integer \| None` | — | — |
| enable_tqdm | `boolean` | `false` | Enable tqdm progress bars. |
| workspace | `string \| None` | — | — |
| enable_build_cache | `tensorrt_llm.llmapi.build_cache.BuildCacheConfig \| bool` | `false` | Enable build cache. |
| extended_runtime_perf_knob_config | `ExtendedRuntimePerfKnobConfig \| None` | — | — |
| calib_config | `CalibConfig \| None` | — | — |
| embedding_parallel_mode | `string` | `SHARDING_ALONG_VOCAB` | The embedding parallel mode. |
| fast_build | `boolean` | `false` | Enable fast build. |
| build_config | `tensorrt_llm.builder.BuildConfig \| None` | — | Build config. |
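As a usage sketch, the engine parameters above map to keyword arguments of the TensorRT-LLM `LLM` constructor. The snippet below only builds the kwargs as a plain dict and leaves the constructor call commented out, since running it requires a GPU and an installed `tensorrt_llm`; the model name is a placeholder, not a recommendation.

```python
# Engine-parameter sketch: names and defaults mirror the table above.
# The constructor call is commented out because it needs a GPU and an
# installed tensorrt_llm; the model path is a placeholder.
engine_kwargs = {
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder checkpoint
    "tensor_parallel_size": 1,
    "pipeline_parallel_size": 1,
    "dtype": "auto",
    "trust_remote_code": False,
    "enable_chunked_prefill": False,
}

# from tensorrt_llm import LLM
# llm = LLM(**engine_kwargs)

for key, value in sorted(engine_kwargs.items()):
    print(f"{key}={value!r}")
```

Fields left at their defaults (everything not in the dict) can simply be omitted; the schema defaults in the table apply.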
## Sampling Parameters
| Field | Type | Default | Description |
|---|---|---|---|
| end_id | `int \| None` | — | — |
| pad_id | `int \| None` | — | — |
| max_tokens | `int` | `32` | — |
| bad | `str \| list[str] \| None` | — | — |
| bad_token_ids | `list[int] \| None` | — | — |
| stop | `str \| list[str] \| None` | — | — |
| stop_token_ids | `list[int] \| None` | — | — |
| include_stop_str_in_output | `bool` | `false` | — |
| embedding_bias | `Tensor \| None` | — | — |
| logits_processor | `LogitsProcessor \| list[LogitsProcessor] \| None` | — | — |
| apply_batched_logits_processor | `bool` | `false` | — |
| n | `int` | `1` | — |
| best_of | `int \| None` | — | — |
| use_beam_search | `bool` | `false` | — |
| top_k | `int \| None` | — | — |
| top_p | `float \| None` | — | — |
| top_p_min | `float \| None` | — | — |
| top_p_reset_ids | `int \| None` | — | — |
| top_p_decay | `float \| None` | — | — |
| seed | `int \| None` | — | — |
| temperature | `float \| None` | — | — |
| min_tokens | `int \| None` | — | — |
| beam_search_diversity_rate | `float \| None` | — | — |
| repetition_penalty | `float \| None` | — | — |
| presence_penalty | `float \| None` | — | — |
| frequency_penalty | `float \| None` | — | — |
| length_penalty | `float \| None` | — | — |
| early_stopping | `int \| None` | — | — |
| no_repeat_ngram_size | `int \| None` | — | — |
| min_p | `float \| None` | — | — |
| beam_width_array | `list[int] \| None` | — | — |
| logprobs | `int \| None` | — | — |
| prompt_logprobs | `int \| None` | — | — |
| return_context_logits | `bool` | `false` | — |
| return_generation_logits | `bool` | `false` | — |
| exclude_input_from_output | `bool` | `true` | — |
| return_encoder_output | `bool` | `false` | — |
| return_perf_metrics | `bool` | `false` | — |
| additional_model_outputs | `list[AdditionalModelOutput] \| None` | — | — |
| lookahead_config | `LookaheadDecodingConfig \| None` | — | — |
| guided_decoding | `GuidedDecodingParams \| None` | — | — |
| ignore_eos | `bool` | `false` | — |
| detokenize | `bool` | `true` | — |
| add_special_tokens | `bool` | `true` | — |
| truncate_prompt_tokens | `int \| None` | — | — |
| skip_special_tokens | `bool` | `true` | — |
| spaces_between_special_tokens | `bool` | `true` | — |
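The defaults in the table can be exercised with a small stand-in. `SamplingParamsSketch` below is hypothetical: it mirrors only six of the 47 fields and their tabulated defaults, so it runs without TensorRT-LLM. With a real install, the equivalent would be `from tensorrt_llm import SamplingParams` with the same keyword arguments.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in mirroring a few SamplingParams fields and the
# defaults listed in the table above; the real dataclass lives in
# tensorrt_llm.sampling_params and has many more fields.
@dataclass
class SamplingParamsSketch:
    max_tokens: int = 32
    n: int = 1
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    detokenize: bool = True
    exclude_input_from_output: bool = True

# A typical request overrides only what differs from the defaults.
params = SamplingParamsSketch(max_tokens=64, temperature=0.8, top_p=0.95)
print(params.max_tokens, params.n, params.detokenize)
```

Note that `temperature`, `top_p`, and `top_k` default to `None` rather than numeric values, meaning the backend's own defaults apply unless they are set explicitly.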