# TensorRT-LLM Parameter Curation

Engine version: 0.21.0
Discovered at: 2026-05-06 20:20:57+02:00
Discovery method: TrtLlmArgs.model_json_schema() + dataclasses.fields(SamplingParams)

Summary: 18 of 107 discovered parameters curated (60 engine + 47 sampling).

Delta vs previous: deferred until first probe-pass cycle.
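The `dataclasses.fields(SamplingParams)` half of the discovery method can be sketched in miniature. The `SamplingParams` below is a stand-in dataclass with four fields taken from the table in this report (the real class lives in `tensorrt_llm.sampling_params` and has 47 fields); only the enumeration pattern is the point.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class SamplingParams:
    """Stand-in with four fields taken from the tables in this report."""
    max_tokens: int = 32
    temperature: Optional[float] = None
    top_k: Optional[int] = None
    top_p: Optional[float] = None

def discover(cls):
    """Return (name, type, default) rows like the tables below."""
    return [(f.name, f.type, f.default) for f in fields(cls)]

for name, typ, default in discover(SamplingParams):
    print(name, typ, default)
```

The engine side works the same way, except the rows come from the `properties` dict of `TrtLlmArgs.model_json_schema()` (a pydantic v2 method) rather than from dataclass fields.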

## Engine Parameters

| Field | Type | Default | Curated? |
|---|---|---|---|
| auto_parallel | boolean | False | - |
| auto_parallel_world_size | integer \| None | null | - |
| backend | string \| None | null | yes |
| batched_logits_processor | Optional[tensorrt_llm.sampling_params.BatchedLogitsProcessor] | null | - |
| batching_type | BatchingType \| None | null | - |
| build_config | Optional[tensorrt_llm.builder.BuildConfig] | null | - |
| cache_transceiver_config | CacheTransceiverConfig \| None | null | - |
| calib_config | CalibConfig \| None | null | - |
| context_parallel_size | integer | 1 | - |
| cp_config | object \| None | null | - |
| decoding_config | Optional[DecodingConfig] | null | - |
| dtype | string | auto | yes |
| embedding_parallel_mode | string | SHARDING_ALONG_VOCAB | - |
| enable_attention_dp | boolean | False | - |
| enable_build_cache | Union[tensorrt_llm.llmapi.build_cache.BuildCacheConfig, bool] | False | - |
| enable_chunked_prefill | boolean | False | - |
| enable_lora | boolean | False | - |
| enable_prompt_adapter | boolean | False | - |
| enable_tqdm | boolean | False | - |
| extended_runtime_perf_knob_config | ExtendedRuntimePerfKnobConfig \| None | null | - |
| fast_build | boolean | False | yes |
| garbage_collection_gen0_threshold | integer | 20000 | - |
| gather_generation_logits | boolean | False | - |
| gpus_per_node | integer \| None | null | - |
| guided_decoding_backend | string \| None | null | - |
| iter_stats_max_iterations | integer \| None | null | - |
| kv_cache_config | KvCacheConfig | null | - |
| load_format | Literal['auto', 'dummy'] | auto | - |
| lora_config | LoraConfig \| None | null | - |
| max_batch_size | integer \| None | null | yes |
| max_beam_width | integer \| None | null | - |
| max_cpu_loras | integer | 4 | - |
| max_input_len | integer \| None | null | yes |
| max_lora_rank | integer \| None | null | - |
| max_loras | integer | 4 | - |
| max_num_tokens | integer \| None | null | yes |
| max_prompt_adapter_token | integer | 0 | - |
| max_seq_len | integer \| None | null | yes |
| model | string | null | - |
| moe_cluster_parallel_size | integer \| None | null | - |
| moe_expert_parallel_size | integer \| None | null | - |
| moe_tensor_parallel_size | integer \| None | null | - |
| normalize_log_probs | boolean | False | - |
| num_postprocess_workers | integer | 0 | - |
| peft_cache_config | PeftCacheConfig \| None | null | - |
| pipeline_parallel_size | integer | 1 | yes |
| postprocess_tokenizer_dir | string \| None | null | - |
| quant_config | QuantConfig \| None | null | - |
| reasoning_parser | string \| None | null | - |
| request_stats_max_iterations | integer \| None | null | - |
| revision | string \| None | null | - |
| scheduler_config | SchedulerConfig | null | - |
| skip_tokenizer_init | boolean | False | - |
| speculative_config | LookaheadDecodingConfig \| MedusaDecodingConfig \| EagleDecodingConfig \| MTPDecodingConfig \| NGramDecodingConfig \| DraftTargetDecodingConfig \| None | null | - |
| tensor_parallel_size | integer | 1 | yes |
| tokenizer | string \| None | null | - |
| tokenizer_mode | Literal['auto', 'slow'] | auto | - |
| tokenizer_revision | string \| None | null | - |
| trust_remote_code | boolean | False | - |
| workspace | string \| None | null | - |
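For quick reference, the nine engine parameters marked curated above, collected as a plain kwargs dict holding the defaults the report lists. Whether `tensorrt_llm.LLM` accepts this dict verbatim is an assumption about the 0.21.0 API surface; only the keys and defaults come from the table.

```python
# The nine curated engine parameters with the report's listed defaults.
# Values are the schema defaults, not tuning recommendations.
curated_engine = {
    "backend": None,              # string | None
    "dtype": "auto",
    "fast_build": False,
    "max_batch_size": None,       # integer | None
    "max_input_len": None,
    "max_num_tokens": None,
    "max_seq_len": None,
    "pipeline_parallel_size": 1,
    "tensor_parallel_size": 1,
}
```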

## Sampling Parameters

| Field | Type | Default | Curated? |
|---|---|---|---|
| add_special_tokens | bool | True | - |
| additional_model_outputs | list[AdditionalModelOutput] \| None | null | - |
| apply_batched_logits_processor | bool | False | - |
| bad | str \| list[str] \| None | null | - |
| bad_token_ids | list[int] \| None | null | - |
| beam_search_diversity_rate | float \| None | null | - |
| beam_width_array | list[int] \| None | null | - |
| best_of | int \| None | null | - |
| detokenize | bool | True | - |
| early_stopping | int \| None | null | - |
| embedding_bias | Tensor \| None | null | - |
| end_id | int \| None | null | - |
| exclude_input_from_output | bool | True | - |
| frequency_penalty | float \| None | null | - |
| guided_decoding | GuidedDecodingParams \| None | null | - |
| ignore_eos | bool | False | yes |
| include_stop_str_in_output | bool | False | - |
| length_penalty | float \| None | null | - |
| logits_processor | LogitsProcessor \| list[LogitsProcessor] \| None | null | - |
| logprobs | int \| None | null | - |
| lookahead_config | LookaheadDecodingConfig \| None | null | - |
| max_tokens | int | 32 | yes |
| min_p | float \| None | null | yes |
| min_tokens | int \| None | null | yes |
| n | int | 1 | yes |
| no_repeat_ngram_size | int \| None | null | - |
| pad_id | int \| None | null | - |
| presence_penalty | float \| None | null | - |
| prompt_logprobs | int \| None | null | - |
| repetition_penalty | float \| None | null | yes |
| return_context_logits | bool | False | - |
| return_encoder_output | bool | False | - |
| return_generation_logits | bool | False | - |
| return_perf_metrics | bool | False | - |
| seed | int \| None | null | - |
| skip_special_tokens | bool | True | - |
| spaces_between_special_tokens | bool | True | - |
| stop | str \| list[str] \| None | null | - |
| stop_token_ids | list[int] \| None | null | - |
| temperature | float \| None | null | yes |
| top_k | int \| None | null | yes |
| top_p | float \| None | null | yes |
| top_p_decay | float \| None | null | - |
| top_p_min | float \| None | null | - |
| top_p_reset_ids | int \| None | null | - |
| truncate_prompt_tokens | int \| None | null | - |
| use_beam_search | bool | False | - |
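A curation pass usually wants a sanity check on the sampling knobs it exposes. The sketch below validates a kwargs dict against the curated fields above; the accepted ranges (temperature >= 0, 0 < top_p <= 1, top_k >= 1, max_tokens >= 1) are common sampling conventions assumed here, not constraints taken from TensorRT-LLM itself.

```python
# Sanity-check sketch for the curated sampling parameters in the table above.
# Ranges are assumed conventions; None means "leave the engine default".
def check_sampling(params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    t = params.get("temperature")
    if t is not None and t < 0:
        problems.append(f"temperature must be >= 0, got {t}")
    p = params.get("top_p")
    if p is not None and not (0 < p <= 1):
        problems.append(f"top_p must be in (0, 1], got {p}")
    k = params.get("top_k")
    if k is not None and k < 1:
        problems.append(f"top_k must be >= 1, got {k}")
    m = params.get("max_tokens")
    if m is not None and m < 1:
        problems.append(f"max_tokens must be >= 1, got {m}")
    return problems

print(check_sampling({"max_tokens": 32, "temperature": 0.7, "top_p": 0.95}))  # []
print(check_sampling({"top_p": 1.5, "top_k": 0}))  # two problems reported
```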