vllm.entrypoints.openai.cli_args
This file contains the command line arguments for vLLM's OpenAI-compatible server. It is kept in a separate file for documentation purposes.
FrontendArgs ¶
Arguments for the OpenAI-compatible frontend server.
Source code in vllm/entrypoints/openai/cli_args.py
allow_credentials class-attribute
instance-attribute
¶
allow_credentials: bool = False
Allow credentials.
allowed_headers class-attribute
instance-attribute
¶
Allowed headers.
allowed_methods class-attribute
instance-attribute
¶
Allowed methods.
allowed_origins class-attribute
instance-attribute
¶
Allowed origins.
api_key class-attribute
instance-attribute
¶
If provided, the server will require one of these keys to be presented in the header.
chat_template class-attribute
instance-attribute
¶
The file path to the chat template, or the template in single-line form for the specified model.
chat_template_content_format class-attribute
instance-attribute
¶
chat_template_content_format: ChatTemplateContentFormatOption = "auto"
The format to render message content within a chat template.
- "string" will render the content as a string. Example: "Hello World"
- "openai" will render the content as a list of dictionaries, similar to OpenAI schema. Example: [{"type": "text", "text": "Hello world!"}]
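The difference between the two content formats can be sketched with a small helper (the function name is hypothetical, for illustration only; it is not part of vLLM):

```python
def render_content(content: str, fmt: str):
    """Render plain-text message content in the given format (illustrative sketch)."""
    if fmt == "string":
        # Plain string form, e.g. "Hello World".
        return content
    if fmt == "openai":
        # OpenAI-style list of content parts.
        return [{"type": "text", "text": content}]
    raise ValueError(f"unknown format: {fmt}")
```

With "auto", vLLM picks the appropriate format based on the model's chat template.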
disable_fastapi_docs class-attribute
instance-attribute
¶
disable_fastapi_docs: bool = False
Disable FastAPI's OpenAPI schema, Swagger UI, and ReDoc endpoint.
disable_frontend_multiprocessing class-attribute
instance-attribute
¶
disable_frontend_multiprocessing: bool = False
If specified, will run the OpenAI frontend server in the same process as the model serving engine.
disable_uvicorn_access_log class-attribute
instance-attribute
¶
disable_uvicorn_access_log: bool = False
Disable uvicorn access log.
enable_auto_tool_choice class-attribute
instance-attribute
¶
enable_auto_tool_choice: bool = False
Enable auto tool choice for supported models. Use --tool-call-parser to specify which parser to use.
enable_force_include_usage class-attribute
instance-attribute
¶
enable_force_include_usage: bool = False
If set to True, include usage on every request.
enable_log_outputs class-attribute
instance-attribute
¶
enable_log_outputs: bool = False
If set to True, enable logging of model outputs (generations) in addition to the input logging that is enabled by default.
enable_prompt_tokens_details class-attribute
instance-attribute
¶
enable_prompt_tokens_details: bool = False
If set to True, enable prompt_tokens_details in usage.
enable_request_id_headers class-attribute
instance-attribute
¶
enable_request_id_headers: bool = False
If specified, API server will add X-Request-Id header to responses. Caution: this hurts performance at high QPS.
enable_server_load_tracking class-attribute
instance-attribute
¶
enable_server_load_tracking: bool = False
If set to True, enable tracking server_load_metrics in the app state.
enable_ssl_refresh class-attribute
instance-attribute
¶
enable_ssl_refresh: bool = False
Refresh the SSL context when SSL certificate files change.
enable_tokenizer_info_endpoint class-attribute
instance-attribute
¶
enable_tokenizer_info_endpoint: bool = False
Enable the /get_tokenizer_info endpoint. May expose chat templates and other tokenizer configuration.
exclude_tools_when_tool_choice_none class-attribute
instance-attribute
¶
exclude_tools_when_tool_choice_none: bool = False
If specified, exclude tool definitions in prompts when tool_choice='none'.
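The effect of excluding tool definitions when tool_choice='none' can be sketched as a simple request filter (illustrative only; this is not vLLM's actual implementation):

```python
def filter_tools(request: dict, exclude_when_none: bool) -> dict:
    """Drop tool definitions from the request when tool_choice='none' (sketch)."""
    if exclude_when_none and request.get("tool_choice") == "none":
        # Tools are never called in this case, so omitting them saves prompt tokens.
        return {k: v for k, v in request.items() if k != "tools"}
    return request
```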
h11_max_header_count class-attribute
instance-attribute
¶
h11_max_header_count: int = H11_MAX_HEADER_COUNT_DEFAULT
Maximum number of HTTP headers allowed in a request for h11 parser. Helps mitigate header abuse. Default: 256.
h11_max_incomplete_event_size class-attribute
instance-attribute
¶
h11_max_incomplete_event_size: int = (
H11_MAX_INCOMPLETE_EVENT_SIZE_DEFAULT
)
Maximum size (bytes) of an incomplete HTTP event (header or body) for h11 parser. Helps mitigate header abuse. Default: 4194304 (4 MB).
log_config_file class-attribute
instance-attribute
¶
log_config_file: Optional[str] = VLLM_LOGGING_CONFIG_PATH
Path to the logging config JSON file for both vllm and uvicorn.
lora_modules class-attribute
instance-attribute
¶
lora_modules: Optional[list[LoRAModulePath]] = None
LoRA module configurations in either 'name=path' format, JSON format, or JSON list format. Example (old format): 'name=path'
Example (new format): {"name": "name", "path": "lora_path", "base_model_name": "id"}
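A sketch of how the two accepted value formats could be distinguished and parsed (illustrative only; see LoRAParserAction below for the actual implementation):

```python
import json

def parse_lora_module(value: str) -> dict:
    """Parse one --lora-modules value in either 'name=path' or JSON form (sketch)."""
    if value.lstrip().startswith("{"):
        # New JSON format, e.g. '{"name": "n", "path": "p", "base_model_name": "id"}'.
        return json.loads(value)
    # Old 'name=path' format.
    name, path = value.split("=", 1)
    return {"name": name, "path": path}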
max_log_len class-attribute
instance-attribute
¶
Max number of prompt characters or prompt ID numbers being printed in log. The default of None means unlimited.
middleware class-attribute
instance-attribute
¶
Additional ASGI middleware to apply to the app. We accept multiple --middleware arguments. The value should be an import path. If a function is provided, vLLM will add it to the server using @app.middleware('http'). If a class is provided, vLLM will add it to the server using app.add_middleware().
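The class style can be sketched with a minimal stdlib-only ASGI middleware; the names here are hypothetical stand-ins (vLLM imports the actual object from the dotted path you supply):

```python
import asyncio

async def app(scope, receive, send):
    # Trivial ASGI app standing in for the real server.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})

class AddHeaderMiddleware:
    """Class-style middleware: the shape accepted by app.add_middleware()."""
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        async def wrapped_send(message):
            # Inject a header into every response before it is sent.
            if message["type"] == "http.response.start":
                message["headers"].append((b"x-demo", b"1"))
            await send(message)
        await self.app(scope, receive, wrapped_send)

async def demo():
    sent = []
    async def send(message):
        sent.append(message)
    async def receive():
        return {"type": "http.request"}
    await AddHeaderMiddleware(app)({"type": "http"}, receive, send)
    return sent

messages = asyncio.run(demo())
```

A function given to @app.middleware('http') instead receives the request and a call_next callable, per FastAPI's middleware API.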
response_role class-attribute
instance-attribute
¶
response_role: str = 'assistant'
The role name to return if request.add_generation_prompt=true.
return_tokens_as_token_ids class-attribute
instance-attribute
¶
return_tokens_as_token_ids: bool = False
When --max-logprobs is specified, represents single tokens as strings of the form 'token_id:{token_id}' so that tokens that are not JSON-encodable can be identified.
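The representation can be sketched as follows (a hypothetical helper, not vLLM internals):

```python
def represent_token(token_text: str, token_id: int, as_token_ids: bool) -> str:
    """Return either the raw token text or the 'token_id:{token_id}' form (sketch)."""
    if as_token_ids:
        # Always JSON-encodable, even for byte-level tokens with no valid UTF-8 text.
        return f"token_id:{token_id}"
    return token_text
```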
root_path class-attribute
instance-attribute
¶
FastAPI root_path when app is behind a path based routing proxy.
ssl_ca_certs class-attribute
instance-attribute
¶
The CA certificates file.
ssl_cert_reqs class-attribute
instance-attribute
¶
Whether a client certificate is required (see the stdlib ssl module's CERT_* constants).
ssl_certfile class-attribute
instance-attribute
¶
The file path to the SSL cert file.
ssl_keyfile class-attribute
instance-attribute
¶
The file path to the SSL key file.
tool_call_parser class-attribute
instance-attribute
¶
Select the tool call parser depending on the model that you're using. This is used to parse the model-generated tool call into OpenAI API format. Required for --enable-auto-tool-choice. You can choose any option from the built-in parsers or register a plugin via --tool-parser-plugin.
tool_parser_plugin class-attribute
instance-attribute
¶
tool_parser_plugin: str = ''
Specify the tool parser plugin used to parse model-generated tool calls into the OpenAI API format; the parser names registered in this plugin can then be used in --tool-call-parser.
tool_server class-attribute
instance-attribute
¶
Comma-separated list of host:port pairs (IPv4, IPv6, or hostname). Examples: 127.0.0.1:8000, [::1]:8000, localhost:1234. Or demo for demonstration purposes.
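Parsing such a list, including the bracketed IPv6 form, could look like this (an illustrative sketch, not vLLM's actual parser):

```python
def parse_tool_server(value: str) -> list[tuple[str, int]]:
    """Parse a comma-separated host:port list, handling bracketed IPv6 (sketch)."""
    pairs = []
    for item in value.split(","):
        # rpartition handles IPv6 like '[::1]:8000', where ':' also appears in the host.
        host, _, port = item.strip().rpartition(":")
        if host.startswith("[") and host.endswith("]"):
            host = host[1:-1]  # strip IPv6 brackets
        pairs.append((host, int(port)))
    return pairs
```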
uds class-attribute
instance-attribute
¶
Unix domain socket path. If set, host and port arguments are ignored.
uvicorn_log_level class-attribute
instance-attribute
¶
uvicorn_log_level: Literal[
"debug", "info", "warning", "error", "critical", "trace"
] = "info"
Log level for uvicorn.
add_cli_args staticmethod
¶
add_cli_args(
parser: FlexibleArgumentParser,
) -> FlexibleArgumentParser
Source code in vllm/entrypoints/openai/cli_args.py
LoRAParserAction ¶
Bases: Action
Source code in vllm/entrypoints/openai/cli_args.py
__call__ ¶
__call__(
parser: ArgumentParser,
namespace: Namespace,
values: Optional[Union[str, Sequence[str]]],
option_string: Optional[str] = None,
)
Source code in vllm/entrypoints/openai/cli_args.py
create_parser_for_docs ¶
create_parser_for_docs() -> FlexibleArgumentParser
make_arg_parser ¶
make_arg_parser(
parser: FlexibleArgumentParser,
) -> FlexibleArgumentParser
Create the CLI argument parser used by the OpenAI API server.
We rely on the helper methods of FrontendArgs and AsyncEngineArgs to register all arguments instead of manually enumerating them here. This avoids code duplication and keeps the argument definitions in one place.
Source code in vllm/entrypoints/openai/cli_args.py
validate_parsed_serve_args ¶
validate_parsed_serve_args(args: Namespace)
Quick checks for model serve args that raise prior to loading.