vllm.entrypoints.api_server
NOTE: This API server is used only for demonstrating usage of AsyncEngine and for simple performance benchmarks. It is not intended for production use; for production, we recommend our OpenAI-compatible server. We will also not accept PRs modifying this file; please change vllm/entrypoints/openai/api_server.py instead.
_generate async
_generate(
request_dict: dict, raw_request: Request
) -> Response
generate async
Generate completion for the request.
The request should be a JSON object with the following fields:

- prompt: the prompt to use for the generation.
- stream: whether to stream the results or not.
- other fields: the sampling parameters (see SamplingParams for details).
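
For illustration, a minimal client sketch against the POST /generate route; the localhost:8000 address is an assumption matching the default port, so adjust the URL to however the server was launched:

```python
# Hedged sketch of a non-streaming /generate call. The URL assumes the
# server's default bind; every field other than prompt/stream is forwarded
# to SamplingParams (temperature and max_tokens here are just examples).
import json

import requests

payload = {
    "prompt": "San Francisco is a",
    "stream": False,     # set True to stream incremental results instead
    "temperature": 0.8,  # sampling parameter, passed through to SamplingParams
    "max_tokens": 64,    # sampling parameter, passed through to SamplingParams
}

resp = requests.post("http://localhost:8000/generate", json=payload)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```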
health async
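
The health endpoint is a simple liveness check. A one-line probe sketch, under the same localhost:8000 assumption as above:

```python
# Probe the GET /health route; expect HTTP 200 while the server is up.
import requests

print(requests.get("http://localhost:8000/health").status_code)
```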
init_app async
init_app(
args: Namespace,
llm_engine: Optional[AsyncLLMEngine] = None,
) -> FastAPI
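
Because init_app accepts an optional pre-built AsyncLLMEngine, the returned FastAPI app can be embedded in your own uvicorn setup. A hypothetical sketch; the bare Namespace and the attributes it carries are assumptions, since init_app may read additional CLI attributes off args:

```python
# Hypothetical sketch: reuse an existing AsyncLLMEngine and serve the
# returned FastAPI app with uvicorn yourself. The Namespace contents are an
# assumption; init_app may expect more CLI attributes on `args` than shown.
import asyncio
from argparse import Namespace

import uvicorn

from vllm import AsyncEngineArgs, AsyncLLMEngine
from vllm.entrypoints.api_server import init_app


async def main() -> None:
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="facebook/opt-125m"))
    app = await init_app(Namespace(root_path=None), llm_engine=engine)
    server = uvicorn.Server(uvicorn.Config(app, host="127.0.0.1", port=8000))
    await server.serve()


asyncio.run(main())
```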
run_server async
run_server(
args: Namespace,
llm_engine: Optional[AsyncLLMEngine] = None,
**uvicorn_kwargs: Any,
) -> None
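
run_server accepts the same optional pre-built engine as init_app, plus extra uvicorn keyword arguments, so the whole demo server can also be started from Python. A sketch that mirrors, in abbreviated form, the flags the module's own CLI defines; the exact flag set is an assumption and may differ between vLLM versions:

```python
# Hypothetical programmatic launch of the demo server. The argparse flags
# below are an abbreviated guess at what `args` must carry (the module's
# CLI also wires in AsyncEngineArgs); verify against your vLLM version.
import argparse
import asyncio
import ssl

from vllm import AsyncEngineArgs
from vllm.entrypoints.api_server import run_server

parser = argparse.ArgumentParser()
parser.add_argument("--host", type=str, default=None)
parser.add_argument("--port", type=int, default=8000)
parser.add_argument("--ssl-keyfile", type=str, default=None)
parser.add_argument("--ssl-certfile", type=str, default=None)
parser.add_argument("--ssl-ca-certs", type=str, default=None)
parser.add_argument("--ssl-cert-reqs", type=int, default=int(ssl.CERT_NONE))
parser.add_argument("--root-path", type=str, default=None)
parser.add_argument("--log-level", type=str, default="debug")
parser = AsyncEngineArgs.add_cli_args(parser)

args = parser.parse_args(["--model", "facebook/opt-125m"])
asyncio.run(run_server(args))  # pass llm_engine=... to reuse an existing engine
```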