vllm.v1.utils
APIServerProcessManager
Manages a group of API server processes.
Handles the creation, monitoring, and termination of API server worker processes. It also monitors any extra processes to check whether they are healthy.
Source code in vllm/v1/utils.py
__init__
__init__(
    target_server_fn: Callable,
    listen_address: str,
    sock: Any,
    args: Namespace,
    num_servers: int,
    input_addresses: list[str],
    output_addresses: list[str],
    stats_update_address: Optional[str] = None,
)
Initialize and start API server worker processes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_server_fn | Callable | Function to call for each API server process | required |
listen_address | str | Address to listen for client connections | required |
sock | Any | Socket for client connections | required |
args | Namespace | Parsed command-line arguments | required |
num_servers | int | Number of API server processes to start | required |
input_addresses | list[str] | Input addresses for each API server | required |
output_addresses | list[str] | Output addresses for each API server | required |
stats_update_address | Optional[str] | Optional stats update address | None |
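The start-up and shutdown lifecycle described above can be sketched with the standard multiprocessing module. This is a simplified illustration, not vLLM's implementation. ServerProcessGroup, demo_server and _stop_processes are hypothetical names. The sock and args parameters are omitted. The client_config keys and the terminate-then-kill shutdown policy are assumptions made for the example. Each worker receives its own input/output address pair, mirroring the one-address-per-server pairing in the table.

```python
import multiprocessing
import time
import weakref
from multiprocessing.process import BaseProcess
from typing import Any, Callable, Optional


def _stop_processes(procs: list[BaseProcess], timeout: float = 5.0) -> None:
    """Hypothetical helper: terminate, wait up to `timeout` seconds in total, then kill."""
    for proc in procs:
        if proc.is_alive():
            proc.terminate()
    deadline = time.monotonic() + timeout
    for proc in procs:
        proc.join(max(0.0, deadline - time.monotonic()))
        if proc.is_alive():
            proc.kill()  # escalate for processes that ignored SIGTERM
            proc.join()


def demo_server(listen_address: str, client_config: dict[str, Any]) -> None:
    """Hypothetical worker: a real API server would start serving here."""
    assert client_config["client_index"] < client_config["client_count"]


class ServerProcessGroup:
    """Hypothetical, simplified stand-in for APIServerProcessManager."""

    def __init__(
        self,
        target_server_fn: Callable[..., None],
        listen_address: str,
        num_servers: int,
        input_addresses: list[str],
        output_addresses: list[str],
        stats_update_address: Optional[str] = None,
        mp_context: Optional[Any] = None,
    ) -> None:
        if not len(input_addresses) == len(output_addresses) == num_servers:
            raise ValueError("expected one input and one output address per server")
        ctx = mp_context if mp_context is not None else multiprocessing.get_context("spawn")
        self.processes: list[BaseProcess] = []
        for i, (in_addr, out_addr) in enumerate(zip(input_addresses, output_addresses)):
            # Per-worker configuration (the keys are assumptions for this sketch).
            client_config: dict[str, Any] = {
                "input_address": in_addr,
                "output_address": out_addr,
                "client_index": i,
                "client_count": num_servers,
            }
            if stats_update_address is not None:
                client_config["stats_update_address"] = stats_update_address
            proc = ctx.Process(
                target=target_server_fn,
                name=f"ApiServer_{i}",
                args=(listen_address, client_config),
            )
            self.processes.append(proc)
            proc.start()
        # Stop the workers on close() or when this object is garbage-collected.
        self._finalizer = weakref.finalize(self, _stop_processes, self.processes)

    def close(self) -> None:
        self._finalizer()
```

The sketch defaults to a spawn context, which starts each worker in a fresh interpreter and so avoids inheriting parent state such as threads or an initialized CUDA context. On Linux, a caller can pass multiprocessing.get_context("fork") instead, for example in quick tests.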
ConstantList
copy_slice
Copy the first `length` elements of one tensor into another tensor using a non-blocking copy.
Used to copy pinned CPU tensor data to pre-allocated GPU tensors.
Returns the sliced target tensor.
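The behavior described above amounts to slicing both tensors and issuing an asynchronous copy_. The following is a minimal PyTorch sketch, not the library source. It assumes the parameters are named from_tensor, to_tensor and length, since the signature is not shown here. Because the slice of to_tensor is a view, the copy writes directly into the pre-allocated buffer, and the returned tensor shares that buffer's storage.

```python
import torch


def copy_slice(from_tensor: torch.Tensor, to_tensor: torch.Tensor, length: int) -> torch.Tensor:
    """Sketch: copy from_tensor[:length] into to_tensor[:length] and return that slice."""
    # to_tensor[:length] is a view, so copy_ writes into the pre-allocated buffer in place.
    return to_tensor[:length].copy_(from_tensor[:length], non_blocking=True)


if __name__ == "__main__":
    use_cuda = torch.cuda.is_available()
    src = torch.arange(8, dtype=torch.int32)
    if use_cuda:
        src = src.pin_memory()  # page-locked host memory enables an async host-to-device copy
    dst = torch.zeros(16, dtype=torch.int32, device="cuda" if use_cuda else "cpu")
    out = copy_slice(src, dst, 5)
    if use_cuda:
        torch.cuda.synchronize()  # wait for the async copy before reading the result
    print(out.tolist())  # [0, 1, 2, 3, 4]
```

non_blocking=True only lets a host-to-device copy overlap with host work when the source is in pinned memory. For a CPU-to-CPU copy, as in the fallback path of the demo, the copy simply completes synchronously.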
get_engine_client_zmq_addr
Assign a new ZMQ socket address.
If `local_only` is True, all participants are colocated, so a unique IPC address is returned.
Otherwise, the provided host and port are used to construct a TCP address (a port of 0 means an available port is assigned).
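The address-selection rule above can be sketched with the standard library. This is not vLLM's implementation. make_zmq_addr is a hypothetical name, and the ipc:// path layout (temporary directory plus a random UUID) is an assumption. Only IPv4 hosts are handled. Binding a probe socket to port 0 lets the operating system pick a free port, which is then read back.

```python
import socket
import tempfile
import uuid


def make_zmq_addr(local_only: bool, host: str, port: int = 0) -> str:
    """Hypothetical sketch of the address-selection rule (IPv4 hosts only)."""
    if local_only:
        # Colocated peers: a random path in the temp dir yields a unique IPC endpoint.
        return f"ipc://{tempfile.gettempdir()}/{uuid.uuid4()}"
    if port == 0:
        # Port 0 asks the OS for any free port; read it back, then release the socket.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            probe.bind((host, 0))
            port = probe.getsockname()[1]
    return f"tcp://{host}:{port}"


if __name__ == "__main__":
    print(make_zmq_addr(True, "127.0.0.1"))
    print(make_zmq_addr(False, "127.0.0.1", 5555))  # tcp://127.0.0.1:5555
    print(make_zmq_addr(False, "127.0.0.1"))
```

Releasing the probe socket before the real ZMQ socket binds leaves a short window in which another process could claim the same port. A caller that needs certainty could let ZeroMQ pick the port at bind time instead.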
report_usage_stats
report_usage_stats(
    vllm_config,
    usage_context: UsageContext = ENGINE_CONTEXT,
) -> None
Report usage statistics if enabled.
shutdown
shutdown(procs: list[BaseProcess])
wait_for_completion_or_failure
wait_for_completion_or_failure(
    api_server_manager: APIServerProcessManager,
    engine_manager: Optional[
        Union[CoreEngineProcManager, CoreEngineActorManager]
    ] = None,
    coordinator: Optional[DPCoordinator] = None,
) -> None
Wait for all processes to complete or detect if any fail.
Raises an exception if any process exits with a non-zero status.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
api_server_manager | APIServerProcessManager | The manager for API servers. | required |
engine_manager | Optional[Union[CoreEngineProcManager, CoreEngineActorManager]] | The manager for engine processes. If CoreEngineProcManager, it manages local engines; if CoreEngineActorManager, it manages all engines. | None |
coordinator | Optional[DPCoordinator] | The data-parallel coordinator. | None |
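The wait-or-fail behavior can be sketched with multiprocessing.connection.wait, which blocks until at least one process sentinel becomes ready, that is, until a process exits. wait_all_or_fail is a hypothetical helper that takes a flat list of processes rather than the manager objects above. It does not model how the real function gathers processes from the managers and coordinator, or what cleanup it performs after a failure.

```python
from multiprocessing.connection import wait
from multiprocessing.process import BaseProcess


def wait_all_or_fail(procs: list[BaseProcess]) -> None:
    """Block until every process exits; raise as soon as one exits with a non-zero code."""
    # Map each process's sentinel (ready once the process ends) back to the process.
    pending = {proc.sentinel: proc for proc in procs}
    while pending:
        for sentinel in wait(list(pending)):
            proc = pending.pop(sentinel)
            proc.join()  # reap the child so exitcode is populated
            if proc.exitcode != 0:
                raise RuntimeError(
                    f"Process {proc.name} (PID {proc.pid}) exited with code {proc.exitcode}"
                )
```

Waiting on sentinels avoids polling each process in a sleep loop. A process killed by a signal reports a negative exitcode, so the non-zero check also catches that case. A caller would typically wrap this in try/finally and shut down any surviving processes once it returns or raises.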