vllm.v1.executor.ray_distributed_executor
FutureWrapper ¶
Bases: Future
A wrapper around a Ray output reference that meets the interface of .execute_model(): the top level (core busy loop) expects the .result() API to block and return a single output.
If an aggregator is provided, the outputs from all workers are aggregated upon the result() call. If not, only the first worker's output is returned.
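The behavior can be pictured with a minimal sketch (the class name FutureWrapperSketch and the aggregator's aggregate() method are assumptions for illustration, not the actual implementation):

```python
from concurrent.futures import Future

import ray


class FutureWrapperSketch(Future):
    """Minimal sketch: block on Ray object refs when .result() is called."""

    def __init__(self, refs, aggregator=None):
        super().__init__()
        self.refs = refs
        self.aggregator = aggregator

    def result(self, timeout=None):
        if timeout is not None:
            raise NotImplementedError("timeout is not supported in this sketch")
        # Block until all Ray workers finish, then fetch their outputs.
        outputs = ray.get(self.refs)
        if self.aggregator is not None:
            # Aggregate the outputs from all workers into a single output.
            # (The aggregate() method name is an assumption here.)
            return self.aggregator.aggregate(outputs)
        # Without an aggregator, return only the first worker's output.
        return outputs[0]
```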
Source code in vllm/v1/executor/ray_distributed_executor.py
__init__ ¶
__init__(
refs, aggregator: Optional[KVOutputAggregator] = None
)
result ¶
Source code in vllm/v1/executor/ray_distributed_executor.py
RayDistributedExecutor ¶
Bases: RayDistributedExecutor, Executor
Ray distributed executor using Ray Compiled Graphs.
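As a usage sketch, this backend is typically selected through the engine arguments; the model name and parallel size below are placeholders:

```python
from vllm import LLM

# Placeholder configuration: "ray" selects the Ray distributed executor
# as the backend for multi-GPU execution.
llm = LLM(
    model="facebook/opt-125m",  # placeholder model
    tensor_parallel_size=2,
    distributed_executor_backend="ray",
)
```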
Source code in vllm/v1/executor/ray_distributed_executor.py
max_concurrent_batches property ¶
max_concurrent_batches: int
The Ray distributed executor supports pipeline parallelism, meaning it allows PP-size batches to be executed concurrently.
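Concretely, the core loop can keep up to max_concurrent_batches batches in flight so all pipeline stages stay busy. A rough sketch of that pattern (executor, scheduler_outputs, and handle_output are assumed names, not part of this module):

```python
# Keep up to `max_concurrent_batches` futures in flight.
in_flight = []
for scheduler_output in scheduler_outputs:
    if len(in_flight) >= executor.max_concurrent_batches:
        # Drain the oldest batch before submitting a new one.
        oldest = in_flight.pop(0)
        handle_output(oldest.result())
    # With pipeline parallelism, execute_model returns a Future immediately.
    in_flight.append(executor.execute_model(scheduler_output))

# Drain whatever is still in flight.
for future in in_flight:
    handle_output(future.result())
```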
_init_executor ¶
Source code in vllm/v1/executor/ray_distributed_executor.py
execute_model ¶
execute_model(
scheduler_output,
) -> Union[ModelRunnerOutput, Future[ModelRunnerOutput]]
Execute the model on the Ray workers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scheduler_output | | The scheduler output to execute. | required |
Returns:
Type | Description |
---|---|
Union[ModelRunnerOutput, Future[ModelRunnerOutput]] | The model runner output. |
Source code in vllm/v1/executor/ray_distributed_executor.py
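Because the return type depends on whether batches are pipelined, callers generally normalize it before use; a hedged sketch (executor and scheduler_output are assumed to exist):

```python
from concurrent.futures import Future

output = executor.execute_model(scheduler_output)
if isinstance(output, Future):
    # With pipeline parallelism (max_concurrent_batches > 1), a Future is
    # returned; .result() blocks until the Ray workers finish.
    output = output.result()
# `output` is now a ModelRunnerOutput.
```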
reinitialize_distributed ¶
reinitialize_distributed(
reconfig_request: ReconfigureDistributedRequest,
) -> None