vllm.engine.multiprocessing
Modules:
| Name | Description |
|---|---|
| client | |
| engine | |
REQUEST_OUTPUTS_T module-attribute
REQUEST_OUTPUTS_T = Union[
    List[RequestOutput],
    RPCAdapterLoadedResponse,
    RPCIsSleepingResponse,
    RPCError,
]
RPC_REQUEST_T module-attribute
RPC_REQUEST_T = Union[
    RPCProcessRequest,
    RPCAbortRequest,
    RPCStartupRequest,
    RPCUProfileRequest,
    RPCLoadAdapterRequest,
    RPCResetMultiModalCacheRequest,
    RPCResetPrefixCacheRequest,
    RPCSleepRequest,
    RPCWakeUpRequest,
    RPCIsSleepingRequest,
]
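A union alias such as RPC_REQUEST_T is usually consumed by checking the concrete type of each incoming message. The sketch below illustrates that pattern only. `AbortRequest`, `SleepRequest`, `REQUEST_T`, and `handle_request` are hypothetical stand-ins defined locally, the `level` field is an assumption, and nothing here imports vLLM or reflects how the engine actually handles these requests.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for illustration; these are NOT the vLLM classes.
@dataclass
class AbortRequest:
    request_id: str


@dataclass
class SleepRequest:
    level: int = 1  # assumed field, chosen only for the example


# A union alias written in the same style as RPC_REQUEST_T.
REQUEST_T = Union[AbortRequest, SleepRequest]


def handle_request(request: REQUEST_T) -> str:
    """Dispatch on the concrete type of a union member."""
    if isinstance(request, AbortRequest):
        return f"abort {request.request_id}"
    if isinstance(request, SleepRequest):
        return f"sleep level={request.level}"
    # Anything outside the union is rejected explicitly.
    raise TypeError(f"Unsupported request type: {type(request).__name__}")
```

Rejecting unknown types with an explicit `TypeError` keeps a newly added union member from being silently ignored by an older handler.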
MQEngineDeadError
Bases: RuntimeError
RPCAbortRequest dataclass
RPCAdapterLoadedResponse dataclass
RPCError dataclass
Source code in vllm/engine/multiprocessing/__init__.py
RPCIsSleepingRequest dataclass
RPCIsSleepingResponse dataclass
RPCLoadAdapterRequest dataclass
RPCProcessRequest dataclass
lora_request class-attribute instance-attribute
lora_request: Optional[LoRARequest] = lora_request
trace_headers class-attribute instance-attribute
__init__
__init__(
    prompt: PromptType,
    params: Union[SamplingParams, PoolingParams],
    request_id: str,
    lora_request: Optional[LoRARequest] = None,
    trace_headers: Optional[Mapping[str, str]] = None,
    priority: int = 0,
) -> None
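The signature above has three required arguments and three optional ones. As a rough illustration of how those defaults behave, the sketch below defines a hypothetical stand-in class, `ProcessRequestSketch`, with the same parameter names. It does not import vLLM; the `Any` annotations and the plain dict passed as `params` are placeholders for the real `PromptType`, `SamplingParams`/`PoolingParams`, and `LoRARequest` types.

```python
from typing import Any, Mapping, Optional


class ProcessRequestSketch:
    """Hypothetical stand-in that mirrors the documented parameter list."""

    def __init__(
        self,
        prompt: Any,  # PromptType in the documented signature
        params: Any,  # SamplingParams or PoolingParams in the documented signature
        request_id: str,
        lora_request: Optional[Any] = None,  # LoRARequest in the documented signature
        trace_headers: Optional[Mapping[str, str]] = None,
        priority: int = 0,
    ) -> None:
        # Store every argument as an instance attribute.
        self.prompt = prompt
        self.params = params
        self.request_id = request_id
        self.lora_request = lora_request
        self.trace_headers = trace_headers
        self.priority = priority


# Only the three required arguments are supplied; the rest use their defaults.
request = ProcessRequestSketch(
    prompt="Hello, world",
    params={"max_tokens": 16},  # placeholder, not a real SamplingParams object
    request_id="req-0",
)
```

With only the required arguments given, `lora_request` and `trace_headers` stay `None` and `priority` stays `0`.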
RPCResetMultiModalCacheRequest
RPCResetPrefixCacheRequest dataclass
RPCSleepRequest
RPCStartupRequest
RPCStartupResponse dataclass
RPCUProfileRequest
RPCWakeUpRequest dataclass
ENGINE_DEAD_ERROR
ENGINE_DEAD_ERROR(
    error: Optional[BaseException] = None,
) -> MQEngineDeadError
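The return annotation indicates that ENGINE_DEAD_ERROR builds and returns an exception instead of raising it, which leaves the decision to raise with the caller. The sketch below shows that factory pattern with hypothetical stand-ins (`EngineDeadErrorSketch`, `engine_dead_error_sketch`, `check_engine`). The message text is an assumption made for the example, and the code does not import vLLM.

```python
from typing import Optional


class EngineDeadErrorSketch(RuntimeError):
    """Hypothetical stand-in for an error type that derives from RuntimeError."""


def engine_dead_error_sketch(
    error: Optional[BaseException] = None,
) -> EngineDeadErrorSketch:
    """Build (but do not raise) an error, optionally describing the cause."""
    message = "Engine loop is not running."  # assumed wording
    if error is not None:
        message += f" Original error: {error!r}"
    return EngineDeadErrorSketch(message)


def check_engine(failure: Optional[BaseException]) -> None:
    """Raise the constructed error only when a failure was recorded."""
    if failure is not None:
        # Chain the original exception so its traceback is preserved.
        raise engine_dead_error_sketch(failure) from failure
```

Because the stand-in derives from `RuntimeError`, callers that already catch `RuntimeError` would also catch it, which matches the documented base class of MQEngineDeadError.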