vllm.compilation.cuda_piecewise_backend
ConcreteSizeEntry dataclass
Source code in vllm/compilation/cuda_piecewise_backend.py
PiecewiseBackend
compiled_graph_for_general_shape instance-attribute
is_last_graph instance-attribute
__call__
__call__(*args) -> Any
__init__
__init__(
    graph: GraphModule,
    vllm_config: VllmConfig,
    piecewise_compile_index: int,
    total_piecewise_compiles: int,
    sym_shape_indices: list[int],
    compiled_graph_for_general_shape: Callable,
    vllm_backend: VllmBackend,
)
The backend for piecewise compilation. It mainly handles compilation for static shapes and dispatching based on the runtime shape.
self.graph is compiled once for the general shape, and then compiled again for each of the shapes specified in compilation_config.compile_sizes.
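
The dispatch behavior described above can be illustrated with a minimal, self-contained sketch. This is not vLLM's implementation: ToyPiecewiseDispatcher, SizeEntry, compile_fn, and sym_shape_index are hypothetical names introduced for illustration, and the sketch assumes that the runtime shape is passed as a plain integer argument and that each configured size is compiled lazily on first use.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SizeEntry:
    # Hypothetical per-size record; a stand-in, not vLLM's ConcreteSizeEntry.
    runtime_shape: int
    compiled: bool = False
    runnable: Optional[Callable[..., Any]] = None


class ToyPiecewiseDispatcher:
    """Toy model of "compile once for the general shape, then per size"."""

    def __init__(
        self,
        general_fn: Callable[..., Any],
        compile_sizes: list[int],
        sym_shape_index: int,
        compile_fn: Callable[[int], Callable[..., Any]],
    ) -> None:
        # Callable already compiled for the general (dynamic) shape.
        self.general_fn = general_fn
        # Position in *args that carries the runtime shape (assumption).
        self.sym_shape_index = sym_shape_index
        # Hypothetical compiler hook: size -> specialized callable.
        self.compile_fn = compile_fn
        # One entry per size that should get a static-shape compilation.
        self.entries = {size: SizeEntry(size) for size in compile_sizes}

    def __call__(self, *args: Any) -> Any:
        runtime_shape = args[self.sym_shape_index]
        entry = self.entries.get(runtime_shape)
        if entry is None:
            # Shape not listed in compile_sizes: use the general-shape graph.
            return self.general_fn(*args)
        if not entry.compiled:
            # First time this size is seen: compile and cache the result.
            entry.runnable = self.compile_fn(runtime_shape)
            entry.compiled = True
        return entry.runnable(*args)
```

In this sketch, a call with a shape listed in compile_sizes triggers at most one specialized compilation for that size, and every other shape falls back to the general-shape callable.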