vllm.model_executor.layers.quantization.kernels.mixed_precision
Modules:
Name | Description |
---|---|
MPLinearKernel | |
allspark | |
bitblas | |
conch | |
cutlass | |
dynamic_4bit | |
exllama | |
machete | |
marlin | |
_POSSIBLE_KERNELS module-attribute
_POSSIBLE_KERNELS: list[type[MPLinearKernel]] = [
    CutlassW4A8LinearKernel,
    MacheteLinearKernel,
    AllSparkLinearKernel,
    MarlinLinearKernel,
    Dynamic4bitLinearKernel,
    BitBLASLinearKernel,
    ConchLinearKernel,
    ExllamaLinearKernel,
]
choose_mp_linear_kernel
choose_mp_linear_kernel(
    config: MPLinearLayerConfig,
    compute_capability: Optional[int] = None,
) -> type[MPLinearKernel]
Choose an MPLinearKernel that can implement the given config for the given compute capability. Attempts to choose the best kernel in terms of performance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | MPLinearLayerConfig | Description of the linear layer to be implemented. | required |
compute_capability | Optional[int] | The compute capability of the target device, if None uses | None |
Raises:
Type | Description |
---|---|
ValueError | If no kernel can implement the given config. |
Returns:
Type | Description |
---|---|
type[MPLinearKernel] | The chosen kernel class. |
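The selection behaviour described above can be illustrated with a minimal, self-contained sketch. It does not import vLLM and is not the library's implementation: `ToyConfig`, `ToyKernel`, `FastKernel`, `FallbackKernel`, `TOY_KERNELS`, and `choose_toy_kernel` are hypothetical names, and the `get_min_capability` / `can_implement` hooks are assumed here for illustration. The sketch also assumes that the candidate list is ordered by preference and that the first kernel able to implement the config is returned. When `compute_capability` is `None`, this sketch simply skips the capability check, which may differ from what the real function does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    """Hypothetical stand-in for MPLinearLayerConfig."""
    weight_bits: int
    group_size: int


class ToyKernel:
    """Hypothetical stand-in for MPLinearKernel."""

    @classmethod
    def get_min_capability(cls) -> int:
        raise NotImplementedError

    @classmethod
    def can_implement(cls, config: ToyConfig) -> tuple[bool, Optional[str]]:
        raise NotImplementedError


class FastKernel(ToyKernel):
    @classmethod
    def get_min_capability(cls) -> int:
        return 90  # assumed value, for illustration only

    @classmethod
    def can_implement(cls, config: ToyConfig) -> tuple[bool, Optional[str]]:
        if config.weight_bits != 4:
            return False, "only 4-bit weights supported"
        return True, None


class FallbackKernel(ToyKernel):
    @classmethod
    def get_min_capability(cls) -> int:
        return 60  # assumed value, for illustration only

    @classmethod
    def can_implement(cls, config: ToyConfig) -> tuple[bool, Optional[str]]:
        if config.weight_bits not in (4, 8):
            return False, "only 4-bit or 8-bit weights supported"
        return True, None


# Candidates in assumed preference order (most preferred first).
TOY_KERNELS: list[type[ToyKernel]] = [FastKernel, FallbackKernel]


def choose_toy_kernel(
    config: ToyConfig,
    compute_capability: Optional[int] = None,
) -> type[ToyKernel]:
    """Return the first candidate that supports the device and the config."""
    failure_reasons = []
    for kernel in TOY_KERNELS:
        # Skip kernels that need a newer device than the one targeted.
        if (compute_capability is not None
                and kernel.get_min_capability() > compute_capability):
            failure_reasons.append(
                f"{kernel.__name__}: requires capability "
                f"{kernel.get_min_capability()}, got {compute_capability}")
            continue
        ok, reason = kernel.can_implement(config)
        if ok:
            return kernel
        failure_reasons.append(f"{kernel.__name__}: {reason}")
    # Nothing matched: report why each candidate was rejected.
    raise ValueError("No kernel can implement the given config:\n"
                     + "\n".join(failure_reasons))
```

Collecting a rejection reason per candidate keeps the `ValueError` message actionable, since the caller can see whether the device or the layer configuration ruled each kernel out.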