vllm.model_executor.layers.quantization.quark.schemes

Modules:

| Name | Description |
| --- | --- |
| quark_scheme | |
| quark_w4a4_mxfp4 | |
| quark_w8a8_fp8 | |
| quark_w8a8_int8 | |
__all__ module-attribute

QuarkScheme
Bases: ABC
Abstract class used to describe the weight creation and forward pass of different quantization schemes supported by Quark.
Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_scheme.py
apply_weights abstractmethod

Run the forward pass for the particular scheme. This is where scheme-specific dequant/quant steps or kernels should be applied.

:param layer: torch.nn.Module with the registered weights and other parameters relevant to the particular scheme.
:param x: input to the layer
:param bias: bias parameter
Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_scheme.py
create_weights abstractmethod

Weight creation for the particular scheme. Inputs to this function include the target layer module, its output partition sizes, the per-partition input size, the parameter dtype, and the weight loader callback.
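The two-phase contract (register weights first, run them later) can be illustrated with a plain-Python sketch. This is illustrative only: `ToyScheme` and `ToyIdentityScheme` are made-up names, and the real `QuarkScheme` operates on `torch.nn.Module` instances rather than dicts.

```python
# Illustrative-only sketch of the two-phase scheme contract:
# create_weights registers parameters, apply_weights runs the forward pass.
from abc import ABC, abstractmethod


class ToyScheme(ABC):
    """Mimics QuarkScheme's shape: weight creation + forward pass."""

    @abstractmethod
    def create_weights(self, layer: dict, out_features: int, in_features: int) -> None:
        """Register the scheme's parameters on the layer."""

    @abstractmethod
    def apply_weights(self, layer: dict, x: list, bias=None) -> list:
        """Run the forward pass using the registered parameters."""


class ToyIdentityScheme(ToyScheme):
    """A trivial 'scheme' that stores full-precision weights unchanged."""

    def create_weights(self, layer, out_features, in_features):
        # A real scheme would register quantized tensors and scales here.
        layer["weight"] = [[1.0 if i == j else 0.0 for j in range(in_features)]
                           for i in range(out_features)]

    def apply_weights(self, layer, x, bias=None):
        # y[i] = sum_j W[i][j] * x[j] (+ bias, if given)
        y = [sum(w * v for w, v in zip(row, x)) for row in layer["weight"]]
        if bias is not None:
            y = [yi + bi for yi, bi in zip(y, bias)]
        return y


layer = {}
scheme = ToyIdentityScheme()
scheme.create_weights(layer, out_features=3, in_features=3)
y = scheme.apply_weights(layer, [1.0, 2.0, 3.0])
```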
QuarkW4A4MXFP4

Bases: QuarkScheme

Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_w4a4_mxfp4.py

__init__

Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_w4a4_mxfp4.py

apply_weights

Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_w4a4_mxfp4.py

create_weights
create_weights(
layer: Module,
output_partition_sizes: list[int],
input_size_per_partition: int,
params_dtype: dtype,
weight_loader: Callable,
**kwargs,
)
Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_w4a4_mxfp4.py
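In the MX formats, FP4 elements (E2M1, largest magnitude 6.0) are grouped in blocks of 32 that share one power-of-two scale. A hedged pure-Python sketch of how such a shared scale could be chosen follows; the real kernel in quark_w4a4_mxfp4.py works on torch tensors and this helper name is illustrative.

```python
import math

FP4_MAX = 6.0    # largest magnitude representable in FP4 E2M1
BLOCK_SIZE = 32  # MX spec: 32 elements share one power-of-two scale


def mx_block_scale(block: list) -> float:
    """Pick a power-of-two scale so every block[i] / scale fits in FP4 range."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0
    # Round the exponent up so that amax / scale <= FP4_MAX.
    return 2.0 ** math.ceil(math.log2(amax / FP4_MAX))


block = [float(i) for i in range(BLOCK_SIZE)]  # amax = 31.0
scale = mx_block_scale(block)                  # 2 ** ceil(log2(31 / 6)) = 8.0
```

After scaling, every value in the block lies within the FP4-representable range and only the 8-bit shared exponent is stored per block.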
QuarkW8A8Fp8
Bases: QuarkScheme
Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_fp8.py
act_quant_group_shape instance-attribute

act_quant_group_shape = (
    PER_TOKEN if per_token else PER_TENSOR
)

fp8_linear instance-attribute

fp8_linear = Fp8LinearOp(
    act_quant_static=is_static_input_scheme,
    act_quant_group_shape=act_quant_group_shape,
)
__init__

Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_fp8.py

apply_weights

Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_fp8.py

create_weights
create_weights(
layer: Module,
output_partition_sizes: list[int],
input_size_per_partition: int,
params_dtype: dtype,
weight_loader: Callable,
**kwargs,
)
Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_fp8.py
process_weights_after_loading
Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_fp8.py
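For fused linear layers, a common post-loading step is to requantize the logical weight partitions onto a single shared scale (vLLM's FP8 utilities provide a similar `requantize_with_max_scale` helper). The version below is a simplified integer-grid sketch under that assumption, not the actual implementation.

```python
def requantize_to_max_scale(qweights: list, scales: list):
    """Merge per-partition quantized weights onto one shared (max) scale."""
    max_scale = max(scales)
    merged = []
    for qw, s in zip(qweights, scales):
        # Dequantize with the partition's old scale,
        # then requantize with the shared one.
        merged.append([round(q * s / max_scale) for q in qw])
    return merged, max_scale


qweights = [[100, -50], [10, 20]]  # two partitions of a fused layer
scales = [0.5, 1.0]
merged, shared = requantize_to_max_scale(qweights, scales)
```

Using the maximum scale guarantees no merged value overflows its quantized range, at the cost of some precision in partitions that originally had smaller scales.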
QuarkW8A8Int8
Bases: QuarkScheme
Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_int8.py
_kernel_backends_being_used class-attribute instance-attribute

__init__
Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_int8.py
apply_weights

create_weights
create_weights(
layer: Module,
output_partition_sizes: list[int],
input_size_per_partition: int,
params_dtype: dtype,
weight_loader: Callable,
**kwargs,
)
Source code in vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_int8.py
process_weights_after_loading
process_weights_after_loading(layer: Module) -> None
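As a rough illustration of what symmetric INT8 weight quantization computes, here is a hedged pure-Python sketch (not the vLLM kernel; the helper name is made up). Each channel gets a scale of amax / 127 so its values map onto the signed 8-bit grid.

```python
def int8_symmetric_quantize(channel: list):
    """Symmetric per-channel int8: scale = amax / 127, codes in [-127, 127]."""
    amax = max(abs(v) for v in channel)
    scale = amax / 127.0 if amax > 0.0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in channel]
    return q, scale


q, scale = int8_symmetric_quantize([0.0, -1.0, 2.54])
# Dequantized values q[i] * scale closely approximate the originals.
```

Symmetric quantization (no zero point) keeps the matmul a plain integer GEMM followed by a scale multiply, which is why it is the usual choice for weights.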