vllm.model_executor.layers.quantization.tpu_int8
Int8TpuConfig ¶
Bases: QuantizationConfig
Int8 Quantization Config class for TPU Backend.
Source code in vllm/model_executor/layers/quantization/tpu_int8.py
__init__ ¶
__init__(activation_scheme: str = 'none') -> None
Source code in vllm/model_executor/layers/quantization/tpu_int8.py
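As a quick orientation, here is a minimal sketch of constructing the config directly; in normal use the engine builds it via `from_config` (next entry). The only argument shown in the signature above is `activation_scheme`, whose default is `'none'`; whether other values are accepted is not documented here.

```python
from vllm.model_executor.layers.quantization.tpu_int8 import Int8TpuConfig

# 'none' is the default per the signature above; other values may not be accepted.
config = Int8TpuConfig(activation_scheme="none")
print(config.get_name())  # the registered quantization method name
```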
from_config classmethod ¶
from_config(config: dict[str, Any]) -> Int8TpuConfig
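A hedged sketch of driving `from_config` with a plain dict, the way a loader might after reading a model's quantization settings. The `"activation_scheme"` key is an assumption based on the `__init__` signature above, not a documented contract.

```python
from vllm.model_executor.layers.quantization.tpu_int8 import Int8TpuConfig

# Assumed key: the dict shape simply mirrors the __init__ parameter above.
quant_config_dict = {"activation_scheme": "none"}
config = Int8TpuConfig.from_config(quant_config_dict)
```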
get_config_filenames staticmethod ¶
get_name ¶
get_name() -> QuantizationMethods
get_quant_method ¶
get_quant_method(
layer: Module, prefix: str
) -> Optional[TPUInt8LinearMethod]
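The return type above suggests that `get_quant_method` hands back a `TPUInt8LinearMethod` for layers it quantizes and `None` for everything else. The sketch below illustrates that dispatch; the plain `nn.Linear` stand-in is an assumption and may not be recognized (vLLM passes its own linear layer classes), in which case `None` comes back.

```python
import torch.nn as nn

from vllm.model_executor.layers.quantization.tpu_int8 import Int8TpuConfig

config = Int8TpuConfig()
# A plain nn.Linear is only a stand-in; an unrecognized module may simply
# yield None instead of a quantization method.
layer = nn.Linear(16, 16)
method = config.get_quant_method(layer, prefix="model.layers.0.mlp.down_proj")
print(method)  # TPUInt8LinearMethod instance for supported layers, else None
```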
TPUInt8LinearMethod ¶
Bases: LinearMethodBase
Int8 Linear method for TPU Quant.
Source code in vllm/model_executor/layers/quantization/tpu_int8.py
__init__ ¶
__init__(quant_config: Int8TpuConfig)
_quantize_weight ¶
Source code in vllm/model_executor/layers/quantization/tpu_int8.py
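The signature of `_quantize_weight` is not reproduced above. As an illustration only (not vLLM's implementation), symmetric per-output-channel int8 quantization of a weight matrix can be sketched in plain PyTorch:

```python
import torch


def quantize_weight_symmetric(weight: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Illustrative symmetric int8 weight quantization (not vLLM's code).

    Returns an int8 weight and a per-output-channel scale such that
    weight ≈ qweight.float() * scale.
    """
    # Per-output-channel max magnitude; weight shape is (out_features, in_features).
    max_abs = weight.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8)
    scale = max_abs / 127.0
    qweight = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return qweight, scale


w = torch.randn(8, 16)
qw, s = quantize_weight_symmetric(w)
print((qw.float() * s - w).abs().max())  # quantization error stays small
```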
apply ¶
Source code in vllm/model_executor/layers/quantization/tpu_int8.py
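The `apply` signature is likewise not shown; as a `LinearMethodBase` method it performs the linear transform using the layer's quantized weights. The following is only a reference sketch of the dequantize-then-matmul math in plain PyTorch, with fake tensors standing in for the layer's parameters:

```python
from typing import Optional

import torch


def apply_int8_linear(x: torch.Tensor,
                      qweight: torch.Tensor,
                      scale: torch.Tensor,
                      bias: Optional[torch.Tensor] = None) -> torch.Tensor:
    # Reference math only: reconstruct W = qweight * scale, then a normal linear.
    w = qweight.to(x.dtype) * scale.to(x.dtype)
    return torch.nn.functional.linear(x, w, bias)


x = torch.randn(2, 16)
qweight = torch.randint(-128, 128, (8, 16), dtype=torch.int8)  # fake int8 weight
scale = torch.rand(8, 1) * 0.01                                # fake per-channel scale
print(apply_int8_linear(x, qweight, scale).shape)  # torch.Size([2, 8])
```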
create_weights ¶
create_weights(
layer: Module,
input_size_per_partition: int,
output_partition_sizes: list[int],
input_size: int,
output_size: int,
params_dtype: dtype,
**extra_weight_attrs,
)
Source code in vllm/model_executor/layers/quantization/tpu_int8.py
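A hedged sketch of what a `create_weights` implementation with this signature typically does: allocate the (still unquantized) weight for the layer's partition and attach the extra weight attributes for the checkpoint loader. The helper name and the attribute keys below are assumptions, not vLLM's exact code.

```python
import torch
from torch import nn


def create_weights_sketch(layer: nn.Module,
                          input_size_per_partition: int,
                          output_partition_sizes: list[int],
                          params_dtype: torch.dtype,
                          **extra_weight_attrs) -> None:
    # Allocate the full unquantized weight for this partition; quantization to
    # int8 happens later, after the checkpoint is loaded
    # (see process_weights_after_loading below).
    output_size_per_partition = sum(output_partition_sizes)
    weight = nn.Parameter(
        torch.empty(output_size_per_partition,
                    input_size_per_partition,
                    dtype=params_dtype),
        requires_grad=False,
    )
    layer.register_parameter("weight", weight)
    # Attach loader metadata (e.g. shard dims) so the weight loader knows how
    # to copy checkpoint tensors into this parameter.
    for key, value in extra_weight_attrs.items():
        setattr(weight, key, value)


layer = nn.Module()
create_weights_sketch(layer, 16, [8, 8], torch.float32, input_dim=1, output_dim=0)
print(layer.weight.shape)  # torch.Size([16, 16])
```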
process_weights_after_loading ¶
process_weights_after_loading(layer: Module) -> None
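Once the checkpoint weight has been loaded, a hook with this signature can quantize it in place and store the scale alongside it. The sketch below assumes the symmetric scheme from the `_quantize_weight` illustration above and is not vLLM's exact code.

```python
import torch
from torch import nn


def process_weights_after_loading_sketch(layer: nn.Module) -> None:
    # After loading, replace the float weight with its int8 counterpart plus a
    # per-output-channel scale (symmetric scheme assumed for illustration).
    weight = layer.weight.data
    max_abs = weight.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8)
    scale = max_abs / 127.0
    qweight = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    layer.weight = nn.Parameter(qweight, requires_grad=False)
    layer.scale = nn.Parameter(scale, requires_grad=False)


layer = nn.Linear(16, 8, bias=False)
process_weights_after_loading_sketch(layer)
print(layer.weight.dtype, layer.scale.shape)  # torch.int8 torch.Size([8, 1])
```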