vllm.model_executor.layers.quantization.utils.petit_utils
_PETIT_INSTALL_MSG module-attribute
_check_petit_nvfp4_supported
_check_petit_nvfp4_supported(
    quant_method: str, group_size: Optional[int]
) -> tuple[bool, Optional[str]]
Source code in vllm/model_executor/layers/quantization/utils/petit_utils.py
_import_petit_kernel
_import_petit_kernel() -> ModuleType
A helper function that lazily imports the petit_kernel library. The first call imports petit_kernel and stores the module in the global _petit_kernel variable. Subsequent calls return the already-loaded module directly.
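The caching behavior described above can be illustrated with a minimal sketch. This is not the vLLM implementation: the names `_MODULE_NAME`, `_INSTALL_MSG`, `_cached_module`, and `import_lazily` are hypothetical, and the standard-library module `json` stands in for `petit_kernel` so the example runs without that dependency installed.

```python
import importlib
from types import ModuleType
from typing import Optional

# Hypothetical stand-in: the real helper imports `petit_kernel`; a stdlib
# module is used here so the sketch runs anywhere.
_MODULE_NAME = "json"

# Hypothetical message; the real module defines its own _PETIT_INSTALL_MSG.
_INSTALL_MSG = f"{_MODULE_NAME} is not installed; install it to use this backend."

# Module-level cache, empty until the first successful import.
_cached_module: Optional[ModuleType] = None


def import_lazily() -> ModuleType:
    """Import the module on first use and reuse the cached module afterwards."""
    global _cached_module
    if _cached_module is None:
        try:
            _cached_module = importlib.import_module(_MODULE_NAME)
        except ImportError as exc:
            # Surface an actionable message instead of a bare import failure.
            raise ImportError(_INSTALL_MSG) from exc
    return _cached_module


first = import_lazily()
second = import_lazily()
print(first is second)
```

Deferring the import this way keeps an optional dependency from being required at module load time; the cost is paid only by callers that actually reach the kernel path.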
apply_petit_nvfp4_linear
apply_petit_nvfp4_linear(
    input: Tensor,
    weight: Tensor,
    weight_scale: Tensor,
    weight_scale_2: Tensor,
    size_n: int,
    size_k: int,
    bias: Optional[Tensor] = None,
) -> Tensor
prepare_nvfp4_layer_for_petit
prepare_nvfp4_layer_for_petit(layer: Module) -> None