vllm.utils.deep_gemm
Compatibility wrapper for DeepGEMM API changes.
Users of vLLM should always import only these wrappers.
__all__ module-attribute
__all__ = [
"calc_diff",
"fp8_gemm_nt",
"m_grouped_fp8_gemm_nt_contiguous",
"fp8_m_grouped_gemm_nt_masked",
"per_block_cast_to_fp8",
"is_blackwell_deep_gemm_e8m0_used",
"is_deep_gemm_supported",
"should_use_deepgemm_for_fp8_linear",
]
_align
_lazy_init
Import deep_gemm and resolve symbols on first use.
Source code in vllm/utils/deep_gemm.py
_missing
Placeholder for unavailable DeepGEMM backend.
_resolve_symbol
Return the new symbol if it exists, otherwise the old one.
calc_diff
Return a global difference metric for unit tests.
DeepGEMM kernels on Blackwell/B200 currently exhibit noticeable per-element error, causing torch.testing.assert_close to fail. Instead of checking every element, we compute a cosine-style similarity over the whole tensor and report 1 - sim. Once kernel accuracy improves, this helper can be removed.
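One plausible reading of this metric on plain Python sequences (the real helper operates on torch tensors, and the exact formula here is an assumption, not the confirmed implementation):

```python
def calc_diff(x, y):
    # Cosine-style global similarity: sim = 2*sum(x*y) / (sum(x^2) + sum(y^2)).
    # Identical inputs give sim = 1, so the returned metric is 0 on a
    # perfect match and grows toward 1 as the tensors diverge.
    num = 2.0 * sum(a * b for a, b in zip(x, y))
    den = sum(a * a for a in x) + sum(b * b for b in y)
    return 1.0 - num / den
```

Because the reduction is global, a small number of noisy elements barely moves the score, which is exactly why it tolerates the per-element error that defeats torch.testing.assert_close.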
fp8_gemm_nt
fp8_m_grouped_gemm_nt_masked
is_blackwell_deep_gemm_e8m0_used cached
is_blackwell_deep_gemm_e8m0_used() -> bool
Return True if vLLM is configured to use the DeepGEMM E8M0 scale on a Blackwell-class GPU.
is_deep_gemm_supported cached
is_deep_gemm_supported() -> bool
Return True if DeepGEMM is supported on the current platform. Currently, only Hopper and Blackwell GPUs are supported.
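The "cached" badge suggests a memoized capability probe; a hedged sketch of that shape, with a hard-coded stand-in for the real GPU compute-capability query (the probe mechanism and capability values here are assumptions):

```python
from functools import cache

# Stand-in for a runtime compute-capability query; (9, 0) plays a
# Hopper-class GPU for this sketch.
_CAPABILITY = (9, 0)


@cache
def is_deep_gemm_supported():
    # Memoized guard: Hopper (SM 9.x) and Blackwell (SM 10.x) only.
    # The probe runs once; later calls hit the functools cache.
    return _CAPABILITY[0] in (9, 10)
```

Callers check this guard (and its E8M0 sibling) before dispatching to any DeepGEMM kernel, falling back to another fp8 path otherwise.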
m_grouped_fp8_gemm_nt_contiguous ¶
per_block_cast_to_fp8 ¶
per_block_cast_to_fp8(
    x: Tensor,
    block_size: list[int] = DEFAULT_BLOCK_SIZE,
    use_ue8m0: bool = False,
) -> tuple[Tensor, Tensor]
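The per-block scaling idea behind this cast can be sketched on nested lists; this is a simplified sketch, not the real helper: it uses tiny 2x2 blocks instead of DeepGEMM's usual 128x128, returns only the scales (the real function returns the cast tensor and its scales), and assumes float8_e4m3 with a finite maximum of 448:

```python
FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3 value (assumed target format)


def per_block_scales(x, block=2):
    # Compute one quantization scale per block: amax(block) / FP8_E4M3_MAX,
    # handling ragged edges by shrinking the final blocks.
    n_rows, n_cols = len(x), len(x[0])
    scales = []
    for i in range(0, n_rows, block):
        row = []
        for j in range(0, n_cols, block):
            amax = max(
                abs(x[bi][bj])
                for bi in range(i, min(i + block, n_rows))
                for bj in range(j, min(j + block, n_cols))
            )
            row.append(amax / FP8_E4M3_MAX)
        scales.append(row)
    return scales
```

Dividing each block by its scale maps the block's largest magnitude onto the fp8 range; the `use_ue8m0` flag in the real signature presumably rounds these scales to power-of-two (E8M0) form for Blackwell.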