vllm.model_executor.warmup.deep_gemm_warmup
Warm up DeepGEMM kernels. DeepGEMM JIT-compiles its kernels on first use, so this warmup aims to JIT-compile every kernel the model would need during execution ahead of time.
GROUPED_FP8_GEMM_NT_CONTIGUOUS_WARMUP_CACHE module-attribute
_deepgemm_fp8_gemm_nt_warmup
Source code in vllm/model_executor/warmup/deep_gemm_warmup.py
_deepgemm_grouped_fp8_gemm_nt_contiguous_warmup
_deepgemm_grouped_fp8_gemm_nt_contiguous_warmup(
    w1: Tensor,
    w2: Tensor,
    w1_scale: Tensor,
    w2_scale: Tensor,
    num_topk: int,
)
_extract_data_from_fused_moe_module
Extract weights, weight scales and num_topk from a FusedMoE module.
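A minimal sketch of what this extraction might look like, using a stub object in place of a real FusedMoE layer. The attribute names (`w13_weight`, `w2_weight`, the `*_scale` fields, `top_k`) are assumptions made for illustration, not vllm's actual field names:

```python
from dataclasses import dataclass


# Stub standing in for a FusedMoE layer; attribute names are assumptions.
@dataclass
class FakeFusedMoE:
    w13_weight: object        # fused gate/up projection weights (w1)
    w2_weight: object         # down projection weights (w2)
    w13_weight_scale: object  # fp8 block scales for w13
    w2_weight_scale: object   # fp8 block scales for w2
    top_k: int                # number of experts routed per token


def extract_data_from_fused_moe_module(m: FakeFusedMoE) -> tuple:
    """Pull out (w1, w2, w1_scale, w2_scale, num_topk) for warmup."""
    return (m.w13_weight, m.w2_weight,
            m.w13_weight_scale, m.w2_weight_scale, m.top_k)
```

The warmup needs exactly these five pieces because they determine the grouped-GEMM problem shapes that DeepGEMM would JIT-compile for.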
_extract_data_from_linear_base_module
Extract weights, weight scales and quantization block sizes from the given LinearBase module.
_fp8_linear_may_use_deep_gemm
Return True if the input module/layer could be processed with DeepGEMM.
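A hedged sketch of the shape such an eligibility predicate could take: the layer must carry fp8 block-quantized weights, and its dimensions must be compatible with the quantization block size. The class, field names, and the block-size constant are all hypothetical:

```python
FP8_BLOCK = 128  # assumed fp8 quantization block size, for illustration


class FakeLinear:
    """Stub standing in for a LinearBase-style layer."""

    def __init__(self, out_features: int, in_features: int, quant) -> None:
        self.out_features = out_features
        self.in_features = in_features
        self.quant = quant  # e.g. "fp8_block" or None


def fp8_linear_may_use_deep_gemm(layer) -> bool:
    """Illustrative check: fp8 block quantization plus weight dimensions
    divisible by the block size."""
    return (isinstance(layer, FakeLinear)
            and layer.quant == "fp8_block"
            and layer.out_features % FP8_BLOCK == 0
            and layer.in_features % FP8_BLOCK == 0)
```

Running the predicate over every module lets the warmup skip layers that would fall back to a non-DeepGEMM path anyway, so no time is wasted compiling kernels that would never be launched.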
_fused_moe_grouped_gemm_may_use_deep_gemm
deep_gemm_warmup
deepgemm_fp8_gemm_nt_warmup
deepgemm_grouped_fp8_gemm_nt_contiguous_warmup
deepgemm_grouped_fp8_gemm_nt_contiguous_warmup(
    model: Module,
)