vllm.attention.backends
Modules:
Name | Description |
---|---|
abstract | |
differential_flash_attn | An implementation of the Differential Transformer (https://arxiv.org/pdf/2410.05258). |
dual_chunk_flash_attn | Attention layer with dual-chunk FlashAttention and sparse attention. |
flash_attn | Attention layer with FlashAttention. |
flashmla | |
mla | |
placeholder_attn | |
rocm_aiter_mla | |
rocm_flash_attn | Attention layer for ROCm GPUs. |
triton_mla | |
utils | Attention backend utilities. |
xformers | Attention layer with xFormers and PagedAttention. |
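
For context, a minimal sketch of how one of these backends is typically selected at runtime. This is not part of the module index above: the `VLLM_ATTENTION_BACKEND` environment variable and the `FLASH_ATTN` value are assumptions based on common vLLM usage, so check your installed version for the supported values.

```python
# Sketch: forcing a specific attention backend before engine construction.
# VLLM_ATTENTION_BACKEND and the value "FLASH_ATTN" are assumed here;
# other commonly seen values include "XFORMERS".
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

If the variable is left unset, vLLM chooses a backend automatically based on the hardware and installed libraries.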