vllm.v1.attention.backends.mamba_attn
BaseMambaAttentionMetadataBuilder
Bases: AttentionMetadataBuilder[M], ABC
Source code in vllm/v1/attention/backends/mamba_attn.py
cudagraph_support class-attribute
cudagraph_support: AttentionCGSupport = (
UNIFORM_SINGLE_TOKEN_DECODE
)
decode_cudagraph_max_bs instance-attribute
decode_cudagraph_max_bs = min(
max_num_seqs, max_capture_size
)
state_indices_tensor instance-attribute
state_indices_tensor = empty(
(decode_cudagraph_max_bs,), dtype=int32, device=device
)
__init__
__init__(
kv_cache_spec: AttentionSpec,
layer_names: list[str],
vllm_config: VllmConfig,
device: device,
)
build_for_cudagraph_capture
build_for_cudagraph_capture(
common_attn_metadata: CommonAttentionMetadata,
) -> M
This method builds the attention metadata for full CUDA graph capture. Currently, only decode batches are supported for full CUDA graphs with Mamba.
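The `UNIFORM_SINGLE_TOKEN_DECODE` support level declared above implies a common pattern: graphs are captured at fixed batch sizes, and smaller real batches are padded up to the nearest captured size before replay. A minimal sketch of that padding step, under assumed names (`pad_state_indices` and the `PAD_SLOT_ID` sentinel are illustrative, not vLLM's actual API):

```python
PAD_SLOT_ID = -1  # assumed sentinel marking unused slots in the padded batch

def pad_state_indices(active: list[int], capture_bs: int) -> list[int]:
    """Pad per-request Mamba state indices up to the captured batch size.

    The replayed graph always sees a uniform decode batch of capture_bs
    requests, one token each; slots beyond the real requests point at the
    pad sentinel so their writes can be ignored.
    """
    assert len(active) <= capture_bs, "batch exceeds captured graph size"
    return active + [PAD_SLOT_ID] * (capture_bs - len(active))
```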