vllm.v1.attention.backends.short_conv_attn
ShortConvAttentionBackend ¶
Bases: AttentionBackend
Source code in vllm/v1/attention/backends/short_conv_attn.py
ShortConvAttentionMetadata dataclass ¶
Source code in vllm/v1/attention/backends/short_conv_attn.py
token_chunk_offset_ptr class-attribute instance-attribute ¶
__init__ ¶
__init__(
num_prefills: int,
num_prefill_tokens: int,
num_decodes: int,
num_decode_tokens: int,
query_start_loc: Tensor,
has_initial_states: Tensor,
state_indices_tensor: Tensor,
nums_dict: Optional[dict] = None,
cu_seqlen: Optional[int] = None,
batch_ptr: Optional[Tensor] = None,
token_chunk_offset_ptr: Optional[Tensor] = None,
) -> None
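To make the field layout concrete, here is a minimal sketch that mirrors the signature above with plain Python lists standing in for `torch.Tensor`, so it runs without a GPU. The field names come from the signature; the example values and the interpretive comments are my own reading of the names, not taken from the vLLM source.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for ShortConvAttentionMetadata (hypothetical values;
# plain lists replace torch.Tensor so the sketch is self-contained).
@dataclass
class ShortConvAttentionMetadataSketch:
    num_prefills: int
    num_prefill_tokens: int
    num_decodes: int
    num_decode_tokens: int
    query_start_loc: list[int]       # cumulative token offsets per request
    has_initial_states: list[bool]   # does each prefill resume a prior conv state?
    state_indices_tensor: list[int]  # per-request slot index into the conv state cache
    nums_dict: Optional[dict] = None
    cu_seqlen: Optional[int] = None
    batch_ptr: Optional[list[int]] = None
    token_chunk_offset_ptr: Optional[list[int]] = None

# A batch of two single-token decodes followed by two prefills (5 and 3 tokens).
meta = ShortConvAttentionMetadataSketch(
    num_prefills=2,
    num_prefill_tokens=8,
    num_decodes=2,
    num_decode_tokens=2,
    query_start_loc=[0, 1, 2, 7, 10],
    has_initial_states=[False, True],
    state_indices_tensor=[0, 1, 2, 3],
)
total_tokens = meta.num_prefill_tokens + meta.num_decode_tokens
# The last cumulative offset should cover every scheduled token.
assert meta.query_start_loc[-1] == total_tokens
```

The invariant checked at the end (the final `query_start_loc` entry equals the total token count) is the basic consistency property any builder filling this metadata would have to maintain.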
ShortConvAttentionMetadataBuilder ¶
Bases: AttentionMetadataBuilder[ShortConvAttentionMetadata]
Source code in vllm/v1/attention/backends/short_conv_attn.py
__init__ ¶
__init__(
kv_cache_spec: AttentionSpec,
layer_names: list[str],
vllm_config: VllmConfig,
device: device,
)
build ¶
build(
common_prefix_len: int,
common_attn_metadata: CommonAttentionMetadata,
fast_build: bool = False,
) -> ShortConvAttentionMetadata
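The `build` method has to split the scheduled batch into decode and prefill portions to fill `num_decodes` / `num_prefills` and their token counts. As a hedged sketch of that split, assuming the decodes-first convention used elsewhere in vLLM v1 (the real implementation operates on `CommonAttentionMetadata` and tensors, not Python lists; the helper name here is hypothetical):

```python
def split_decodes_and_prefills_sketch(
    query_start_loc: list[int],
) -> tuple[int, int, int, int]:
    """Given cumulative per-request query offsets, with decode requests
    (exactly one new token each) ordered first, return
    (num_decodes, num_decode_tokens, num_prefills, num_prefill_tokens).
    Hypothetical stand-in for the tensor-based splitting done in build()."""
    # Per-request query lengths from consecutive offset differences.
    query_lens = [b - a for a, b in zip(query_start_loc, query_start_loc[1:])]
    num_decodes = 0
    for qlen in query_lens:
        if qlen != 1:
            break  # first multi-token request ends the decode run
        num_decodes += 1
    num_prefills = len(query_lens) - num_decodes
    num_decode_tokens = num_decodes  # one token per decode
    num_prefill_tokens = sum(query_lens[num_decodes:])
    return num_decodes, num_decode_tokens, num_prefills, num_prefill_tokens

# Two decodes, then two 5-token prefills: offsets [0, 1, 2, 7, 12].
split = split_decodes_and_prefills_sketch([0, 1, 2, 7, 12])
assert split == (2, 2, 2, 10)
```

These four counts, together with `query_start_loc` and the state-cache indices from `common_attn_metadata`, are what populate the `ShortConvAttentionMetadata` fields listed above.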