vllm.attention.layers.chunked_local_attention
ChunkedLocalAttention ¶
Bases: Attention
Source code in vllm/attention/layers/chunked_local_attention.py
__init__ ¶
__init__(
    num_heads: int,
    head_size: int,
    scale: float,
    attention_chunk_size: int,
    num_kv_heads: Optional[int] = None,
    alibi_slopes: Optional[List[float]] = None,
    cache_config: Optional[CacheConfig] = None,
    quant_config: Optional[QuantizationConfig] = None,
    kv_sharing_target_layer_name: Optional[str] = None,
    prefix: str = "",
)
Source code in vllm/attention/layers/chunked_local_attention.py
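ChunkedLocalAttention is an Attention variant in which each query attends only within a fixed-size chunk of attention_chunk_size tokens rather than over the full prefix. Below is a minimal sketch of constructing the layer, assuming only the documented __init__ signature; the dimensions, num_kv_heads value, and prefix are illustrative, and in practice cache_config and quant_config would be passed through from the engine's configuration rather than left at their defaults.

# Minimal sketch: instantiating ChunkedLocalAttention with the
# documented signature. All dimensions below are illustrative.
from vllm.attention.layers.chunked_local_attention import ChunkedLocalAttention

num_heads = 32
head_size = 128

attn = ChunkedLocalAttention(
    num_heads=num_heads,
    head_size=head_size,
    scale=head_size**-0.5,      # standard 1/sqrt(head_size) scaling
    attention_chunk_size=8192,  # queries attend only within 8192-token chunks
    num_kv_heads=8,             # grouped-query attention (illustrative)
    prefix="model.layers.0.self_attn.attn",  # illustrative layer name
)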
create_chunked_local_attention_backend cached ¶
create_chunked_local_attention_backend(
    underlying_attn_backend: AttentionBackend,
    attention_chunk_size: int,
    block_size: int,
) -> type[AttentionBackend]
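The cached marker indicates the factory memoizes its result: for a given underlying backend, chunk size, and KV-cache block size it builds a chunked-local-attention backend type once and returns the same class object on subsequent calls. The sketch below illustrates that contract; the FlashAttentionBackend import path is an assumption about which concrete backend gets wrapped, not something this page documents.

# Sketch of the factory's caching contract. The FlashAttentionBackend
# import is an assumption; any vLLM AttentionBackend subclass would
# serve as the underlying backend here.
from vllm.attention.layers.chunked_local_attention import (
    create_chunked_local_attention_backend,
)
from vllm.v1.attention.backends.flash_attn import FlashAttentionBackend

local_cls = create_chunked_local_attention_backend(
    underlying_attn_backend=FlashAttentionBackend,
    attention_chunk_size=8192,  # chunk length in tokens
    block_size=16,              # KV-cache block size
)

# Because the function is cached, identical arguments yield the
# identical generated class object rather than a fresh subclass.
assert local_cls is create_chunked_local_attention_backend(
    underlying_attn_backend=FlashAttentionBackend,
    attention_chunk_size=8192,
    block_size=16,
)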