vllm.model_executor.layers.utils
Utility methods for model layers.
apply_penalties
apply_penalties(
    logits: Tensor,
    prompt_tokens_tensor: Tensor,
    output_tokens_tensor: Tensor,
    presence_penalties: Tensor,
    frequency_penalties: Tensor,
    repetition_penalties: Tensor,
) -> Tensor
Applies penalties in place to the logits tensor.

logits: The input logits tensor of shape [num_seqs, vocab_size].
prompt_tokens_tensor: A tensor containing the prompt tokens. The prompts are padded to the maximum prompt length within the batch using vocab_size as the padding value. The value vocab_size is used for padding because it does not correspond to any valid token ID in the vocabulary.
output_tokens_tensor: The output tokens tensor.
presence_penalties: The presence penalties of shape (num_seqs, ).
frequency_penalties: The frequency penalties of shape (num_seqs, ).
repetition_penalties: The repetition penalties of shape (num_seqs, ).
Source code in vllm/model_executor/layers/utils.py
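To make the parameter roles concrete, here is a minimal, dependency-free sketch that uses plain Python lists instead of tensors. It is not the library's implementation. The function name apply_penalties_sketch is hypothetical, and the formulas are an assumption based on the commonly used conventions: the repetition penalty divides positive logits and multiplies non-positive ones for tokens seen in the prompt or output, while the frequency and presence penalties are subtracted based on output tokens only. It also assumes output tokens are padded with vocab_size in the same way as prompt tokens.

```python
def apply_penalties_sketch(logits, prompt_tokens, output_tokens,
                           presence, frequency, repetition):
    """Hypothetical list-based sketch; mutates `logits` in place and returns it.

    Assumed conventions (not confirmed by the reference page):
      * repetition penalty applies to tokens seen in prompt OR output,
      * frequency/presence penalties apply to tokens seen in the output,
      * the ID `vocab_size` is padding and is ignored.
    """
    vocab_size = len(logits[0])
    rows = zip(logits, prompt_tokens, output_tokens,
               presence, frequency, repetition)
    for row, p_toks, o_toks, pres, freq, rep in rows:
        # Count how often each valid token appears in the output.
        out_counts = [0] * vocab_size
        for t in o_toks:
            if t < vocab_size:  # skip padding
                out_counts[t] += 1
        # Tokens seen anywhere (prompt or output), padding excluded.
        seen = {t for t in list(p_toks) + list(o_toks) if t < vocab_size}
        for tok in range(vocab_size):
            if tok in seen:
                # Push the logit toward "less likely" regardless of its sign.
                row[tok] = row[tok] / rep if row[tok] > 0 else row[tok] * rep
            row[tok] -= freq * out_counts[tok]
            if out_counts[tok] > 0:
                row[tok] -= pres
    return logits


# One sequence, vocab_size = 4, so the ID 4 is padding.
logits = [[2.0, -1.0, 0.5, 1.0]]
result = apply_penalties_sketch(
    logits,
    prompt_tokens=[[0, 4]],
    output_tokens=[[1, 1]],
    presence=[0.5],
    frequency=[0.25],
    repetition=[2.0],
)
```

In this example token 0 appears only in the prompt, so it receives only the repetition penalty, while token 1 appears twice in the output and receives all three. Tokens 2 and 3 are untouched.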
check_cpu_sgl_kernel
cpu_unquantized_gemm
Source code in vllm/model_executor/layers/utils.py
default_unquantized_gemm
dispatch_unquantized_gemm
get_token_bin_counts_and_mask
get_token_bin_counts_and_mask(
    tokens: Tensor, vocab_size: int, num_seqs: int
) -> tuple[Tensor, Tensor]
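The reference page gives only the signature, so the following is a guess at the semantics suggested by the name and return type: a per-sequence count of each token ID, plus a boolean mask of which IDs occur at all. It is a sketch with plain Python lists rather than tensors. The name token_bin_counts_and_mask_sketch is hypothetical, and the treatment of vocab_size as a padding ID is carried over from the apply_penalties description.

```python
def token_bin_counts_and_mask_sketch(tokens, vocab_size, num_seqs):
    """Hypothetical list-based sketch of per-sequence token counting.

    Returns (bin_counts, mask), each of shape [num_seqs][vocab_size].
    The ID `vocab_size` is assumed to be padding and is not counted.
    """
    bin_counts = [[0] * vocab_size for _ in range(num_seqs)]
    for seq_idx, row in enumerate(tokens):
        for tok in row:
            if tok != vocab_size:  # ignore padding entries
                bin_counts[seq_idx][tok] += 1
    # A token is "present" if it was counted at least once.
    mask = [[count > 0 for count in row] for row in bin_counts]
    return bin_counts, mask


# Two sequences padded to length 4 with the padding ID 4 (vocab_size = 4).
tokens = [[0, 2, 2, 4],
          [1, 4, 4, 4]]
counts, mask = token_bin_counts_and_mask_sketch(tokens, vocab_size=4, num_seqs=2)
```

A tensor-based version would typically allocate one extra column for the padding ID and drop it after counting, which avoids a branch per token.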