vllm.model_executor.models.vision
VisionEncoderInfo ¶
Source code in vllm/model_executor/models/vision.py
VisionLanguageConfig ¶
get_vision_encoder_info ¶
get_vision_encoder_info(
hf_config: VisionLanguageConfig,
) -> VisionEncoderInfo
Source code in vllm/model_executor/models/vision.py
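A hedged usage sketch follows. It assumes vLLM is installed, that the model's Hugging Face config can be passed directly as `hf_config`, and that the returned `VisionEncoderInfo` exposes geometry queries such as `get_image_size()` and `get_patch_size()`; these method names are assumptions and may differ between versions, so check the source file referenced above for the exact interface.

```python
from transformers import AutoConfig

from vllm.model_executor.models.vision import get_vision_encoder_info

# Hypothetical model; any HF vision-language model config would do.
hf_config = AutoConfig.from_pretrained("llava-hf/llava-1.5-7b-hf")

# Dispatches on the vision part of the config and returns a VisionEncoderInfo.
encoder_info = get_vision_encoder_info(hf_config)

# Assumed geometry queries on VisionEncoderInfo (see caveat above).
print(encoder_info.get_image_size())
print(encoder_info.get_patch_size())
```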
get_vit_attn_backend ¶
Get the available attention backend for the Vision Transformer.
Source code in vllm/model_executor/models/vision.py
resolve_visual_encoder_outputs ¶
resolve_visual_encoder_outputs(
encoder_outputs: Union[Tensor, list[Tensor]],
feature_sample_layers: Optional[list[int]],
post_layer_norm: Optional[LayerNorm],
max_possible_layers: int,
) -> Tensor
Given the outputs of a visual encoder module, which may be either the output of the last layer or a list of hidden states to be stacked, handle post-layer normalization and resolve them into a single output tensor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `encoder_outputs` | `Union[Tensor, list[Tensor]]` | Output of the encoder's last layer or all hidden states. | required |
| `feature_sample_layers` | `Optional[list[int]]` | Optional layer indices to grab from the encoder outputs; if provided, `encoder_outputs` must be a list. | required |
| `post_layer_norm` | `Optional[LayerNorm]` | Post norm to apply to the output of the encoder. | required |
| `max_possible_layers` | `int` | Total layers in the fully loaded visual encoder. | required |
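To make the semantics above concrete, here is a minimal, self-contained PyTorch sketch of this resolution logic; it is not vLLM's implementation. The offsetting of negative layer indices against `max_possible_layers` and the concatenation of the sampled layers along the feature dimension are assumptions drawn from the parameter descriptions.

```python
from typing import Optional, Union

import torch
from torch import nn


def resolve_outputs_sketch(
    encoder_outputs: Union[torch.Tensor, list[torch.Tensor]],
    feature_sample_layers: Optional[list[int]],
    post_layer_norm: Optional[nn.LayerNorm],
    max_possible_layers: int,
) -> torch.Tensor:
    # Case 1: no layer sampling. The input is the last hidden state;
    # apply the post norm (if any) and return it unchanged otherwise.
    if feature_sample_layers is None:
        if post_layer_norm is not None:
            return post_layer_norm(encoder_outputs)
        return encoder_outputs

    # Case 2: layer sampling. `encoder_outputs` must be the list of hidden
    # states (embedding output plus one entry per loaded layer). Negative
    # indices are assumed to be relative to the fully loaded encoder depth,
    # so they are shifted when the encoder was loaded with fewer layers.
    assert isinstance(encoder_outputs, list)
    offset = max_possible_layers + 1 - len(encoder_outputs)
    selected = [
        encoder_outputs[idx] if idx >= 0 else encoder_outputs[idx + offset]
        for idx in feature_sample_layers
    ]

    # Apply the post norm only when the final layer is among the sampled
    # states, since intermediate hidden states are taken pre-norm.
    if post_layer_norm is not None and feature_sample_layers[-1] in (
            -1, max_possible_layers):
        selected[-1] = post_layer_norm(selected[-1])

    # Combine the sampled states into one tensor; concatenation along the
    # feature dimension (LLaVA-style multi-layer features) is assumed here.
    return torch.cat(selected, dim=-1)


if __name__ == "__main__":
    # Embedding output plus 24 layers of a hypothetical ViT, batch of 1,
    # 577 patch tokens, hidden size 1024.
    hidden = [torch.randn(1, 577, 1024) for _ in range(25)]
    norm = nn.LayerNorm(1024)
    out = resolve_outputs_sketch(hidden, [-2, -1], norm, max_possible_layers=24)
    print(out.shape)  # torch.Size([1, 577, 2048])
```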