vllm.model_executor.models.clip
Minimal implementation of CLIPVisionModel intended to be used only within a vision language model.
CLIPAttention ¶
Bases: Module
Multi-headed attention from the 'Attention Is All You Need' paper
Source code in vllm/model_executor/models/clip.py
out_proj instance-attribute ¶
out_proj = RowParallelLinear(
input_size=embed_dim,
output_size=embed_dim,
quant_config=quant_config,
prefix=f"{prefix}.out_proj",
)
qkv_proj instance-attribute ¶
qkv_proj = QKVParallelLinear(
hidden_size=embed_dim,
head_size=head_dim,
total_num_heads=num_heads,
quant_config=quant_config,
prefix=f"{prefix}.qkv_proj",
)
__init__ ¶
__init__(
config: CLIPVisionConfig,
quant_config: Optional[QuantizationConfig] = None,
prefix: str = "",
)
Source code in vllm/model_executor/models/clip.py
forward ¶
forward(hidden_states: Tensor)
Input shape: Batch x Time x Channel
Source code in vllm/model_executor/models/clip.py
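For orientation, here is a minimal plain-PyTorch sketch of the fused-QKV attention pattern this class follows. It is an illustration, not the vLLM implementation: QKVParallelLinear and RowParallelLinear are replaced with ordinary nn.Linear layers, and the class name NaiveCLIPAttention is hypothetical.

# Hedged sketch: plain-PyTorch analogue of the fused-QKV attention pattern above.
# The real class uses tensor-parallel layers; this standalone version is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NaiveCLIPAttention(nn.Module):  # hypothetical name, not part of vLLM
    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # One fused projection producing Q, K and V in a single matmul,
        # mirroring qkv_proj = QKVParallelLinear(...) above.
        self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Input shape: Batch x Time x Channel
        bsz, seq_len, _ = hidden_states.shape
        qkv = self.qkv_proj(hidden_states)
        q, k, v = qkv.chunk(3, dim=-1)
        # Reshape each to Batch x Heads x Time x HeadDim for attention.
        q = q.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.out_proj(out)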
CLIPEncoder ¶
Bases: Module
Transformer encoder consisting of config.num_hidden_layers self-attention layers. Each layer is a CLIPEncoderLayer.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | CLIPVisionConfig | CLIPConfig | required |
Source code in vllm/model_executor/models/clip.py
layers instance-attribute ¶
layers = ModuleList(
[
CLIPEncoderLayer(
config=config,
quant_config=quant_config,
prefix=f"{prefix}.layers.{layer_idx}",
)
for layer_idx in range(num_hidden_layers)
]
)
__init__ ¶
__init__(
config: CLIPVisionConfig,
quant_config: Optional[QuantizationConfig] = None,
num_hidden_layers_override: Optional[int] = None,
prefix: str = "",
) -> None
Source code in vllm/model_executor/models/clip.py
forward ¶
Source code in vllm/model_executor/models/clip.py
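The encoder simply threads the hidden states through its layer stack in order. A hedged sketch of that loop, assuming each layer maps hidden states to hidden states of the same shape (NaiveCLIPEncoder is an illustrative name, not part of vLLM):

# Hedged sketch of the encoder's forward loop.
import torch
import torch.nn as nn


class NaiveCLIPEncoder(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, layers: list[nn.Module]):
        super().__init__()
        # Mirrors layers = ModuleList([CLIPEncoderLayer(...) for ...]) above.
        self.layers = nn.ModuleList(layers)

    def forward(self, inputs_embeds: torch.Tensor) -> torch.Tensor:
        hidden_states = inputs_embeds
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states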
CLIPEncoderInfo ¶
Bases: VisionEncoderInfo[CLIPVisionConfig]
Source code in vllm/model_executor/models/clip.py
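This helper exposes encoder geometry derived from the vision config. As a hedged illustration of the underlying arithmetic (the exact VisionEncoderInfo method names are not shown here), the patch grid and token count follow directly from image_size and patch_size:

# Hedged sketch: patch-grid size and token count implied by a CLIPVisionConfig.
from transformers import CLIPVisionConfig

config = CLIPVisionConfig()          # HF defaults: image_size=224, patch_size=32
grid = config.image_size // config.patch_size
num_patches = grid * grid            # 7 * 7 = 49 patch tokens
num_tokens = num_patches + 1         # +1 for the class embedding token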
CLIPEncoderLayer ¶
Bases: Module
Source code in vllm/model_executor/models/clip.py
mlp instance-attribute ¶
mlp = CLIPMLP(
config,
quant_config=quant_config,
prefix=f"{prefix}.mlp",
)
self_attn instance-attribute ¶
self_attn = CLIPAttention(
config,
quant_config=quant_config,
prefix=f"{prefix}.self_attn",
)
__init__ ¶
__init__(
config: CLIPVisionConfig,
quant_config: Optional[QuantizationConfig] = None,
prefix: str = "",
) -> None
Source code in vllm/model_executor/models/clip.py
forward ¶
Source code in vllm/model_executor/models/clip.py
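A hedged sketch of the standard pre-norm residual layout used by CLIP encoder layers, with plain nn.LayerNorm and the attention/MLP modules passed in to stand for CLIPAttention and CLIPMLP; the class name is illustrative only.

# Hedged sketch of the pre-norm residual pattern: norm -> attention -> residual,
# then norm -> MLP -> residual.
import torch
import torch.nn as nn


class NaiveCLIPEncoderLayer(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, embed_dim: int, self_attn: nn.Module, mlp: nn.Module):
        super().__init__()
        self.self_attn = self_attn
        self.mlp = mlp
        self.layer_norm1 = nn.LayerNorm(embed_dim)
        self.layer_norm2 = nn.LayerNorm(embed_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Attention block with residual connection.
        residual = hidden_states
        hidden_states = self.layer_norm1(hidden_states)
        hidden_states = residual + self.self_attn(hidden_states)
        # MLP block with residual connection.
        residual = hidden_states
        hidden_states = self.layer_norm2(hidden_states)
        hidden_states = residual + self.mlp(hidden_states)
        return hidden_states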
CLIPMLP ¶
Bases: Module
Source code in vllm/model_executor/models/clip.py
fc1 instance-attribute ¶
fc1 = ColumnParallelLinear(
hidden_size,
intermediate_size,
bias=True,
quant_config=quant_config,
prefix=f"{prefix}.fc1",
)
fc2 instance-attribute ¶
fc2 = RowParallelLinear(
intermediate_size,
hidden_size,
bias=True,
quant_config=quant_config,
prefix=f"{prefix}.fc2",
)
__init__ ¶
__init__(
config: CLIPVisionConfig,
quant_config: Optional[QuantizationConfig] = None,
prefix: str = "",
) -> None
Source code in vllm/model_executor/models/clip.py
forward ¶
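A hedged sketch of the fc1 -> activation -> fc2 pattern with plain nn.Linear in place of the column/row-parallel layers; CLIP checkpoints typically configure a GELU-style activation ("quick_gelu"), approximated here with nn.GELU.

# Hedged sketch of the two-layer MLP pattern used by CLIP blocks.
import torch
import torch.nn as nn


class NaiveCLIPMLP(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, intermediate_size)
        self.fc2 = nn.Linear(intermediate_size, hidden_size)
        self.act = nn.GELU()  # stand-in for the activation named in config.hidden_act

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.act(self.fc1(hidden_states)))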
CLIPVisionEmbeddings ¶
Bases: Module
Source code in vllm/model_executor/models/clip.py
patch_embedding instance-attribute ¶
patch_embedding = Conv2d(
in_channels=num_channels,
out_channels=embed_dim,
kernel_size=patch_size,
stride=patch_size,
bias=False,
)
__init__ ¶
Source code in vllm/model_executor/models/clip.py
forward ¶
Source code in vllm/model_executor/models/clip.py
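A hedged sketch of how the strided Conv2d above turns pixel values into a sequence of patch embeddings, with a class token prepended and position embeddings added, following the standard CLIP layout; the class name and default sizes are illustrative.

# Hedged sketch: pixels -> patch embeddings -> [class token] + position embeddings.
import torch
import torch.nn as nn


class NaivePatchEmbeddings(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, image_size: int = 224, patch_size: int = 32,
                 num_channels: int = 3, embed_dim: int = 768):
        super().__init__()
        # Strided convolution: one output position per non-overlapping patch.
        self.patch_embedding = nn.Conv2d(num_channels, embed_dim,
                                         kernel_size=patch_size, stride=patch_size,
                                         bias=False)
        num_positions = (image_size // patch_size) ** 2 + 1
        self.class_embedding = nn.Parameter(torch.zeros(embed_dim))
        self.position_embedding = nn.Embedding(num_positions, embed_dim)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # pixel_values: Batch x Channels x Height x Width
        patches = self.patch_embedding(pixel_values)   # B x D x H/P x W/P
        patches = patches.flatten(2).transpose(1, 2)   # B x NumPatches x D
        cls = self.class_embedding.expand(patches.shape[0], 1, -1)
        embeddings = torch.cat([cls, patches], dim=1)
        positions = torch.arange(embeddings.shape[1], device=embeddings.device)
        return embeddings + self.position_embedding(positions)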
CLIPVisionModel ¶
Bases: Module, SupportsQuant
Source code in vllm/model_executor/models/clip.py
packed_modules_mapping class-attribute instance-attribute ¶
vision_model instance-attribute ¶
vision_model = CLIPVisionTransformer(
config=config,
quant_config=quant_config,
num_hidden_layers_override=num_hidden_layers_override,
require_post_norm=require_post_norm,
prefix=f"{prefix}.vision_model",
)
__init__ ¶
__init__(
config: CLIPVisionConfig,
quant_config: Optional[QuantizationConfig] = None,
*,
num_hidden_layers_override: Optional[int] = None,
require_post_norm: Optional[bool] = None,
prefix: str = "",
) -> None
Source code in vllm/model_executor/models/clip.py
forward ¶
load_weights ¶
Source code in vllm/model_executor/models/clip.py
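For the input/output contract (pixel values in, per-token hidden states out), the Hugging Face CLIPVisionModel that this minimal implementation mirrors can serve as a shape-level reference; a hedged usage sketch:

# Hedged shape-level sketch using the Hugging Face CLIPVisionModel.
import torch
from transformers import CLIPVisionConfig, CLIPVisionModel

config = CLIPVisionConfig()                      # defaults: image_size=224, patch_size=32
model = CLIPVisionModel(config)
pixel_values = torch.randn(1, 3, config.image_size, config.image_size)
with torch.no_grad():
    out = model(pixel_values=pixel_values)
# 1 class token + (224 // 32) ** 2 = 50 tokens, each of size hidden_size.
assert out.last_hidden_state.shape == (1, 50, config.hidden_size)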
CLIPVisionTransformer ¶
Bases: Module
Source code in vllm/model_executor/models/clip.py
encoder instance-attribute ¶
encoder = CLIPEncoder(
config=config,
quant_config=quant_config,
num_hidden_layers_override=num_hidden_layers_override,
prefix=f"{prefix}.encoder",
)
__init__ ¶
__init__(
config: CLIPVisionConfig,
quant_config: Optional[QuantizationConfig] = None,
*,
num_hidden_layers_override: Optional[int] = None,
require_post_norm: Optional[bool] = None,
prefix: str = "",
) -> None
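A hedged sketch of the overall composition: embeddings, a pre-LayerNorm, the encoder stack, and an optional post-LayerNorm (gated here by require_post_norm); module and attribute names in the sketch are illustrative.

# Hedged sketch of the vision transformer composition.
import torch
import torch.nn as nn


class NaiveCLIPVisionTransformer(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, embeddings: nn.Module, encoder: nn.Module,
                 embed_dim: int, use_post_norm: bool = True):
        super().__init__()
        self.embeddings = embeddings
        self.pre_layernorm = nn.LayerNorm(embed_dim)
        self.encoder = encoder
        self.post_layernorm = nn.LayerNorm(embed_dim) if use_post_norm else nn.Identity()

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        hidden_states = self.embeddings(pixel_values)
        hidden_states = self.pre_layernorm(hidden_states)
        hidden_states = self.encoder(hidden_states)
        return self.post_layernorm(hidden_states)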