vllm.transformers_utils.configs.ovis
AIMv2Config ¶
Bases: PretrainedConfig
This is the configuration class to store the configuration of an [AIMv2Model]. Instantiating a configuration with the defaults will yield a configuration similar to that of apple/aimv2-large-patch14-224.

Args:
    hidden_size: Dimension of the hidden representations.
    intermediate_size: Dimension of the SwiGLU representations.
    num_hidden_layers: Number of hidden layers in the Transformer.
    num_attention_heads: Number of attention heads for each attention layer in the Transformer.
    num_channels: Number of input channels.
    image_size: Image size.
    patch_size: Patch size.
    rms_norm_eps: Epsilon value used for the RMS normalization layer.
    attention_dropout: Dropout ratio for attention probabilities.
    projection_dropout: Dropout ratio for the projection layer after the attention.
    qkv_bias: Whether to add a bias to the queries, keys and values.
    use_bias: Whether to add a bias in the feed-forward and projection layers.
    kwargs: Keyword arguments for the [PretrainedConfig].
__init__ ¶
__init__(
hidden_size: int = 1024,
intermediate_size: int = 2816,
num_hidden_layers: int = 24,
num_attention_heads: int = 8,
num_channels: int = 3,
image_size: int = 224,
patch_size: int = 14,
rms_norm_eps: float = 1e-05,
attention_dropout: float = 0.0,
projection_dropout: float = 0.0,
qkv_bias: bool = False,
use_bias: bool = False,
**kwargs: Any,
)
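A minimal sketch of instantiating this config: the defaults reproduce the values in the signature above, and the override values in the second call are illustrative only, not taken from any released checkpoint.

from vllm.transformers_utils.configs.ovis import AIMv2Config

# Defaults match the signature above.
config = AIMv2Config()
assert config.hidden_size == 1024
assert config.patch_size == 14

# Like any PretrainedConfig subclass, individual fields can be
# overridden per keyword; these smaller values are illustrative.
small = AIMv2Config(hidden_size=512, num_hidden_layers=12, num_attention_heads=4)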
Aimv2VisualTokenizerConfig ¶
Bases: BaseVisualTokenizerConfig
__init__ ¶
BaseVisualTokenizerConfig ¶
Bases: PretrainedConfig
__init__ ¶
__init__(
vocab_size=16384,
tokenize_function="softmax",
tau=1.0,
depths=None,
drop_cls_token=False,
backbone_config: Optional[
Union[PretrainedConfig, dict]
] = None,
hidden_stride: int = 1,
**kwargs,
)
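The concrete subclasses on this page (Aimv2VisualTokenizerConfig, SiglipVisualTokenizerConfig) accept these keyword arguments through this initializer. A minimal sketch, assuming the subclass forwards its kwargs to the base class; all values are illustrative rather than taken from a released Ovis checkpoint.

from vllm.transformers_utils.configs.ovis import (
    AIMv2Config,
    Aimv2VisualTokenizerConfig,
)

# Per the annotation above, backbone_config may be a PretrainedConfig
# instance or a dict; an instance is used here. Values are illustrative.
backbone = AIMv2Config()

vt_config = Aimv2VisualTokenizerConfig(
    vocab_size=65536,             # size of the visual vocabulary (illustrative)
    tokenize_function="softmax",
    tau=1.0,
    backbone_config=backbone,
    hidden_stride=2,              # illustrative; the default is 1
)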
OvisConfig ¶
Bases: PretrainedConfig
conversation_formatter_class instance-attribute ¶
__init__ ¶
__init__(
llm_config: Optional[
Union[PretrainedConfig, dict]
] = None,
visual_tokenizer_config: Optional[
Union[PretrainedConfig, dict]
] = None,
multimodal_max_length=8192,
hidden_size=None,
conversation_formatter_class=None,
llm_attn_implementation=None,
disable_tie_weight=False,
**kwargs,
)
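Both llm_config and visual_tokenizer_config accept either a PretrainedConfig instance or a plain dict. A minimal sketch composing the top-level config; the Qwen2 LLM and the tokenizer settings below are assumptions for illustration, not values from a released Ovis model.

from transformers import AutoConfig

from vllm.transformers_utils.configs.ovis import (
    Aimv2VisualTokenizerConfig,
    OvisConfig,
)

# Any config type registered with AutoConfig works for the LLM;
# "qwen2" is an illustrative choice.
llm_config = AutoConfig.for_model("qwen2")
visual_tokenizer_config = Aimv2VisualTokenizerConfig(vocab_size=65536)

ovis_config = OvisConfig(
    llm_config=llm_config,
    visual_tokenizer_config=visual_tokenizer_config,
    multimodal_max_length=8192,  # the documented default
)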
SiglipVisualTokenizerConfig ¶
Bases: BaseVisualTokenizerConfig
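Usage mirrors Aimv2VisualTokenizerConfig, with a SigLIP vision backbone instead. A minimal sketch, assuming keyword arguments are forwarded to BaseVisualTokenizerConfig and using transformers' default SiglipVisionConfig rather than the backbone of a specific Ovis release.

from transformers import SiglipVisionConfig

from vllm.transformers_utils.configs.ovis import SiglipVisualTokenizerConfig

# Illustrative values only; the backbone defaults come from transformers.
vt_config = SiglipVisualTokenizerConfig(
    vocab_size=65536,
    backbone_config=SiglipVisionConfig(),
)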