vllm.model_executor.models
Modules:
Name | Description |
---|---|
adapters | |
aimv2 | |
arcee | |
arctic | Inference-only Snowflake Arctic model. |
aria | |
aya_vision | |
baichuan | Inference-only BaiChuan model compatible with HuggingFace weights. |
bailing_moe | Inference-only BailingMoE model compatible with HuggingFace weights. |
bamba | Inference-only Bamba model. |
bart | PyTorch BART model. |
bert | |
bert_with_rope | |
blip | Minimal implementation of BlipVisionModel intended to be only used within a vision language model. |
blip2 | |
bloom | Inference-only BLOOM model compatible with HuggingFace weights. |
chameleon | |
chatglm | Inference-only ChatGLM model compatible with THUDM weights. |
clip | Minimal implementation of CLIPVisionModel intended to be only used within a vision language model. |
cohere2_vision | Command-A-Vision (Cohere2Vision) multimodal model implementation for vLLM. |
commandr | PyTorch Cohere model. |
config | |
constant_size_cache | |
dbrx | |
deepseek | Inference-only Deepseek model. |
deepseek_eagle | |
deepseek_mtp | |
deepseek_v2 | Inference-only DeepseekV2/DeepseekV3 model. |
deepseek_vl2 | Inference-only Deepseek-VL2 model compatible with HuggingFace weights. |
donut | |
dots1 | Inference-only dots1 model. |
ernie45 | Inference-only Ernie model compatible with HuggingFace weights. |
ernie45_moe | Inference-only ErnieMoE model compatible with HuggingFace weights. |
ernie_mtp | Inference-only Ernie-MTP model. |
exaone | Inference-only Exaone model compatible with HuggingFace weights. |
exaone4 | Inference-only Exaone model compatible with HuggingFace weights. |
fairseq2_llama | Llama model for fairseq2 weights. |
falcon | PyTorch Falcon model. |
falcon_h1 | Inference-only FalconH1 model. |
florence2 | |
fuyu | PyTorch Fuyu model. |
gemma | Inference-only Gemma model compatible with HuggingFace weights. |
gemma2 | |
gemma3 | |
gemma3_mm | |
gemma3n | |
gemma3n_mm | |
glm | Inference-only HF format GLM-4 model compatible with THUDM weights. |
glm4 | Inference-only GLM-4-0414 model compatible with HuggingFace weights. |
glm4_1v | Inference-only GLM-4V model compatible with HuggingFace weights. |
glm4_moe | Inference-only GLM-4.5 model compatible with HuggingFace weights. |
glm4_moe_mtp | Inference-only GLM-4.5 MTP model compatible with HuggingFace weights. |
glm4v | Inference-only CogAgent model compatible with THUDM weights. |
gpt2 | Inference-only GPT-2 model compatible with HuggingFace weights. |
gpt_bigcode | Inference-only GPTBigCode model compatible with HuggingFace weights. |
gpt_j | Inference-only GPT-J model compatible with HuggingFace weights. |
gpt_neox | Inference-only GPT-NeoX model compatible with HuggingFace weights. |
gpt_oss | |
granite | Inference-only IBM Granite model compatible with HuggingFace weights. |
granite_speech | Inference-only IBM Granite speech model. |
granitemoe | Inference-only GraniteMoe model. |
granitemoehybrid | Inference-only GraniteMoeHybrid model. |
granitemoeshared | Inference-only GraniteMoeShared model. |
gritlm | |
grok1 | Inference-only Grok1 model. |
h2ovl | |
hunyuan_v1 | Inference-only HunYuan model compatible with HuggingFace weights. |
hyperclovax_vision | |
idefics2_vision_model | PyTorch Idefics2 model. |
idefics3 | Inference-only Idefics3 model compatible with HuggingFace weights. |
interfaces | |
interfaces_base | |
intern_vit | |
internlm2 | |
internlm2_ve | |
interns1 | |
interns1_vit | |
internvl | |
jais | Inference-only Jais model compatible with HuggingFace weights. |
jamba | Inference-only Jamba model. |
jina_vl | |
keye | |
kimi_vl | |
lfm2 | |
llama | Inference-only LLaMA model compatible with HuggingFace weights. |
llama4 | Inference-only LLaMA model compatible with HuggingFace weights. |
llama4_eagle | |
llama_eagle | |
llama_eagle3 | |
llava | |
llava_next | |
llava_next_video | |
llava_onevision | |
mamba | PyTorch MAMBA model. |
mamba2 | PyTorch MAMBA2 model. |
mamba_cache | |
medusa | |
mimo | Inference-only MiMo model compatible with HuggingFace weights. |
mimo_mtp | Inference-only MiMo-MTP model. |
minicpm | Inference-only MiniCPM model compatible with HuggingFace weights. |
minicpm3 | Inference-only MiniCPM3 model compatible with HuggingFace weights. |
minicpm_eagle | Inference-only EagleMiniCPM model compatible with HuggingFace weights. |
minicpmo | Inference-only MiniCPM-O model compatible with HuggingFace weights. |
minicpmv | Inference-only MiniCPM-V model compatible with HuggingFace weights. |
minimax_cache | |
minimax_text_01 | Inference-only MiniMaxText01 model. |
minimax_vl_01 | |
mistral3 | |
mixtral | Inference-only Mixtral model. |
mixtral_quant | Inference-only Mixtral model. |
mllama | PyTorch Mllama model. |
mllama4 | |
mlp_speculator | |
modernbert | |
module_mapping | |
molmo | |
moonvit | |
mpt | |
nemotron | Inference-only Nemotron model compatible with HuggingFace weights. |
nemotron_h | Inference-only NemotronH model. |
nemotron_nas | Inference-only Deci model compatible with HuggingFace weights. |
nemotron_vl | |
nvlm_d | |
olmo | Inference-only OLMo model compatible with HuggingFace weights. |
olmo2 | Inference-only OLMo2 model compatible with HuggingFace weights. |
olmoe | Inference-only OLMoE model compatible with HuggingFace weights. |
opt | Inference-only OPT model compatible with HuggingFace weights. |
orion | Inference-only Orion-14B model compatible with HuggingFace weights. |
ovis | PyTorch Ovis model. |
ovis2_5 | PyTorch Ovis model. |
paligemma | |
persimmon | Inference-only persimmon model compatible with HuggingFace weights. |
phi | Inference-only Phi-1.5 model compatible with HuggingFace weights. |
phi3 | Inference-only Phi-3 model; code inherits from llama.py. |
phi3v | |
phi4_multimodal | |
phi4flash | |
phi4mm | |
phi4mm_audio | |
phi4mm_utils | |
phimoe | Inference-only PhiMoE model. |
pixtral | |
plamo2 | Inference-only PLaMo2 model. |
prithvi_geospatial_mae | Inference-only IBM/NASA Prithvi Geospatial model. |
qwen | Inference-only QWen model compatible with HuggingFace weights. |
qwen2 | Inference-only Qwen2 model compatible with HuggingFace weights. |
qwen2_5_omni_thinker | Inference-only Qwen2.5-Omni model (thinker part). |
qwen2_5_vl | Inference-only Qwen2.5-VL model compatible with HuggingFace weights. |
qwen2_audio | Inference-only Qwen2-Audio model compatible with HuggingFace weights. |
qwen2_moe | Inference-only Qwen2MoE model compatible with HuggingFace weights. |
qwen2_rm | Inference-only Qwen2-RM model compatible with HuggingFace weights. |
qwen2_vl | Inference-only Qwen2-VL model compatible with HuggingFace weights. |
qwen3 | Inference-only Qwen3 model compatible with HuggingFace weights. |
qwen3_moe | Inference-only Qwen3MoE model compatible with HuggingFace weights. |
qwen_vl | Inference-only Qwen-VL model compatible with HuggingFace weights. |
registry | Whenever you add an architecture to this page, please also update tests/models/registry.py with example HuggingFace models for it. |
roberta | |
rvl | |
seed_oss | Inference-only SeedOss model compatible with HuggingFace weights. |
siglip | Implementation of SiglipVisionModel intended to be only used within a vision language model. |
siglip2navit | Implementation of SiglipVisionModel intended to be only used within a vision language model. |
skyworkr1v | |
smolvlm | |
solar | Inference-only Solar model compatible with HuggingFace weights. |
stablelm | Inference-only StableLM (https://github.com/Stability-AI/StableLM) model compatible with HuggingFace weights. |
starcoder2 | PyTorch Starcoder2 model. |
step3_text | Inference-only Step3 text model. |
step3_vl | |
swin | |
tarsier | |
telechat2 | |
teleflm | |
transformers | Wrapper around `transformers` models. |
ultravox | PyTorch Ultravox model. |
utils | |
vision | |
voxtral | |
whisper | |
zamba2 | PyTorch Zamba2 model implementation for vLLM. |
ModelRegistry module-attribute ¶
ModelRegistry = _ModelRegistry(
    {
        model_arch: _LazyRegisteredModel(
            module_name=f"vllm.model_executor.models.{mod_relname}",
            class_name=cls_name,
        )
        # _VLLM_MODELS maps architecture name -> (module name, class name).
        for model_arch, (mod_relname, cls_name) in _VLLM_MODELS.items()
    }
)
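Each entry is registered lazily, so a model's module is only imported when that architecture is actually requested. Out-of-tree models can hook into the same registry at runtime; a minimal sketch (the architecture name and module path below are placeholders):

```python
from vllm import ModelRegistry

# Register a custom architecture by "module.path:ClassName" so the class
# is imported lazily, only when this architecture is first requested.
ModelRegistry.register_model(
    "MyLlamaForCausalLM",                      # placeholder architecture name
    "my_package.my_llama:MyLlamaForCausalLM",  # placeholder import path
)

# Architectures currently known to the registry.
print(ModelRegistry.get_supported_archs())
```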
__all__ module-attribute ¶
__all__ = [
"ModelRegistry",
"VllmModelForPooling",
"is_pooling_model",
"VllmModelForTextGeneration",
"is_text_generation_model",
"HasInnerState",
"has_inner_state",
"SupportsLoRA",
"supports_lora",
"SupportsMultiModal",
"supports_multimodal",
"SupportsPP",
"supports_pp",
"SupportsTranscription",
"supports_transcription",
"SupportsV0Only",
"supports_v0_only",
]
HasInnerState ¶
Bases: Protocol
The interface required for all models that have inner state.
Source code in vllm/model_executor/models/interfaces.py
SupportsLoRA ¶
Bases: Protocol
The interface required for all models that support LoRA.
Source code in vllm/model_executor/models/interfaces.py
SupportsMultiModal ¶
Bases: Protocol
The interface required for all multi-modal models.
Source code in vllm/model_executor/models/interfaces.py
supports_multimodal class-attribute ¶
supports_multimodal: Literal[True] = True
A flag that indicates this model supports multi-modal inputs.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
get_input_embeddings ¶
get_input_embeddings(
input_ids: Tensor,
multimodal_embeddings: Optional[
MultiModalEmbeddings
] = None,
attn_metadata: Optional[AttentionMetadata] = None,
) -> Tensor
get_input_embeddings(
input_ids: Tensor,
multimodal_embeddings: Optional[
MultiModalEmbeddings
] = None,
) -> Tensor
get_input_embeddings(
input_ids: Tensor,
multimodal_embeddings: Optional[
MultiModalEmbeddings
] = None,
attn_metadata: Optional[AttentionMetadata] = None,
) -> Tensor
Returns the input embeddings merged from the text embeddings from input_ids and the multimodal embeddings generated from multimodal kwargs.
Source code in vllm/model_executor/models/interfaces.py
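As an illustration of the merge described above, here is a minimal, self-contained sketch (not the vLLM implementation): positions holding a hypothetical placeholder token id are overwritten with the multimodal embeddings, which must already be ordered to match the appearance of their items in the prompt (see the note under get_multimodal_embeddings below).

```python
import torch

def merge_embeddings_sketch(
    input_ids: torch.Tensor,    # [num_tokens]
    text_embeds: torch.Tensor,  # [num_tokens, hidden_size]
    mm_embeds: torch.Tensor,    # [num_mm_tokens, hidden_size]
    placeholder_id: int,        # hypothetical multimodal placeholder token id
) -> torch.Tensor:
    # Positions in the prompt reserved for multimodal content.
    mask = input_ids == placeholder_id
    assert int(mask.sum()) == mm_embeds.shape[0], (
        "number of placeholder tokens must match number of multimodal embeddings")
    merged = text_embeds.clone()
    # Scatter the multimodal embeddings into the reserved positions.
    merged[mask] = mm_embeds.to(dtype=merged.dtype)
    return merged
```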
get_language_model ¶
get_language_model() -> Module
Returns the underlying language model used for text generation.
This is typically the torch.nn.Module instance responsible for processing the merged multimodal embeddings and producing hidden states.
Returns:
Type | Description |
---|---|
Module | torch.nn.Module: The core language model component. |
Source code in vllm/model_executor/models/interfaces.py
get_multimodal_embeddings ¶
get_multimodal_embeddings(
**kwargs: object,
) -> MultiModalEmbeddings
Returns multimodal embeddings generated from multimodal kwargs to be merged with text embeddings.
Note
The returned multimodal embeddings must be in the same order as the appearances of their corresponding multimodal data item in the input prompt.
Source code in vllm/model_executor/models/interfaces.py
get_placeholder_str classmethod ¶
Get the placeholder text for the i-th modality item in the prompt.
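For example, a hypothetical image-text model might implement it as follows (the class name and the "<image>" tag are illustrative, not part of vLLM):

```python
from typing import Optional

from vllm.model_executor.models.interfaces import SupportsMultiModal

class MyVLModel(SupportsMultiModal):  # hypothetical model class
    @classmethod
    def get_placeholder_str(cls, modality: str, i: int) -> Optional[str]:
        # Hypothetical convention: every image item is rendered with the
        # same "<image>" tag, regardless of its index i in the prompt.
        if modality.startswith("image"):
            return "<image>"
        raise ValueError(f"Unsupported modality: {modality}")
```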
SupportsPP ¶
Bases: Protocol
The interface required for all models that support pipeline parallel.
Source code in vllm/model_executor/models/interfaces.py
supports_pp class-attribute ¶
supports_pp: Literal[True] = True
A flag that indicates this model supports pipeline parallel.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
forward ¶
forward(
*, intermediate_tensors: Optional[IntermediateTensors]
) -> Union[Tensor, IntermediateTensors]
Accept IntermediateTensors when PP rank > 0.
Return IntermediateTensors only for the last PP rank.
Source code in vllm/model_executor/models/interfaces.py
make_empty_intermediate_tensors ¶
make_empty_intermediate_tensors(
batch_size: int, dtype: dtype, device: device
) -> IntermediateTensors
Called when PP rank > 0 for profiling purposes.
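To make the forward contract above concrete, here is a hedged sketch of a pipeline-parallel model's forward pass; the embed_tokens/layers/norm attribute names are illustrative, not required by the interface:

```python
from typing import Optional, Union

import torch

from vllm.distributed import get_pp_group
from vllm.sequence import IntermediateTensors

def forward(  # method-style sketch; `self` is a hypothetical decoder model
    self,
    input_ids: torch.Tensor,
    positions: torch.Tensor,
    intermediate_tensors: Optional[IntermediateTensors] = None,
) -> Union[torch.Tensor, IntermediateTensors]:
    if get_pp_group().is_first_rank:
        # The first PP rank starts from the token embeddings.
        hidden_states = self.embed_tokens(input_ids)
    else:
        # Later ranks resume from the tensors sent by the previous rank.
        assert intermediate_tensors is not None
        hidden_states = intermediate_tensors["hidden_states"]

    # Run only this rank's slice of the decoder layers.
    for layer in self.layers:
        hidden_states = layer(positions, hidden_states)

    if not get_pp_group().is_last_rank:
        # Hand off to the next rank; do not return final hidden states yet.
        return IntermediateTensors({"hidden_states": hidden_states})

    # Only the last PP rank returns the final hidden states.
    return self.norm(hidden_states)
```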
SupportsTranscription ¶
Bases: Protocol
The interface required for all models that support transcription.
Source code in vllm/model_executor/models/interfaces.py
supports_transcription_only class-attribute ¶
supports_transcription_only: bool = False
Transcription models can opt out of text generation by setting this to True.
__init_subclass__ ¶
Source code in vllm/model_executor/models/interfaces.py
get_generation_prompt classmethod ¶
get_generation_prompt(
audio: ndarray,
stt_config: SpeechToTextConfig,
model_config: ModelConfig,
language: Optional[str],
task_type: str,
request_prompt: str,
) -> PromptType
Get the prompt for the ASR model. The model has control over the construction, as long as it returns a valid PromptType.
Source code in vllm/model_executor/models/interfaces.py
get_num_audio_tokens classmethod ¶
get_num_audio_tokens(
audio_duration_s: float,
stt_config: SpeechToTextConfig,
model_config: ModelConfig,
) -> Optional[int]
Map from audio duration to number of audio tokens produced by the ASR model, without running a forward pass. This is used for estimating the amount of processing for this audio.
Source code in vllm/model_executor/models/interfaces.py
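A hedged sketch of the kind of estimate this hook returns, assuming a hypothetical encoder that emits a fixed number of tokens per second of audio and accepts clips up to max_audio_clip_s seconds (both constants are illustrative):

```python
import math
from typing import Optional

# Hypothetical constant: tokens produced per second of audio by the encoder.
TOKENS_PER_SECOND = 25

def estimate_audio_tokens(
    audio_duration_s: float,
    max_audio_clip_s: float = 30.0,
) -> Optional[int]:
    if audio_duration_s <= 0:
        return None  # unknown / nothing to process
    # Clamp to the longest clip the model accepts, then scale by the
    # hypothetical per-second token rate.
    clipped = min(audio_duration_s, max_audio_clip_s)
    return math.ceil(clipped * TOKENS_PER_SECOND)
```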
get_other_languages classmethod ¶
get_speech_to_text_config classmethod ¶
get_speech_to_text_config(
model_config: ModelConfig,
task_type: Literal["transcribe", "translate"],
) -> SpeechToTextConfig
Get the speech to text config for the ASR model.
validate_language classmethod ¶
Ensure the language specified in the transcription request is a valid ISO 639-1 language code. If the request language is valid, but not natively supported by the model, trigger a warning (but not an exception).
Source code in vllm/model_executor/models/interfaces.py
SupportsV0Only ¶
Bases: Protocol
Models with this interface are not compatible with V1 vLLM.
Source code in vllm/model_executor/models/interfaces.py
VllmModelForPooling ¶
Bases: VllmModel[T_co], Protocol[T_co]
The interface required for all pooling models in vLLM.
Source code in vllm/model_executor/models/interfaces_base.py
VllmModelForTextGeneration ¶
has_inner_state ¶
has_inner_state(model: object) -> TypeIs[HasInnerState]
has_inner_state(
model: type[object],
) -> TypeIs[type[HasInnerState]]
has_inner_state(
model: Union[type[object], object],
) -> Union[
TypeIs[type[HasInnerState]], TypeIs[HasInnerState]
]
is_pooling_model ¶
is_pooling_model(
model: type[object],
) -> TypeIs[type[VllmModelForPooling]]
is_pooling_model(
model: object,
) -> TypeIs[VllmModelForPooling]
is_pooling_model(
model: Union[type[object], object],
) -> Union[
TypeIs[type[VllmModelForPooling]],
TypeIs[VllmModelForPooling],
]
Source code in vllm/model_executor/models/interfaces_base.py
is_text_generation_model ¶
is_text_generation_model(
model: type[object],
) -> TypeIs[type[VllmModelForTextGeneration]]
is_text_generation_model(
model: object,
) -> TypeIs[VllmModelForTextGeneration]
is_text_generation_model(
model: Union[type[object], object],
) -> Union[
TypeIs[type[VllmModelForTextGeneration]],
TypeIs[VllmModelForTextGeneration],
]
Source code in vllm/model_executor/models/interfaces_base.py
supports_lora ¶
supports_lora(
model: type[object],
) -> TypeIs[type[SupportsLoRA]]
supports_lora(model: object) -> TypeIs[SupportsLoRA]
supports_lora(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsLoRA]], TypeIs[SupportsLoRA]
]
Source code in vllm/model_executor/models/interfaces.py
supports_multimodal ¶
supports_multimodal(
model: type[object],
) -> TypeIs[type[SupportsMultiModal]]
supports_multimodal(
model: object,
) -> TypeIs[SupportsMultiModal]
supports_multimodal(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsMultiModal]],
TypeIs[SupportsMultiModal],
]
supports_pp ¶
supports_pp(
model: type[object],
) -> TypeIs[type[SupportsPP]]
supports_pp(model: object) -> TypeIs[SupportsPP]
supports_pp(
model: Union[type[object], object],
) -> Union[
bool, TypeIs[type[SupportsPP]], TypeIs[SupportsPP]
]
Source code in vllm/model_executor/models/interfaces.py
supports_transcription ¶
supports_transcription(
model: type[object],
) -> TypeIs[type[SupportsTranscription]]
supports_transcription(
model: object,
) -> TypeIs[SupportsTranscription]
supports_transcription(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsTranscription]],
TypeIs[SupportsTranscription],
]
supports_v0_only ¶
supports_v0_only(
model: type[object],
) -> TypeIs[type[SupportsV0Only]]
supports_v0_only(model: object) -> TypeIs[SupportsV0Only]
supports_v0_only(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsV0Only]], TypeIs[SupportsV0Only]
]