vllm.model_executor.models
Modules:
Name | Description |
---|---|
adapters | |
aimv2 | |
arcee | |
arctic | Inference-only Snowflake Arctic model. |
aria | |
aya_vision | |
baichuan | Inference-only BaiChuan model compatible with HuggingFace weights. |
bailing_moe | Inference-only BailingMoE model compatible with HuggingFace weights. |
bamba | Inference-only Bamba model. |
bart | PyTorch BART model. |
bert | |
bert_with_rope | |
blip | Minimal implementation of BlipVisionModel intended to be only used within a vision language model. |
blip2 | |
bloom | Inference-only BLOOM model compatible with HuggingFace weights. |
chameleon | |
chatglm | Inference-only ChatGLM model compatible with THUDM weights. |
clip | Minimal implementation of CLIPVisionModel intended to be only used within a vision language model. |
cohere2_vision | Command-A-Vision (Cohere2Vision) multimodal model implementation for vLLM. |
commandr | PyTorch Cohere model. |
config | |
constant_size_cache | |
dbrx | |
deepseek | Inference-only Deepseek model. |
deepseek_eagle | |
deepseek_mtp | |
deepseek_v2 | Inference-only DeepseekV2/DeepseekV3 model. |
deepseek_vl2 | Inference-only Deepseek-VL2 model compatible with HuggingFace weights. |
donut | |
dots1 | Inference-only dots1 model. |
ernie45 | Inference-only Ernie model compatible with HuggingFace weights. |
ernie45_moe | Inference-only ErnieMoE model compatible with HuggingFace weights. |
ernie_mtp | Inference-only Ernie-MTP model. |
exaone | Inference-only Exaone model compatible with HuggingFace weights. |
exaone4 | Inference-only Exaone model compatible with HuggingFace weights. |
fairseq2_llama | Llama model for fairseq2 weights. |
falcon | PyTorch Falcon model. |
falcon_h1 | Inference-only FalconH1 model. |
florence2 | |
fuyu | PyTorch Fuyu model. |
gemma | Inference-only Gemma model compatible with HuggingFace weights. |
gemma2 | |
gemma3 | |
gemma3_mm | |
gemma3n | |
gemma3n_mm | |
glm | Inference-only HF format GLM-4 model compatible with THUDM weights. |
glm4 | Inference-only GLM-4-0414 model compatible with HuggingFace weights. |
glm4_1v | Inference-only GLM-4V model compatible with HuggingFace weights. |
glm4_moe | Inference-only GLM-4.5 model compatible with HuggingFace weights. |
glm4_moe_mtp | Inference-only GLM-4.5 MTP model compatible with HuggingFace weights. |
glm4v | Inference-only CogAgent model compatible with THUDM weights. |
gpt2 | Inference-only GPT-2 model compatible with HuggingFace weights. |
gpt_bigcode | Inference-only GPTBigCode model compatible with HuggingFace weights. |
gpt_j | Inference-only GPT-J model compatible with HuggingFace weights. |
gpt_neox | Inference-only GPT-NeoX model compatible with HuggingFace weights. |
gpt_oss | |
granite | Inference-only IBM Granite model compatible with HuggingFace weights. |
granite_speech | Inference-only IBM Granite speech model. |
granitemoe | Inference-only GraniteMoe model. |
granitemoehybrid | Inference-only GraniteMoeHybrid model. |
granitemoeshared | Inference-only GraniteMoeShared model. |
gritlm | |
grok1 | Inference-only Grok1 model. |
h2ovl | |
hunyuan_v1 | Inference-only HunYuan model compatible with HuggingFace weights. |
hyperclovax_vision | |
idefics2_vision_model | PyTorch Idefics2 model. |
idefics3 | Inference-only Idefics3 model compatible with HuggingFace weights. |
interfaces | |
interfaces_base | |
intern_vit | |
internlm2 | |
internlm2_ve | |
interns1 | |
interns1_vit | |
internvl | |
jais | Inference-only Jais model compatible with HuggingFace weights. |
jamba | Inference-only Jamba model. |
jina_vl | |
keye | |
kimi_vl | |
lfm2 | |
llama | Inference-only LLaMA model compatible with HuggingFace weights. |
llama4 | Inference-only LLaMA model compatible with HuggingFace weights. |
llama4_eagle | |
llama_eagle | |
llama_eagle3 | |
llava | |
llava_next | |
llava_next_video | |
llava_onevision | |
mamba | PyTorch MAMBA model. |
mamba2 | PyTorch MAMBA2 model. |
mamba_cache | |
medusa | |
mimo | Inference-only MiMo model compatible with HuggingFace weights. |
mimo_mtp | Inference-only MiMo-MTP model. |
minicpm | Inference-only MiniCPM model compatible with HuggingFace weights. |
minicpm3 | Inference-only MiniCPM3 model compatible with HuggingFace weights. |
minicpm_eagle | Inference-only EagleMiniCPM model compatible with HuggingFace weights. |
minicpmo | Inference-only MiniCPM-O model compatible with HuggingFace weights. |
minicpmv | Inference-only MiniCPM-V model compatible with HuggingFace weights. |
minimax_cache | |
minimax_text_01 | Inference-only MiniMaxText01 model. |
minimax_vl_01 | |
mistral3 | |
mixtral | Inference-only Mixtral model. |
mixtral_quant | Inference-only Mixtral model. |
mllama | PyTorch Mllama model. |
mllama4 | |
mlp_speculator | |
modernbert | |
module_mapping | |
molmo | |
moonvit | |
mpt | |
nemotron | Inference-only Nemotron model compatible with HuggingFace weights. |
nemotron_h | Inference-only NemotronH model. |
nemotron_nas | Inference-only Deci model compatible with HuggingFace weights. |
nemotron_vl | |
nvlm_d | |
olmo | Inference-only OLMo model compatible with HuggingFace weights. |
olmo2 | Inference-only OLMo2 model compatible with HuggingFace weights. |
olmoe | Inference-only OLMoE model compatible with HuggingFace weights. |
opt | Inference-only OPT model compatible with HuggingFace weights. |
orion | Inference-only Orion-14B model compatible with HuggingFace weights. |
ovis | PyTorch Ovis model. |
ovis2_5 | PyTorch Ovis model. |
paligemma | |
persimmon | Inference-only persimmon model compatible with HuggingFace weights. |
phi | Inference-only Phi-1.5 model compatible with HuggingFace weights. |
phi3 | Inference-only Phi-3 model; code inherits from llama.py. |
phi3v | |
phi4_multimodal | |
phi4flash | |
phi4mm | |
phi4mm_audio | |
phi4mm_utils | |
phimoe | Inference-only PhiMoE model. |
pixtral | |
plamo2 | Inference-only PLaMo2 model. |
prithvi_geospatial_mae | Inference-only IBM/NASA Prithvi Geospatial model. |
qwen | Inference-only QWen model compatible with HuggingFace weights. |
qwen2 | Inference-only Qwen2 model compatible with HuggingFace weights. |
qwen2_5_omni_thinker | Inference-only Qwen2.5-Omni model (thinker part). |
qwen2_5_vl | Inference-only Qwen2.5-VL model compatible with HuggingFace weights. |
qwen2_audio | Inference-only Qwen2-Audio model compatible with HuggingFace weights. |
qwen2_moe | Inference-only Qwen2MoE model compatible with HuggingFace weights. |
qwen2_rm | Inference-only Qwen2-RM model compatible with HuggingFace weights. |
qwen2_vl | Inference-only Qwen2-VL model compatible with HuggingFace weights. |
qwen3 | Inference-only Qwen3 model compatible with HuggingFace weights. |
qwen3_moe | Inference-only Qwen3MoE model compatible with HuggingFace weights. |
qwen_vl | Inference-only Qwen-VL model compatible with HuggingFace weights. |
registry | Whenever you add an architecture to this page, please also update tests/models/registry.py with example HuggingFace models for it. |
roberta | |
rvl | |
seed_oss | Inference-only SeedOss model compatible with HuggingFace weights. |
siglip | Implementation of SiglipVisionModel intended to be only used within a vision language model. |
siglip2navit | Implementation of SiglipVisionModel intended to be only used within a vision language model. |
skyworkr1v | |
smolvlm | |
solar | Inference-only Solar model compatible with HuggingFace weights. |
stablelm | Inference-only StableLM (https://github.com/Stability-AI/StableLM) model compatible with HuggingFace weights. |
starcoder2 | PyTorch Starcoder2 model. |
step3_text | Inference-only Step3 text model. |
step3_vl | |
swin | |
tarsier | |
telechat2 | |
teleflm | |
transformers | Wrapper around `transformers` models. |
ultravox | PyTorch Ultravox model. |
utils | |
vision | |
voxtral | |
whisper | |
zamba2 | PyTorch Zamba2 model implementation for vLLM. |
ModelRegistry module-attribute ¶
ModelRegistry = _ModelRegistry(
    {
        model_arch: _LazyRegisteredModel(
            module_name=f"vllm.model_executor.models.{mod_relname}",
            class_name=cls_name,
        )
        # _VLLM_MODELS maps architecture name -> (module name, class name).
        for model_arch, (mod_relname, cls_name) in _VLLM_MODELS.items()
    }
)
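Each entry is registered lazily, so a model's module is only imported when that architecture is actually requested. Out-of-tree models can hook into the same registry at runtime; a minimal sketch (the architecture name and module path below are placeholders):

```python
from vllm import ModelRegistry

# Register a custom architecture by "module.path:ClassName" so the class
# is imported lazily, only when this architecture is first requested.
ModelRegistry.register_model(
    "MyLlamaForCausalLM",                      # placeholder architecture name
    "my_package.my_llama:MyLlamaForCausalLM",  # placeholder import path
)

# Architectures currently known to the registry.
print(ModelRegistry.get_supported_archs())
```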
__all__ module-attribute ¶
__all__ = [
"ModelRegistry",
"VllmModelForPooling",
"is_pooling_model",
"VllmModelForTextGeneration",
"is_text_generation_model",
"HasInnerState",
"has_inner_state",
"SupportsLoRA",
"supports_lora",
"SupportsMultiModal",
"supports_multimodal",
"SupportsPP",
"supports_pp",
"SupportsTranscription",
"supports_transcription",
"SupportsV0Only",
"supports_v0_only",
]
HasInnerState ¶
Bases: Protocol
The interface required for all models that have inner state.
Source code in vllm/model_executor/models/interfaces.py
SupportsLoRA ¶
Bases: Protocol
The interface required for all models that support LoRA.
Source code in vllm/model_executor/models/interfaces.py
SupportsMultiModal ¶
Bases: Protocol
The interface required for all multi-modal models.
Source code in vllm/model_executor/models/interfaces.py
supports_multimodal class-attribute ¶
supports_multimodal: Literal[True] = True
A flag that indicates this model supports multi-modal inputs.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
get_input_embeddings ¶
get_input_embeddings(
input_ids: Tensor,
multimodal_embeddings: Optional[
MultiModalEmbeddings
] = None,
attn_metadata: Optional[AttentionMetadata] = None,
) -> Tensor
get_input_embeddings(
input_ids: Tensor,
multimodal_embeddings: Optional[
MultiModalEmbeddings
] = None,
) -> Tensor
get_input_embeddings(
input_ids: Tensor,
multimodal_embeddings: Optional[
MultiModalEmbeddings
] = None,
attn_metadata: Optional[AttentionMetadata] = None,
) -> Tensor
Returns the input embeddings merged from the text embeddings from input_ids and the multimodal embeddings generated from multimodal kwargs.
Source code in vllm/model_executor/models/interfaces.py
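As an illustration of the merge described above, here is a minimal, self-contained sketch (not the vLLM implementation): positions holding a hypothetical placeholder token id are overwritten with the multimodal embeddings, which must already be ordered to match the appearance of their items in the prompt (see the note under get_multimodal_embeddings below).

```python
import torch

def merge_embeddings_sketch(
    input_ids: torch.Tensor,    # [num_tokens]
    text_embeds: torch.Tensor,  # [num_tokens, hidden_size]
    mm_embeds: torch.Tensor,    # [num_mm_tokens, hidden_size]
    placeholder_id: int,        # hypothetical multimodal placeholder token id
) -> torch.Tensor:
    # Positions in the prompt reserved for multimodal content.
    mask = input_ids == placeholder_id
    assert int(mask.sum()) == mm_embeds.shape[0], (
        "number of placeholder tokens must match number of multimodal embeddings")
    merged = text_embeds.clone()
    # Scatter the multimodal embeddings into the reserved positions.
    merged[mask] = mm_embeds.to(dtype=merged.dtype)
    return merged
```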
get_language_model ¶
get_language_model() -> Module
Returns the underlying language model used for text generation.
This is typically the torch.nn.Module instance responsible for processing the merged multimodal embeddings and producing hidden states.
Returns:
Type | Description |
---|---|
Module | torch.nn.Module: The core language model component. |
Source code in vllm/model_executor/models/interfaces.py
get_multimodal_embeddings ¶
get_multimodal_embeddings(
**kwargs: object,
) -> MultiModalEmbeddings
Returns multimodal embeddings generated from multimodal kwargs to be merged with text embeddings.
Note
The returned multimodal embeddings must be in the same order as the appearances of their corresponding multimodal data item in the input prompt.
Source code in vllm/model_executor/models/interfaces.py
get_placeholder_str classmethod ¶
Get the placeholder text for the i-th modality item in the prompt.
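For example, a hypothetical image-text model might implement it as follows (the class name and the "<image>" tag are illustrative, not part of vLLM):

```python
from typing import Optional

from vllm.model_executor.models.interfaces import SupportsMultiModal

class MyVLModel(SupportsMultiModal):  # hypothetical model class
    @classmethod
    def get_placeholder_str(cls, modality: str, i: int) -> Optional[str]:
        # Hypothetical convention: every image item is rendered with the
        # same "<image>" tag, regardless of its index i in the prompt.
        if modality.startswith("image"):
            return "<image>"
        raise ValueError(f"Unsupported modality: {modality}")
```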
SupportsPP ¶
Bases: Protocol
The interface required for all models that support pipeline parallel.
Source code in vllm/model_executor/models/interfaces.py
supports_pp class-attribute ¶
supports_pp: Literal[True] = True
A flag that indicates this model supports pipeline parallel.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
forward ¶
forward(
*, intermediate_tensors: Optional[IntermediateTensors]
) -> Union[Tensor, IntermediateTensors]
Accept IntermediateTensors when PP rank > 0.
Return IntermediateTensors only for the last PP rank.
Source code in vllm/model_executor/models/interfaces.py
make_empty_intermediate_tensors ¶
make_empty_intermediate_tensors(
batch_size: int, dtype: dtype, device: device
) -> IntermediateTensors
Called when PP rank > 0 for profiling purposes.
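To make the forward contract above concrete, here is a hedged sketch of a pipeline-parallel model's forward pass; the embed_tokens/layers/norm attribute names are illustrative, not required by the interface:

```python
from typing import Optional, Union

import torch

from vllm.distributed import get_pp_group
from vllm.sequence import IntermediateTensors

def forward(  # method-style sketch; `self` is a hypothetical decoder model
    self,
    input_ids: torch.Tensor,
    positions: torch.Tensor,
    intermediate_tensors: Optional[IntermediateTensors] = None,
) -> Union[torch.Tensor, IntermediateTensors]:
    if get_pp_group().is_first_rank:
        # The first PP rank starts from the token embeddings.
        hidden_states = self.embed_tokens(input_ids)
    else:
        # Later ranks resume from the tensors sent by the previous rank.
        assert intermediate_tensors is not None
        hidden_states = intermediate_tensors["hidden_states"]

    # Run only this rank's slice of the decoder layers.
    for layer in self.layers:
        hidden_states = layer(positions, hidden_states)

    if not get_pp_group().is_last_rank:
        # Hand off to the next rank; do not return final hidden states yet.
        return IntermediateTensors({"hidden_states": hidden_states})

    # Only the last PP rank returns the final hidden states.
    return self.norm(hidden_states)
```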
SupportsTranscription ¶
Bases: Protocol
The interface required for all models that support transcription.
Source code in vllm/model_executor/models/interfaces.py
supports_transcription_only class-attribute ¶
supports_transcription_only: bool = False
Transcription models can opt out of text generation by setting this to True.
__init_subclass__ ¶
Source code in vllm/model_executor/models/interfaces.py
get_generation_prompt classmethod ¶
get_generation_prompt(
audio: ndarray,
stt_config: SpeechToTextConfig,
model_config: ModelConfig,
language: Optional[str],
task_type: str,
request_prompt: str,
) -> PromptType
Get the prompt for the ASR model. The model has control over the construction, as long as it returns a valid PromptType.
Source code in vllm/model_executor/models/interfaces.py
get_num_audio_tokens classmethod ¶
get_num_audio_tokens(
audio_duration_s: float,
stt_config: SpeechToTextConfig,
model_config: ModelConfig,
) -> Optional[int]
Map from audio duration to number of audio tokens produced by the ASR model, without running a forward pass. This is used for estimating the amount of processing for this audio.
Source code in vllm/model_executor/models/interfaces.py
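A hedged sketch of the kind of estimate this hook returns, assuming a hypothetical encoder that emits a fixed number of tokens per second of audio and accepts clips up to max_audio_clip_s seconds (both constants are illustrative):

```python
import math
from typing import Optional

# Hypothetical constant: tokens produced per second of audio by the encoder.
TOKENS_PER_SECOND = 25

def estimate_audio_tokens(
    audio_duration_s: float,
    max_audio_clip_s: float = 30.0,
) -> Optional[int]:
    if audio_duration_s <= 0:
        return None  # unknown / nothing to process
    # Clamp to the longest clip the model accepts, then scale by the
    # hypothetical per-second token rate.
    clipped = min(audio_duration_s, max_audio_clip_s)
    return math.ceil(clipped * TOKENS_PER_SECOND)
```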
get_other_languages classmethod ¶
get_speech_to_text_config classmethod ¶
get_speech_to_text_config(
model_config: ModelConfig,
task_type: Literal["transcribe", "translate"],
) -> SpeechToTextConfig
Get the speech to text config for the ASR model.
validate_language classmethod ¶
Ensure the language specified in the transcription request is a valid ISO 639-1 language code. If the request language is valid, but not natively supported by the model, trigger a warning (but not an exception).
Source code in vllm/model_executor/models/interfaces.py
SupportsV0Only ¶
Bases: Protocol
Models with this interface are not compatible with V1 vLLM.
Source code in vllm/model_executor/models/interfaces.py
VllmModelForPooling ¶
Bases: VllmModel[T_co], Protocol[T_co]
The interface required for all pooling models in vLLM.
Source code in vllm/model_executor/models/interfaces_base.py
VllmModelForTextGeneration ¶
has_inner_state ¶
has_inner_state(model: object) -> TypeIs[HasInnerState]
has_inner_state(
model: type[object],
) -> TypeIs[type[HasInnerState]]
has_inner_state(
model: Union[type[object], object],
) -> Union[
TypeIs[type[HasInnerState]], TypeIs[HasInnerState]
]
is_pooling_model ¶
is_pooling_model(
model: type[object],
) -> TypeIs[type[VllmModelForPooling]]
is_pooling_model(
model: object,
) -> TypeIs[VllmModelForPooling]
is_pooling_model(
model: Union[type[object], object],
) -> Union[
TypeIs[type[VllmModelForPooling]],
TypeIs[VllmModelForPooling],
]
Source code in vllm/model_executor/models/interfaces_base.py
is_text_generation_model ¶
is_text_generation_model(
model: type[object],
) -> TypeIs[type[VllmModelForTextGeneration]]
is_text_generation_model(
model: object,
) -> TypeIs[VllmModelForTextGeneration]
is_text_generation_model(
model: Union[type[object], object],
) -> Union[
TypeIs[type[VllmModelForTextGeneration]],
TypeIs[VllmModelForTextGeneration],
]
Source code in vllm/model_executor/models/interfaces_base.py
supports_lora ¶
supports_lora(
model: type[object],
) -> TypeIs[type[SupportsLoRA]]
supports_lora(model: object) -> TypeIs[SupportsLoRA]
supports_lora(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsLoRA]], TypeIs[SupportsLoRA]
]
Source code in vllm/model_executor/models/interfaces.py
supports_multimodal ¶
supports_multimodal(
model: type[object],
) -> TypeIs[type[SupportsMultiModal]]
supports_multimodal(
model: object,
) -> TypeIs[SupportsMultiModal]
supports_multimodal(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsMultiModal]],
TypeIs[SupportsMultiModal],
]
supports_pp ¶
supports_pp(
model: type[object],
) -> TypeIs[type[SupportsPP]]
supports_pp(model: object) -> TypeIs[SupportsPP]
supports_pp(
model: Union[type[object], object],
) -> Union[
bool, TypeIs[type[SupportsPP]], TypeIs[SupportsPP]
]
Source code in vllm/model_executor/models/interfaces.py
supports_transcription ¶
supports_transcription(
model: type[object],
) -> TypeIs[type[SupportsTranscription]]
supports_transcription(
model: object,
) -> TypeIs[SupportsTranscription]
supports_transcription(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsTranscription]],
TypeIs[SupportsTranscription],
]
supports_v0_only ¶
supports_v0_only(
model: type[object],
) -> TypeIs[type[SupportsV0Only]]
supports_v0_only(model: object) -> TypeIs[SupportsV0Only]
supports_v0_only(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsV0Only]], TypeIs[SupportsV0Only]
]