
vllm.model_executor.models

Modules:

Name Description
adapters
aimv2
arcee
arctic

Inference-only Snowflake Arctic model.

aria
aya_vision
baichuan

Inference-only BaiChuan model compatible with HuggingFace weights.

bailing_moe

Inference-only BailingMoE model compatible with HuggingFace weights.

bamba

Inference-only Bamba model.

bart

PyTorch BART model.

bert
bert_with_rope
blip

Minimal implementation of BlipVisionModel, intended to be used only within a vision language model.

blip2
bloom

Inference-only BLOOM model compatible with HuggingFace weights.

chameleon
chatglm

Inference-only ChatGLM model compatible with THUDM weights.

clip

Minimal implementation of CLIPVisionModel, intended to be used only within a vision language model.

cohere2_vision

Command-A-Vision (Cohere2Vision) multimodal model implementation for vLLM.

commandr

PyTorch Cohere model.

config
constant_size_cache
dbrx
deepseek

Inference-only Deepseek model.

deepseek_eagle
deepseek_mtp
deepseek_v2

Inference-only DeepseekV2/DeepseekV3 model.

deepseek_vl2

Inference-only Deepseek-VL2 model compatible with HuggingFace weights.

donut
dots1

Inference-only dots1 model.

ernie45

Inference-only Ernie model compatible with HuggingFace weights.

ernie45_moe

Inference-only ErnieMoE model compatible with HuggingFace weights.

ernie_mtp

Inference-only Ernie-MTP model.

exaone

Inference-only Exaone model compatible with HuggingFace weights.

exaone4

Inference-only Exaone model compatible with HuggingFace weights.

fairseq2_llama

Llama model for fairseq2 weights.

falcon

PyTorch Falcon model.

falcon_h1

Inference-only FalconH1 model.

florence2
fuyu

PyTorch Fuyu model.

gemma

Inference-only Gemma model compatible with HuggingFace weights.

gemma2
gemma3
gemma3_mm
gemma3n
gemma3n_mm
glm

Inference-only HF format GLM-4 model compatible with THUDM weights.

glm4

Inference-only GLM-4-0414 model compatible with HuggingFace weights.

glm4_1v

Inference-only GLM-4V model compatible with HuggingFace weights.

glm4_moe

Inference-only GLM-4.5 model compatible with HuggingFace weights.

glm4_moe_mtp

Inference-only GLM-4.5 MTP model compatible with HuggingFace weights.

glm4v

Inference-only CogAgent model compatible with THUDM weights.

gpt2

Inference-only GPT-2 model compatible with HuggingFace weights.

gpt_bigcode

Inference-only GPTBigCode model compatible with HuggingFace weights.

gpt_j

Inference-only GPT-J model compatible with HuggingFace weights.

gpt_neox

Inference-only GPT-NeoX model compatible with HuggingFace weights.

gpt_oss
granite

Inference-only IBM Granite model compatible with HuggingFace weights.

granite_speech

Inference-only IBM Granite speech model.

granitemoe

Inference-only GraniteMoe model.

granitemoehybrid

Inference-only GraniteMoeHybrid model.

granitemoeshared

Inference-only GraniteMoeShared model.

gritlm
grok1

Inference-only Grok1 model.

h2ovl
hunyuan_v1

Inference-only HunYuan model compatible with HuggingFace weights.

hyperclovax_vision
idefics2_vision_model

PyTorch Idefics2 model.

idefics3

Inference-only Idefics3 model compatible with HuggingFace weights.

interfaces
interfaces_base
intern_vit
internlm2
internlm2_ve
interns1
interns1_vit
internvl
jais

Inference-only Jais model compatible with HuggingFace weights.

jamba

Inference-only Jamba model.

jina_vl
keye
kimi_vl
lfm2
llama

Inference-only LLaMA model compatible with HuggingFace weights.

llama4

Inference-only LLaMA model compatible with HuggingFace weights.

llama4_eagle
llama_eagle
llama_eagle3
llava
llava_next
llava_next_video
llava_onevision
mamba

PyTorch MAMBA model.

mamba2

PyTorch MAMBA2 model.

mamba_cache
medusa
mimo

Inference-only MiMo model compatible with HuggingFace weights.

mimo_mtp

Inference-only MiMo-MTP model.

minicpm

Inference-only MiniCPM model compatible with HuggingFace weights.

minicpm3

Inference-only MiniCPM3 model compatible with HuggingFace weights.

minicpm_eagle

Inference-only EagleMiniCPM model compatible with HuggingFace weights.

minicpmo

Inference-only MiniCPM-O model compatible with HuggingFace weights.

minicpmv

Inference-only MiniCPM-V model compatible with HuggingFace weights.

minimax_cache
minimax_text_01

Inference-only MiniMaxText01 model.

minimax_vl_01
mistral3
mixtral

Inference-only Mixtral model.

mixtral_quant

Inference-only Mixtral model.

mllama

PyTorch Mllama model.

mllama4
mlp_speculator
modernbert
module_mapping
molmo
moonvit
mpt
nemotron

Inference-only Nemotron model compatible with HuggingFace weights.

nemotron_h

Inference-only NemotronH model.

nemotron_nas

Inference-only Deci model compatible with HuggingFace weights.

nemotron_vl
nvlm_d
olmo

Inference-only OLMo model compatible with HuggingFace weights.

olmo2

Inference-only OLMo2 model compatible with HuggingFace weights.

olmoe

Inference-only OLMoE model compatible with HuggingFace weights.

opt

Inference-only OPT model compatible with HuggingFace weights.

orion

Inference-only Orion-14B model compatible with HuggingFace weights.

ovis

PyTorch Ovis model.

ovis2_5

PyTorch Ovis model.

paligemma
persimmon

Inference-only Persimmon model compatible with HuggingFace weights.

phi

Inference-only Phi-1.5 model compatible with HuggingFace weights.

phi3

Inference-only Phi-3 model; the code inherits from llama.py.

phi3v
phi4_multimodal
phi4flash
phi4mm
phi4mm_audio
phi4mm_utils
phimoe

Inference-only PhiMoE model.

pixtral
plamo2

Inference-only PLaMo2 model.

prithvi_geospatial_mae

Inference-only IBM/NASA Prithvi Geospatial model.

qwen

Inference-only QWen model compatible with HuggingFace weights.

qwen2

Inference-only Qwen2 model compatible with HuggingFace weights.

qwen2_5_omni_thinker

Inference-only Qwen2.5-Omni model (thinker part).

qwen2_5_vl

Inference-only Qwen2.5-VL model compatible with HuggingFace weights.

qwen2_audio

Inference-only Qwen2-Audio model compatible with HuggingFace weights.

qwen2_moe

Inference-only Qwen2MoE model compatible with HuggingFace weights.

qwen2_rm

Inference-only Qwen2-RM model compatible with HuggingFace weights.

qwen2_vl

Inference-only Qwen2-VL model compatible with HuggingFace weights.

qwen3

Inference-only Qwen3 model compatible with HuggingFace weights.

qwen3_moe

Inference-only Qwen3MoE model compatible with HuggingFace weights.

qwen_vl

Inference-only Qwen-VL model compatible with HuggingFace weights.

registry

Whenever you add an architecture to this page, please also update

roberta
rvl
seed_oss

Inference-only SeedOss model compatible with HuggingFace weights.

siglip

Implementation of SiglipVisionModel, intended to be used only within a vision language model.

siglip2navit

Implementation of SiglipVisionModel, intended to be used only within a vision language model.

skyworkr1v
smolvlm
solar

Inference-only Solar model compatible with HuggingFace weights.

stablelm

Inference-only StableLM (https://github.com/Stability-AI/StableLM)

starcoder2

PyTorch Starcoder2 model.

step3_text

Inference-only Step3 text model.

step3_vl
swin
tarsier
telechat2
teleflm
transformers

Wrapper around transformers models

ultravox

PyTorch Ultravox model.

utils
vision
voxtral
whisper
zamba2

PyTorch Zamba2 model implementation for vLLM.

ModelRegistry module-attribute

ModelRegistry = _ModelRegistry(
    {
        model_arch: _LazyRegisteredModel(
            module_name=f"vllm.model_executor.models.{mod_relname}",
            class_name=cls_name,
        )
        for model_arch, (mod_relname, cls_name) in _VLLM_MODELS.items()
    }
)
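
Each entry stores only a module path and class name, so model code is imported lazily the first time an architecture is requested. A minimal usage sketch (the custom architecture name and plugin module below are hypothetical, not part of vLLM):

from vllm import ModelRegistry

# Architecture names vLLM can currently resolve.
print(sorted(ModelRegistry.get_supported_archs())[:5])

# Register an out-of-tree model by "module:ClassName" path; the module is
# only imported when this architecture is actually requested.
ModelRegistry.register_model(
    "MyCustomForCausalLM",                      # architecture string from the HF config
    "my_plugin.modeling:MyCustomForCausalLM",   # hypothetical plugin module
)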

__all__ module-attribute

__all__ = [
    "ModelRegistry",
    "VllmModelForPooling",
    "is_pooling_model",
    "VllmModelForTextGeneration",
    "is_text_generation_model",
    "HasInnerState",
    "has_inner_state",
    "SupportsLoRA",
    "supports_lora",
    "SupportsMultiModal",
    "supports_multimodal",
    "SupportsPP",
    "supports_pp",
    "SupportsTranscription",
    "supports_transcription",
    "SupportsV0Only",
    "supports_v0_only",
]

HasInnerState

Bases: Protocol

The interface required for all models that have inner state.

Source code in vllm/model_executor/models/interfaces.py
@runtime_checkable
class HasInnerState(Protocol):
    """The interface required for all models that has inner state."""

    has_inner_state: ClassVar[Literal[True]] = True
    """
        A flag that indicates this model has inner state.
        Models that has inner state usually need access to the scheduler_config
        for max_num_seqs, etc. True for e.g. both Mamba and Jamba.
    """

has_inner_state class-attribute

has_inner_state: Literal[True] = True

A flag that indicates this model has inner state. Models that have inner state usually need access to the scheduler_config for max_num_seqs, etc. True for, e.g., both Mamba and Jamba.
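
A model opts in simply by listing the protocol among its base classes; the flag is inherited and later detected by the has_inner_state() helper documented below. A minimal sketch (the model class is hypothetical):

import torch.nn as nn

from vllm.model_executor.models.interfaces import HasInnerState, has_inner_state


class MyStatefulModel(nn.Module, HasInnerState):  # hypothetical model
    """Toy model that carries recurrent state between steps."""


assert has_inner_state(MyStatefulModel)  # True: the inherited flag is detected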

SupportsLoRA

Bases: Protocol

The interface required for all models that support LoRA.

Source code in vllm/model_executor/models/interfaces.py
@runtime_checkable
class SupportsLoRA(Protocol):
    """The interface required for all models that support LoRA."""

    supports_lora: ClassVar[Literal[True]] = True
    """
    A flag that indicates this model supports LoRA.

    Note:
        There is no need to redefine this flag if this class is in the
        MRO of your model class.
    """
    # The `embedding_module` and `embedding_padding_modules`
    # are empty by default.
    embedding_modules: ClassVar[dict[str, str]] = {}
    embedding_padding_modules: ClassVar[list[str]] = []
    packed_modules_mapping: ClassVar[dict[str, list[str]]] = {}

embedding_modules class-attribute

embedding_modules: dict[str, str] = {}

embedding_padding_modules class-attribute

embedding_padding_modules: list[str] = []

packed_modules_mapping class-attribute

packed_modules_mapping: dict[str, list[str]] = {}

supports_lora class-attribute

supports_lora: Literal[True] = True

A flag that indicates this model supports LoRA.

Note

There is no need to redefine this flag if this class is in the MRO of your model class.
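
A hedged sketch of how a model typically opts in: subclass the protocol and describe how its fused projections map back to the sub-modules that LoRA checkpoints target (the class and module names below are illustrative, not taken from a specific vLLM model):

import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsLoRA, supports_lora


class MyModelForCausalLM(nn.Module, SupportsLoRA):  # hypothetical model
    # Fused vLLM projection -> HF sub-modules that LoRA adapters target.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }


assert supports_lora(MyModelForCausalLM)  # flag and LoRA attributes are all present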

SupportsMultiModal

Bases: Protocol

The interface required for all multi-modal models.

Source code in vllm/model_executor/models/interfaces.py
@runtime_checkable
class SupportsMultiModal(Protocol):
    """The interface required for all multi-modal models."""

    supports_multimodal: ClassVar[Literal[True]] = True
    """
    A flag that indicates this model supports multi-modal inputs.

    Note:
        There is no need to redefine this flag if this class is in the
        MRO of your model class.
    """

    @classmethod
    def get_placeholder_str(cls, modality: str, i: int) -> Optional[str]:
        """
        Get the placeholder text for the `i`th `modality` item in the prompt.
        """
        ...

    def get_multimodal_embeddings(self,
                                  **kwargs: object) -> MultiModalEmbeddings:
        """
        Returns multimodal embeddings generated from multimodal kwargs 
        to be merged with text embeddings.

        Note:
            The returned multimodal embeddings must be in the same order as
            the appearances of their corresponding multimodal data item in the
            input prompt.
        """
        ...

    def get_language_model(self) -> torch.nn.Module:
        """
        Returns the underlying language model used for text generation.

        This is typically the `torch.nn.Module` instance responsible for 
        processing the merged multimodal embeddings and producing hidden states

        Returns:
            torch.nn.Module: The core language model component.
        """
        ...

    # Only for models that support v0 chunked prefill
    # TODO(ywang96): Remove this overload once v0 is deprecated
    @overload
    def get_input_embeddings(
        self,
        input_ids: Tensor,
        multimodal_embeddings: Optional[MultiModalEmbeddings] = None,
        attn_metadata: Optional["AttentionMetadata"] = None,
    ) -> Tensor:
        ...

    # TODO: Remove this overload once v0 is deprecated
    @overload
    def get_input_embeddings(
        self,
        input_ids: Tensor,
        multimodal_embeddings: Optional[MultiModalEmbeddings] = None,
    ) -> Tensor:
        ...

    def get_input_embeddings(
        self,
        input_ids: Tensor,
        multimodal_embeddings: Optional[MultiModalEmbeddings] = None,
        # Only necessary so that the v0 overload is valid
        # TODO: Remove attn_metadata once v0 is deprecated
        attn_metadata: Optional["AttentionMetadata"] = None,
    ) -> Tensor:
        """
        Returns the input embeddings merged from the text embeddings from 
        input_ids and the multimodal embeddings generated from multimodal 
        kwargs.
        """
        ...

supports_multimodal class-attribute

supports_multimodal: Literal[True] = True

A flag that indicates this model supports multi-modal inputs.

Note

There is no need to redefine this flag if this class is in the MRO of your model class.

get_input_embeddings

get_input_embeddings(
    input_ids: Tensor,
    multimodal_embeddings: Optional[
        MultiModalEmbeddings
    ] = None,
    attn_metadata: Optional[AttentionMetadata] = None,
) -> Tensor
get_input_embeddings(
    input_ids: Tensor,
    multimodal_embeddings: Optional[
        MultiModalEmbeddings
    ] = None,
) -> Tensor
get_input_embeddings(
    input_ids: Tensor,
    multimodal_embeddings: Optional[
        MultiModalEmbeddings
    ] = None,
    attn_metadata: Optional[AttentionMetadata] = None,
) -> Tensor

Returns the input embeddings merged from the text embeddings from input_ids and the multimodal embeddings generated from multimodal kwargs.

Source code in vllm/model_executor/models/interfaces.py
def get_input_embeddings(
    self,
    input_ids: Tensor,
    multimodal_embeddings: Optional[MultiModalEmbeddings] = None,
    # Only necessary so that the v0 overload is valid
    # TODO: Remove attn_metadata once v0 is deprecated
    attn_metadata: Optional["AttentionMetadata"] = None,
) -> Tensor:
    """
    Returns the input embeddings merged from the text embeddings from 
    input_ids and the multimodal embeddings generated from multimodal 
    kwargs.
    """
    ...

get_language_model

get_language_model() -> Module

Returns the underlying language model used for text generation.

This is typically the torch.nn.Module instance responsible for processing the merged multimodal embeddings and producing hidden states.

Returns:

Type Description
Module

torch.nn.Module: The core language model component.

Source code in vllm/model_executor/models/interfaces.py
def get_language_model(self) -> torch.nn.Module:
    """
    Returns the underlying language model used for text generation.

    This is typically the `torch.nn.Module` instance responsible for 
    processing the merged multimodal embeddings and producing hidden states

    Returns:
        torch.nn.Module: The core language model component.
    """
    ...

get_multimodal_embeddings

get_multimodal_embeddings(
    **kwargs: object,
) -> MultiModalEmbeddings

Returns multimodal embeddings generated from multimodal kwargs to be merged with text embeddings.

Note

The returned multimodal embeddings must be in the same order as the appearances of their corresponding multimodal data item in the input prompt.

Source code in vllm/model_executor/models/interfaces.py
def get_multimodal_embeddings(self,
                              **kwargs: object) -> MultiModalEmbeddings:
    """
    Returns multimodal embeddings generated from multimodal kwargs 
    to be merged with text embeddings.

    Note:
        The returned multimodal embeddings must be in the same order as
        the appearances of their corresponding multimodal data item in the
        input prompt.
    """
    ...

get_placeholder_str classmethod

get_placeholder_str(modality: str, i: int) -> Optional[str]

Get the placeholder text for the ith modality item in the prompt.

Source code in vllm/model_executor/models/interfaces.py
@classmethod
def get_placeholder_str(cls, modality: str, i: int) -> Optional[str]:
    """
    Get the placeholder text for the `i`th `modality` item in the prompt.
    """
    ...
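
A hedged skeleton of what an implementation of this interface usually looks like; the placeholder token, attribute names, and elided tensor plumbing are illustrative assumptions rather than the API of any particular vLLM model:

from typing import Optional

import torch.nn as nn
from torch import Tensor

from vllm.model_executor.models.interfaces import SupportsMultiModal


class MyVLModel(nn.Module, SupportsMultiModal):  # hypothetical model

    @classmethod
    def get_placeholder_str(cls, modality: str, i: int) -> Optional[str]:
        # One placeholder per image slot in the prompt text.
        return "<image>" if modality == "image" else None

    def get_multimodal_embeddings(self, **kwargs: object):
        # Run the vision tower over the pixel values, preserving prompt order.
        ...

    def get_language_model(self) -> nn.Module:
        return self.language_model  # assumed attribute of this toy class

    def get_input_embeddings(
        self,
        input_ids: Tensor,
        multimodal_embeddings=None,
        attn_metadata=None,
    ) -> Tensor:
        # Look up the text embeddings, then scatter the multimodal embeddings
        # into the placeholder positions before the decoder runs.
        ...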

SupportsPP

Bases: Protocol

The interface required for all models that support pipeline parallel.

Source code in vllm/model_executor/models/interfaces.py
@runtime_checkable
class SupportsPP(Protocol):
    """The interface required for all models that support pipeline parallel."""

    supports_pp: ClassVar[Literal[True]] = True
    """
    A flag that indicates this model supports pipeline parallel.

    Note:
        There is no need to redefine this flag if this class is in the
        MRO of your model class.
    """

    def make_empty_intermediate_tensors(
        self,
        batch_size: int,
        dtype: torch.dtype,
        device: torch.device,
    ) -> "IntermediateTensors":
        """Called when PP rank > 0 for profiling purposes."""
        ...

    def forward(
        self,
        *,
        intermediate_tensors: Optional["IntermediateTensors"],
    ) -> Union[Tensor, "IntermediateTensors"]:
        """
        Accept [`IntermediateTensors`][vllm.sequence.IntermediateTensors] when
        PP rank > 0.

        Return [`IntermediateTensors`][vllm.sequence.IntermediateTensors] only
        for the last PP rank.
        """
        ...

supports_pp class-attribute

supports_pp: Literal[True] = True

A flag that indicates this model supports pipeline parallel.

Note

There is no need to redefine this flag if this class is in the MRO of your model class.

forward

forward(
    *, intermediate_tensors: Optional[IntermediateTensors]
) -> Union[Tensor, IntermediateTensors]

Accept IntermediateTensors when PP rank > 0.

Return IntermediateTensors only for the last PP rank.

Source code in vllm/model_executor/models/interfaces.py
def forward(
    self,
    *,
    intermediate_tensors: Optional["IntermediateTensors"],
) -> Union[Tensor, "IntermediateTensors"]:
    """
    Accept [`IntermediateTensors`][vllm.sequence.IntermediateTensors] when
    PP rank > 0.

    Return [`IntermediateTensors`][vllm.sequence.IntermediateTensors] only
    for the last PP rank.
    """
    ...

make_empty_intermediate_tensors

make_empty_intermediate_tensors(
    batch_size: int, dtype: dtype, device: device
) -> IntermediateTensors

Called when PP rank > 0 for profiling purposes.

Source code in vllm/model_executor/models/interfaces.py
def make_empty_intermediate_tensors(
    self,
    batch_size: int,
    dtype: torch.dtype,
    device: torch.device,
) -> "IntermediateTensors":
    """Called when PP rank > 0 for profiling purposes."""
    ...
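
A hedged sketch of the usual implementation: preallocate zero tensors for the keys the model's forward pass fills on earlier PP ranks. The hidden size and key names are assumptions; many vLLM decoders generate this method with make_empty_intermediate_tensors_factory from vllm/model_executor/models/utils.py instead of writing it by hand.

import torch

from vllm.sequence import IntermediateTensors


def make_empty_intermediate_tensors(
    batch_size: int,
    dtype: torch.dtype,
    device: torch.device,
) -> IntermediateTensors:
    hidden_size = 4096  # assumed model dimension
    return IntermediateTensors({
        "hidden_states": torch.zeros((batch_size, hidden_size),
                                     dtype=dtype, device=device),
        "residual": torch.zeros((batch_size, hidden_size),
                                dtype=dtype, device=device),
    })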

SupportsTranscription

Bases: Protocol

The interface required for all models that support transcription.

Source code in vllm/model_executor/models/interfaces.py
@runtime_checkable
class SupportsTranscription(Protocol):
    """The interface required for all models that support transcription."""
    # Mapping from ISO639_1 language codes: language names
    supported_languages: ClassVar[Mapping[str, str]]

    supports_transcription: ClassVar[Literal[True]] = True

    supports_transcription_only: ClassVar[bool] = False
    """
    Transcription models can opt out of text generation by setting this to
    `True`.
    """

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # language codes in supported_languages
        # that don't exist in the full language map
        invalid = set(cls.supported_languages) - set(LANGUAGES.keys())
        if invalid:
            raise ValueError(
                f"{cls.__name__}.supported_languages contains invalid "
                f"language codes: {sorted(invalid)}\n. "
                f"Valid choices are: {sorted(LANGUAGES.keys())}")

    @classmethod
    def get_generation_prompt(cls, audio: np.ndarray,
                              stt_config: SpeechToTextConfig,
                              model_config: ModelConfig,
                              language: Optional[str], task_type: str,
                              request_prompt: str) -> PromptType:
        """Get the prompt for the ASR model.
        The model has control over the construction, as long as it
        returns a valid PromptType."""
        ...

    @classmethod
    def get_other_languages(cls) -> Mapping[str, str]:
        # other possible language codes from the whisper map
        return {
            k: v
            for k, v in LANGUAGES.items() if k not in cls.supported_languages
        }

    @classmethod
    def validate_language(cls, language: Optional[str]) -> Optional[str]:
        """
        Ensure the language specified in the transcription request 
        is a valid ISO 639-1 language code. If the request language is 
        valid, but not natively supported by the model, trigger a 
        warning (but not an exception).
        """
        if language is None or language in cls.supported_languages:
            return language
        elif language in cls.get_other_languages():
            logger.warning(
                "Language %r is not natively supported by %s; "
                "results may be less accurate. Supported languages: %r",
                language,
                cls.__name__,
                list(cls.supported_languages.keys()),
            )
            return language
        else:
            raise ValueError(
                f"Unsupported language: {language!r}.  Must be one of "
                f"{list(cls.supported_languages.keys())}.")

    @classmethod
    def get_speech_to_text_config(
            cls, model_config: ModelConfig,
            task_type: Literal["transcribe",
                               "translate"]) -> SpeechToTextConfig:
        """Get the speech to text config for the ASR model."""
        ...

    @classmethod
    def get_num_audio_tokens(cls, audio_duration_s: float,
                             stt_config: SpeechToTextConfig,
                             model_config: ModelConfig) -> Optional[int]:
        """
        Map from audio duration to number of audio tokens produced by the ASR 
        model, without running a forward pass.
        This is used for estimating the amount of processing for this audio.
        """
        return None

supported_languages class-attribute

supported_languages: Mapping[str, str]

supports_transcription class-attribute

supports_transcription: Literal[True] = True

supports_transcription_only class-attribute

supports_transcription_only: bool = False

Transcription models can opt out of text generation by setting this to True.

__init_subclass__

__init_subclass__(**kwargs)
Source code in vllm/model_executor/models/interfaces.py
def __init_subclass__(cls, **kwargs):
    super().__init_subclass__(**kwargs)
    # language codes in supported_languages
    # that don't exist in the full language map
    invalid = set(cls.supported_languages) - set(LANGUAGES.keys())
    if invalid:
        raise ValueError(
            f"{cls.__name__}.supported_languages contains invalid "
            f"language codes: {sorted(invalid)}\n. "
            f"Valid choices are: {sorted(LANGUAGES.keys())}")

get_generation_prompt classmethod

get_generation_prompt(
    audio: ndarray,
    stt_config: SpeechToTextConfig,
    model_config: ModelConfig,
    language: Optional[str],
    task_type: str,
    request_prompt: str,
) -> PromptType

Get the prompt for the ASR model. The model has control over the construction, as long as it returns a valid PromptType.

Source code in vllm/model_executor/models/interfaces.py
@classmethod
def get_generation_prompt(cls, audio: np.ndarray,
                          stt_config: SpeechToTextConfig,
                          model_config: ModelConfig,
                          language: Optional[str], task_type: str,
                          request_prompt: str) -> PromptType:
    """Get the prompt for the ASR model.
    The model has control over the construction, as long as it
    returns a valid PromptType."""
    ...

get_num_audio_tokens classmethod

get_num_audio_tokens(
    audio_duration_s: float,
    stt_config: SpeechToTextConfig,
    model_config: ModelConfig,
) -> Optional[int]

Map from audio duration to number of audio tokens produced by the ASR model, without running a forward pass. This is used for estimating the amount of processing for this audio.

Source code in vllm/model_executor/models/interfaces.py
@classmethod
def get_num_audio_tokens(cls, audio_duration_s: float,
                         stt_config: SpeechToTextConfig,
                         model_config: ModelConfig) -> Optional[int]:
    """
    Map from audio duration to number of audio tokens produced by the ASR 
    model, without running a forward pass.
    This is used for estimating the amount of processing for this audio.
    """
    return None

get_other_languages classmethod

get_other_languages() -> Mapping[str, str]
Source code in vllm/model_executor/models/interfaces.py
@classmethod
def get_other_languages(cls) -> Mapping[str, str]:
    # other possible language codes from the whisper map
    return {
        k: v
        for k, v in LANGUAGES.items() if k not in cls.supported_languages
    }

get_speech_to_text_config classmethod

get_speech_to_text_config(
    model_config: ModelConfig,
    task_type: Literal["transcribe", "translate"],
) -> SpeechToTextConfig

Get the speech to text config for the ASR model.

Source code in vllm/model_executor/models/interfaces.py
@classmethod
def get_speech_to_text_config(
        cls, model_config: ModelConfig,
        task_type: Literal["transcribe",
                           "translate"]) -> SpeechToTextConfig:
    """Get the speech to text config for the ASR model."""
    ...

validate_language classmethod

validate_language(language: Optional[str]) -> Optional[str]

Ensure the language specified in the transcription request is a valid ISO 639-1 language code. If the request language is valid, but not natively supported by the model, trigger a warning (but not an exception).

Source code in vllm/model_executor/models/interfaces.py
@classmethod
def validate_language(cls, language: Optional[str]) -> Optional[str]:
    """
    Ensure the language specified in the transcription request 
    is a valid ISO 639-1 language code. If the request language is 
    valid, but not natively supported by the model, trigger a 
    warning (but not an exception).
    """
    if language is None or language in cls.supported_languages:
        return language
    elif language in cls.get_other_languages():
        logger.warning(
            "Language %r is not natively supported by %s; "
            "results may be less accurate. Supported languages: %r",
            language,
            cls.__name__,
            list(cls.supported_languages.keys()),
        )
        return language
    else:
        raise ValueError(
            f"Unsupported language: {language!r}.  Must be one of "
            f"{list(cls.supported_languages.keys())}.")

SupportsV0Only

Bases: Protocol

Models with this interface are not compatible with V1 vLLM.

Source code in vllm/model_executor/models/interfaces.py
@runtime_checkable
class SupportsV0Only(Protocol):
    """Models with this interface are not compatible with V1 vLLM."""

    supports_v0_only: ClassVar[Literal[True]] = True

supports_v0_only class-attribute

supports_v0_only: Literal[True] = True

VllmModelForPooling

Bases: VllmModel[T_co], Protocol[T_co]

The interface required for all pooling models in vLLM.

Source code in vllm/model_executor/models/interfaces_base.py
@runtime_checkable
class VllmModelForPooling(VllmModel[T_co], Protocol[T_co]):
    """The interface required for all pooling models in vLLM."""

    is_pooling_model: ClassVar[Literal[True]] = True
    """
    A flag that indicates this model supports pooling.

    Note:
        There is no need to redefine this flag if this class is in the
        MRO of your model class.
    """

    pooler: Pooler
    """The pooler is only called on TP rank 0."""

is_pooling_model class-attribute

is_pooling_model: Literal[True] = True

A flag that indicates this model supports pooling.

Note

There is no need to redefine this flag if this class is in the MRO of your model class.

pooler instance-attribute

pooler: Pooler

The pooler is only called on TP rank 0.

VllmModelForTextGeneration

Bases: VllmModel[T], Protocol[T]

The interface required for all generative models in vLLM.

Source code in vllm/model_executor/models/interfaces_base.py
@runtime_checkable
class VllmModelForTextGeneration(VllmModel[T], Protocol[T]):
    """The interface required for all generative models in vLLM."""

    def compute_logits(
        self,
        hidden_states: T,
        sampling_metadata: SamplingMetadata,
    ) -> Optional[T]:
        """Return `None` if TP rank > 0."""
        ...

compute_logits

compute_logits(
    hidden_states: T, sampling_metadata: SamplingMetadata
) -> Optional[T]

Return None if TP rank > 0.

Source code in vllm/model_executor/models/interfaces_base.py
def compute_logits(
    self,
    hidden_states: T,
    sampling_metadata: SamplingMetadata,
) -> Optional[T]:
    """Return `None` if TP rank > 0."""
    ...

has_inner_state

has_inner_state(model: object) -> TypeIs[HasInnerState]
has_inner_state(
    model: type[object],
) -> TypeIs[type[HasInnerState]]
has_inner_state(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[HasInnerState]], TypeIs[HasInnerState]
]
Source code in vllm/model_executor/models/interfaces.py
def has_inner_state(
    model: Union[type[object], object]
) -> Union[TypeIs[type[HasInnerState]], TypeIs[HasInnerState]]:
    return getattr(model, "has_inner_state", False)

is_pooling_model

is_pooling_model(
    model: type[object],
) -> TypeIs[type[VllmModelForPooling]]
is_pooling_model(
    model: object,
) -> TypeIs[VllmModelForPooling]
Source code in vllm/model_executor/models/interfaces_base.py
def is_pooling_model(
    model: Union[type[object], object],
) -> Union[TypeIs[type[VllmModelForPooling]], TypeIs[VllmModelForPooling]]:
    if not is_vllm_model(model):
        return False

    return getattr(model, "is_pooling_model", False)

is_text_generation_model

is_text_generation_model(
    model: type[object],
) -> TypeIs[type[VllmModelForTextGeneration]]
is_text_generation_model(
    model: object,
) -> TypeIs[VllmModelForTextGeneration]
Source code in vllm/model_executor/models/interfaces_base.py
def is_text_generation_model(
    model: Union[type[object], object],
) -> Union[TypeIs[type[VllmModelForTextGeneration]],
           TypeIs[VllmModelForTextGeneration]]:
    if not is_vllm_model(model):
        return False

    if isinstance(model, type):
        return issubclass(model, VllmModelForTextGeneration)

    return isinstance(model, VllmModelForTextGeneration)

supports_lora

supports_lora(
    model: type[object],
) -> TypeIs[type[SupportsLoRA]]
supports_lora(model: object) -> TypeIs[SupportsLoRA]
supports_lora(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsLoRA]], TypeIs[SupportsLoRA]
]
Source code in vllm/model_executor/models/interfaces.py
def supports_lora(
    model: Union[type[object], object],
) -> Union[TypeIs[type[SupportsLoRA]], TypeIs[SupportsLoRA]]:
    result = _supports_lora(model)

    if not result:
        lora_attrs = (
            "packed_modules_mapping",
            "embedding_modules",
            "embedding_padding_modules",
        )
        missing_attrs = tuple(attr for attr in lora_attrs
                              if not hasattr(model, attr))

        if getattr(model, "supports_lora", False):
            if missing_attrs:
                logger.warning(
                    "The model (%s) sets `supports_lora=True`, "
                    "but is missing LoRA-specific attributes: %s",
                    model,
                    missing_attrs,
                )
        else:
            if not missing_attrs:
                logger.warning(
                    "The model (%s) contains all LoRA-specific attributes, "
                    "but does not set `supports_lora=True`.", model)

    return result

supports_multimodal

supports_multimodal(
    model: type[object],
) -> TypeIs[type[SupportsMultiModal]]
supports_multimodal(
    model: object,
) -> TypeIs[SupportsMultiModal]
supports_multimodal(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsMultiModal]],
    TypeIs[SupportsMultiModal],
]
Source code in vllm/model_executor/models/interfaces.py
def supports_multimodal(
    model: Union[type[object], object],
) -> Union[TypeIs[type[SupportsMultiModal]], TypeIs[SupportsMultiModal]]:
    return getattr(model, "supports_multimodal", False)

supports_pp

supports_pp(
    model: type[object],
) -> TypeIs[type[SupportsPP]]
supports_pp(model: object) -> TypeIs[SupportsPP]
supports_pp(
    model: Union[type[object], object],
) -> Union[
    bool, TypeIs[type[SupportsPP]], TypeIs[SupportsPP]
]
Source code in vllm/model_executor/models/interfaces.py
def supports_pp(
    model: Union[type[object], object],
) -> Union[bool, TypeIs[type[SupportsPP]], TypeIs[SupportsPP]]:
    supports_attributes = _supports_pp_attributes(model)
    supports_inspect = _supports_pp_inspect(model)

    if supports_attributes and not supports_inspect:
        logger.warning(
            "The model (%s) sets `supports_pp=True`, but does not accept "
            "`intermediate_tensors` in its `forward` method", model)

    if not supports_attributes:
        pp_attrs = ("make_empty_intermediate_tensors", )
        missing_attrs = tuple(attr for attr in pp_attrs
                              if not hasattr(model, attr))

        if getattr(model, "supports_pp", False):
            if missing_attrs:
                logger.warning(
                    "The model (%s) sets `supports_pp=True`, "
                    "but is missing PP-specific attributes: %s",
                    model,
                    missing_attrs,
                )
        else:
            if not missing_attrs:
                logger.warning(
                    "The model (%s) contains all PP-specific attributes, "
                    "but does not set `supports_pp=True`.", model)

    return supports_attributes and supports_inspect

supports_transcription

supports_transcription(
    model: type[object],
) -> TypeIs[type[SupportsTranscription]]
supports_transcription(
    model: object,
) -> TypeIs[SupportsTranscription]
supports_transcription(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsTranscription]],
    TypeIs[SupportsTranscription],
]
Source code in vllm/model_executor/models/interfaces.py
def supports_transcription(
    model: Union[type[object], object],
) -> Union[TypeIs[type[SupportsTranscription]], TypeIs[SupportsTranscription]]:
    return getattr(model, "supports_transcription", False)

supports_v0_only

supports_v0_only(
    model: type[object],
) -> TypeIs[type[SupportsV0Only]]
supports_v0_only(model: object) -> TypeIs[SupportsV0Only]
supports_v0_only(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsV0Only]], TypeIs[SupportsV0Only]
]
Source code in vllm/model_executor/models/interfaces.py
def supports_v0_only(
    model: Union[type[object], object],
) -> Union[TypeIs[type[SupportsV0Only]], TypeIs[SupportsV0Only]]:
    return getattr(model, "supports_v0_only", False)
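
These helpers only read class-level flags or inspect method signatures, so they are cheap to call on either a class or an instance. A small sketch probing the reference Llama implementation (results reflect that class at the time of writing and may differ across vLLM versions):

from vllm.model_executor.models import (is_text_generation_model, supports_lora,
                                        supports_multimodal, supports_pp)
from vllm.model_executor.models.llama import LlamaForCausalLM

print(is_text_generation_model(LlamaForCausalLM))  # True: defines compute_logits
print(supports_lora(LlamaForCausalLM))             # True: opts into SupportsLoRA
print(supports_multimodal(LlamaForCausalLM))       # False: text-only model
print(supports_pp(LlamaForCausalLM))               # True: forward accepts intermediate_tensors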