# vllm.inputs

Modules: `data`, `parse`, `preprocess`, `registry`
## `DecoderOnlyInputs` (module attribute)

```python
DecoderOnlyInputs = Union[
    TokenInputs, EmbedsInputs, "MultiModalInputs"
]
```

The inputs in `LLMEngine` before they are passed to the model executor. This specifies the data required for decoder-only models.
## `INPUT_REGISTRY` (module attribute)

```python
INPUT_REGISTRY = InputRegistry()
```

The global `InputRegistry`, which is used by `LLMEngine` to dispatch data processing according to the target model.
## `ProcessorInputs` (module attribute)

```python
ProcessorInputs = Union[
    DecoderOnlyInputs, EncoderDecoderInputs
]
```

The outputs from `vllm.inputs.preprocess.InputPreprocessor`.
## `PromptType` (module attribute)

```python
PromptType = Union[
    SingletonPrompt, ExplicitEncoderDecoderPrompt
]
```

Set of possible schemas for an LLM input, including both decoder-only and encoder/decoder input types:

- A text prompt (`str` or `TextPrompt`)
- A tokenized prompt (`TokensPrompt`)
- An embeddings prompt (`EmbedsPrompt`)
- A single data structure containing both an encoder and a decoder prompt (`ExplicitEncoderDecoderPrompt`)
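
As an illustration, here is a minimal sketch of the three singleton schemas passed to `LLM.generate`. The model name is a placeholder, and prompt-embedding inputs may require additional engine configuration, so the embeddings prompt is shown for its schema only.

```python
import torch
from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # placeholder model

# A text prompt: a plain str (or a TextPrompt dict).
text_prompt = "The capital of France is"

# A tokenized prompt: a TokensPrompt dict (placeholder token IDs).
tokens_prompt = {"prompt_token_ids": [2, 133, 812, 9]}

# An embeddings prompt: an EmbedsPrompt dict (hidden size 768 is an assumption).
embeds_prompt = {"prompt_embeds": torch.randn(5, 768)}

outputs = llm.generate([text_prompt, tokens_prompt])
```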
## `SingletonInputs` (module attribute)

```python
SingletonInputs = Union[
    TokenInputs, EmbedsInputs, "MultiModalInputs"
]
```

A processed `SingletonPrompt` which can be passed to `vllm.sequence.Sequence`.
## `SingletonPrompt` (module attribute)

```python
SingletonPrompt = Union[
    str, TextPrompt, TokensPrompt, EmbedsPrompt
]
```

Set of possible schemas for a single prompt:

- A text prompt (`str` or `TextPrompt`)
- A tokenized prompt (`TokensPrompt`)
- An embeddings prompt (`EmbedsPrompt`)

Note that "singleton" is in contrast to a data structure that encapsulates multiple prompts, such as `ExplicitEncoderDecoderPrompt`, which is used for encoder/decoder models when the user wants to express both the encoder and decoder prompts explicitly.

A prompt of type `SingletonPrompt` may be used as (1) input to a decoder-only model, (2) input to the encoder of an encoder/decoder model when the decoder prompt is not specified explicitly, or (3) a member of a larger data structure encapsulating more than one prompt, such as `ExplicitEncoderDecoderPrompt`.
## `__all__` (module attribute)

```python
__all__ = [
    "TextPrompt",
    "TokensPrompt",
    "PromptType",
    "SingletonPrompt",
    "ExplicitEncoderDecoderPrompt",
    "TokenInputs",
    "EmbedsInputs",
    "EmbedsPrompt",
    "token_inputs",
    "embeds_inputs",
    "DecoderOnlyInputs",
    "EncoderDecoderInputs",
    "ProcessorInputs",
    "SingletonInputs",
    "build_explicit_enc_dec_prompt",
    "to_enc_dec_tuple_list",
    "zip_enc_dec_prompts",
    "INPUT_REGISTRY",
    "DummyData",
    "InputContext",
    "InputProcessingContext",
    "InputRegistry",
]
```
## `DummyData`

Bases: `NamedTuple`

Dummy data used for profiling.

Note: This is only used in V0.

Source code in `vllm/inputs/registry.py`

### `multi_modal_data` (class attribute, instance attribute)

```python
multi_modal_data: Optional[MultiModalDataDict] = None
```

### `multi_modal_placeholders` (class attribute, instance attribute)

```python
multi_modal_placeholders: Optional[
    MultiModalPlaceholderDict
] = None
```
## `EmbedsInputs`
## `EmbedsPrompt`

Bases: `TypedDict`

Schema for a prompt provided via token embeddings.

Source code in `vllm/inputs/data.py`

### `cache_salt` (instance attribute)

```python
cache_salt: NotRequired[str]
```

Optional cache salt to be used for prefix caching.
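
A sketch of an `EmbedsPrompt` dict; the `prompt_embeds` field is inferred from the `embeds_inputs` helper documented below, and the tensor shape is an arbitrary assumption.

```python
import torch

embeds_prompt = {
    "prompt_embeds": torch.randn(8, 4096),  # (num_tokens, hidden_size) -- assumed shape
    "cache_salt": "my-session",             # optional, scopes prefix caching
}
```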
## `EncoderDecoderInputs`

Bases: `TypedDict`

The inputs in `LLMEngine` before they are passed to the model executor. This specifies the required data for encoder-decoder models.

Source code in `vllm/inputs/data.py`

### `decoder` (instance attribute)

```python
decoder: Union[TokenInputs, MultiModalInputs]
```

The inputs for the decoder portion.

### `encoder` (instance attribute)

```python
encoder: Union[TokenInputs, MultiModalInputs]
```

The inputs for the encoder portion.
## `ExplicitEncoderDecoderPrompt`

Bases: `TypedDict`, `Generic[_T1_co, _T2_co]`

Represents an encoder/decoder model input prompt, comprising an explicit encoder prompt and a decoder prompt.

The encoder and decoder prompts may each be formatted according to any of the `SingletonPrompt` schemas, and are not required to use the same schema.

Only the encoder prompt may have multi-modal data. `mm_processor_kwargs` should be set at the top level of this data structure rather than inside the encoder/decoder prompts, since it is agnostic to the encoder/decoder distinction.

Note that an `ExplicitEncoderDecoderPrompt` may not be used as an input to a decoder-only model, and that the `encoder_prompt` and `decoder_prompt` fields of this data structure must themselves be `SingletonPrompt` instances.

Source code in `vllm/inputs/data.py`
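
A minimal sketch of such a prompt built by hand as a dict; the `build_explicit_enc_dec_prompt` helper documented below constructs the same structure.

```python
# The encoder and decoder prompts may use different SingletonPrompt schemas.
enc_dec_prompt = {
    "encoder_prompt": {"prompt": "Translate to German: Hello, world!"},  # TextPrompt
    "decoder_prompt": {"prompt_token_ids": [0]},  # TokensPrompt (placeholder ID)
    # "mm_processor_kwargs": {...},  # if needed, set here at the top level only
}
```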
## `InputContext` (dataclass)

Contains information about the model which may be used to modify the inputs.

Source code in `vllm/inputs/registry.py`

### `get_hf_config`

Get the HuggingFace configuration (`transformers.PretrainedConfig`) of the model, additionally checking its type.

Raises:

| Type | Description |
|---|---|
| `TypeError` | If the configuration is not of the specified type. |

Source code in `vllm/inputs/registry.py`
### `get_hf_image_processor_config`

### `get_hf_processor`

```python
get_hf_processor(
    typ: Union[
        type[_P], tuple[type[_P], ...]
    ] = ProcessorMixin,
    /,
    **kwargs: object,
) -> _P
```

Get the HuggingFace processor (`transformers.ProcessorMixin`) of the model, additionally checking its type.

Raises:

| Type | Description |
|---|---|
| `TypeError` | If the processor is not of the specified type. |

Source code in `vllm/inputs/registry.py`
### `get_mm_config`

Get the multimodal config of the model.

Raises:

| Type | Description |
|---|---|
| `RuntimeError` | If the model is not a multimodal model. |

Source code in `vllm/inputs/registry.py`

### `init_processor`

Initialize a HuggingFace-like processor class, merging the keyword arguments with those in the model's configuration.

Source code in `vllm/inputs/registry.py`
## `InputProcessingContext` (dataclass)

Bases: `InputContext`

Source code in `vllm/inputs/registry.py`

### `call_hf_processor`

```python
call_hf_processor(
    hf_processor: ProcessorMixin,
    data: Mapping[str, object],
    kwargs: Mapping[str, object] = {},
) -> Union[BatchFeature, JSONTree]
```

Call `hf_processor` on the prompt `data` (text, image, audio...) with configurable options `kwargs`.

Source code in `vllm/inputs/registry.py`
## `InputRegistry`

Note: This is only used in V0.

Source code in `vllm/inputs/registry.py`

### `dummy_data_for_profiling`

```python
dummy_data_for_profiling(
    model_config: ModelConfig,
    seq_len: int,
    mm_registry: MultiModalRegistry,
    is_encoder_data: bool = False,
) -> DummyData
```

Create dummy data for profiling the memory usage of a model. The model is identified by `model_config`.

Source code in `vllm/inputs/registry.py`
## `TextPrompt`

Bases: `TypedDict`

Schema for a text prompt.

Source code in `vllm/inputs/data.py`

### `cache_salt` (instance attribute)

```python
cache_salt: NotRequired[str]
```

Optional cache salt to be used for prefix caching.

### `mm_processor_kwargs` (instance attribute)

```python
mm_processor_kwargs: NotRequired[dict[str, Any]]
```

Optional multi-modal processor kwargs to be forwarded to the multimodal input mapper & processor. Note that if multiple modalities have registered mappers etc. for the model being considered, we attempt to pass the `mm_processor_kwargs` to each of them.

### `multi_modal_data` (instance attribute)

```python
multi_modal_data: NotRequired[MultiModalDataDict]
```

Optional multi-modal data to pass to the model, if the model supports it.
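
A sketch of a `TextPrompt` with image data; the `prompt` key and the `"image"` modality key are assumptions based on common vLLM usage and do not appear in the field list above.

```python
from PIL import Image

text_prompt = {
    "prompt": "USER: <image>\nWhat is in this picture? ASSISTANT:",
    "multi_modal_data": {"image": Image.open("example.jpg")},  # hypothetical local file
    "cache_salt": "demo",  # optional, scopes prefix caching
}
```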
## `TokenInputs`

Bases: `TypedDict`

Represents token-based inputs.

Source code in `vllm/inputs/data.py`

### `cache_salt` (instance attribute)

```python
cache_salt: NotRequired[str]
```

Optional cache salt to be used for prefix caching.

### `prompt` (instance attribute)

```python
prompt: NotRequired[str]
```

The original prompt text corresponding to the token IDs, if available.

### `token_type_ids` (instance attribute)

```python
token_type_ids: NotRequired[list[int]]
```

The token type IDs of the prompt.
## `TokensPrompt`

Bases: `TypedDict`

Schema for a tokenized prompt.

Source code in `vllm/inputs/data.py`

### `cache_salt` (instance attribute)

```python
cache_salt: NotRequired[str]
```

Optional cache salt to be used for prefix caching.

### `mm_processor_kwargs` (instance attribute)

```python
mm_processor_kwargs: NotRequired[dict[str, Any]]
```

Optional multi-modal processor kwargs to be forwarded to the multimodal input mapper & processor. Note that if multiple modalities have registered mappers etc. for the model being considered, we attempt to pass the `mm_processor_kwargs` to each of them.

### `multi_modal_data` (instance attribute)

```python
multi_modal_data: NotRequired[MultiModalDataDict]
```

Optional multi-modal data to pass to the model, if the model supports it.

### `prompt_token_ids` (instance attribute)

A list of token IDs to pass to the model.

### `token_type_ids` (instance attribute)

```python
token_type_ids: NotRequired[list[int]]
```

A list of token type IDs to pass to the cross encoder model.
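
A sketch of a `TokensPrompt` dict; the model name is a placeholder, and any tokenizer that matches your model will do.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")  # placeholder model

tokens_prompt = {
    "prompt_token_ids": tokenizer.encode("The capital of France is"),
    "cache_salt": "session-42",  # optional, scopes prefix-cache reuse
}
```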
## `build_explicit_enc_dec_prompt`

```python
build_explicit_enc_dec_prompt(
    encoder_prompt: _T1,
    decoder_prompt: Optional[_T2],
    mm_processor_kwargs: Optional[dict[str, Any]] = None,
) -> ExplicitEncoderDecoderPrompt[_T1, _T2]
```

Source code in `vllm/inputs/data.py`
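
A usage sketch (the prompt text is a placeholder):

```python
from vllm.inputs import build_explicit_enc_dec_prompt

enc_dec_prompt = build_explicit_enc_dec_prompt(
    encoder_prompt="Translate to German: Hello, world!",
    decoder_prompt=None,  # let the model start from its default decoder start token
)
```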
## `embeds_inputs`

```python
embeds_inputs(
    prompt_embeds: Tensor, cache_salt: Optional[str] = None
) -> EmbedsInputs
```

Construct `EmbedsInputs` from optional values.

Source code in `vllm/inputs/data.py`
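
A usage sketch; the tensor shape is an arbitrary assumption.

```python
import torch

from vllm.inputs import embeds_inputs

inputs = embeds_inputs(
    prompt_embeds=torch.randn(8, 4096),  # (num_tokens, hidden_size) -- assumed shape
    cache_salt="my-session",
)
```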
## `to_enc_dec_tuple_list`

```python
to_enc_dec_tuple_list(
    enc_dec_prompts: Iterable[
        ExplicitEncoderDecoderPrompt[_T1, _T2]
    ],
) -> list[tuple[_T1, Optional[_T2]]]
```

Source code in `vllm/inputs/data.py`
## `token_inputs`

```python
token_inputs(
    prompt_token_ids: list[int],
    token_type_ids: Optional[list[int]] = None,
    prompt: Optional[str] = None,
    cache_salt: Optional[str] = None,
) -> TokenInputs
```

Construct `TokenInputs` from optional values.

Source code in `vllm/inputs/data.py`
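
A usage sketch (token IDs are arbitrary placeholders):

```python
from vllm.inputs import token_inputs

inputs = token_inputs(
    prompt_token_ids=[101, 2023, 2003, 1037, 3231, 102],  # placeholder IDs
    prompt="this is a test",  # optional: the original text, if available
)
```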
## `zip_enc_dec_prompts`

```python
zip_enc_dec_prompts(
    enc_prompts: Iterable[_T1],
    dec_prompts: Iterable[Optional[_T2]],
    mm_processor_kwargs: Optional[
        Union[Iterable[dict[str, Any]], dict[str, Any]]
    ] = None,
) -> list[ExplicitEncoderDecoderPrompt[_T1, _T2]]
```

Zip encoder and decoder prompts together into a list of `ExplicitEncoderDecoderPrompt` instances.

`mm_processor_kwargs` may also be provided; if a dict is passed, the same dictionary will be used for every encoder/decoder prompt. If an iterable is provided, it will be zipped with the encoder/decoder prompts.
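
A usage sketch (prompts are placeholders):

```python
from vllm.inputs import zip_enc_dec_prompts

enc_dec_prompts = zip_enc_dec_prompts(
    ["Hello, my name is", "The capital of France is"],  # encoder prompts
    [None, None],                                       # decoder prompts (use model defaults)
)
# -> a list of ExplicitEncoderDecoderPrompt dicts, one per encoder/decoder pair
```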