# vllm.inputs

Modules: `data`, `parse`, `preprocess`, `registry`
## `DecoderOnlyInputs` (module attribute)

```python
DecoderOnlyInputs = Union[
    TokenInputs, EmbedsInputs, "MultiModalInputs"
]
```

The inputs in `LLMEngine` before they are passed to the model executor. This specifies the data required for decoder-only models.
## `INPUT_REGISTRY` (module attribute)

```python
INPUT_REGISTRY = InputRegistry()
```

The global `InputRegistry`, which is used by `LLMEngine` to dispatch data processing according to the target model.
## `ProcessorInputs` (module attribute)

```python
ProcessorInputs = Union[
    DecoderOnlyInputs, EncoderDecoderInputs
]
```

The outputs from `vllm.inputs.preprocess.InputPreprocessor`.
## `PromptType` (module attribute)

```python
PromptType = Union[
    SingletonPrompt, ExplicitEncoderDecoderPrompt
]
```

Set of possible schemas for an LLM input, including both decoder-only and encoder/decoder input types:

- A text prompt (`str` or `TextPrompt`)
- A tokenized prompt (`TokensPrompt`)
- An embeddings prompt (`EmbedsPrompt`)
- A single data structure containing both an encoder and a decoder prompt (`ExplicitEncoderDecoderPrompt`)
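
As an illustration, here is a minimal sketch of the three singleton schemas passed to `LLM.generate`. The model name is a placeholder, and prompt-embedding inputs may require additional engine configuration, so the embeddings prompt is shown for its schema only.

```python
import torch
from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # placeholder model

# A text prompt: a plain str (or a TextPrompt dict).
text_prompt = "The capital of France is"

# A tokenized prompt: a TokensPrompt dict (placeholder token IDs).
tokens_prompt = {"prompt_token_ids": [2, 133, 812, 9]}

# An embeddings prompt: an EmbedsPrompt dict (hidden size 768 is an assumption).
embeds_prompt = {"prompt_embeds": torch.randn(5, 768)}

outputs = llm.generate([text_prompt, tokens_prompt])
```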
## `SingletonInputs` (module attribute)

```python
SingletonInputs = Union[
    TokenInputs, EmbedsInputs, "MultiModalInputs"
]
```

A processed `SingletonPrompt` which can be passed to `vllm.sequence.Sequence`.
## `SingletonPrompt` (module attribute)

```python
SingletonPrompt = Union[
    str, TextPrompt, TokensPrompt, EmbedsPrompt
]
```

Set of possible schemas for a single prompt:

- A text prompt (`str` or `TextPrompt`)
- A tokenized prompt (`TokensPrompt`)
- An embeddings prompt (`EmbedsPrompt`)

Note that "singleton" is in contrast to a data structure that encapsulates multiple prompts, such as `ExplicitEncoderDecoderPrompt`, which is used for encoder/decoder models when the user wants to express both the encoder and decoder prompts explicitly.

A prompt of type `SingletonPrompt` may be used as (1) input to a decoder-only model, (2) input to the encoder of an encoder/decoder model when the decoder prompt is not specified explicitly, or (3) a member of a larger data structure encapsulating more than one prompt, such as `ExplicitEncoderDecoderPrompt`.
## `__all__` (module attribute)

```python
__all__ = [
    "TextPrompt",
    "TokensPrompt",
    "PromptType",
    "SingletonPrompt",
    "ExplicitEncoderDecoderPrompt",
    "TokenInputs",
    "EmbedsInputs",
    "EmbedsPrompt",
    "token_inputs",
    "embeds_inputs",
    "DecoderOnlyInputs",
    "EncoderDecoderInputs",
    "ProcessorInputs",
    "SingletonInputs",
    "build_explicit_enc_dec_prompt",
    "to_enc_dec_tuple_list",
    "zip_enc_dec_prompts",
    "INPUT_REGISTRY",
    "DummyData",
    "InputContext",
    "InputProcessingContext",
    "InputRegistry",
]
```
## `DummyData`

Bases: `NamedTuple`

Dummy data used for profiling.

Note: This is only used in V0.

Source code in `vllm/inputs/registry.py`

### `multi_modal_data` (class attribute, instance attribute)

```python
multi_modal_data: Optional[MultiModalDataDict] = None
```

### `multi_modal_placeholders` (class attribute, instance attribute)

```python
multi_modal_placeholders: Optional[
    MultiModalPlaceholderDict
] = None
```
## `EmbedsInputs`
## `EmbedsPrompt`

Bases: `TypedDict`

Schema for a prompt provided via token embeddings.

Source code in `vllm/inputs/data.py`

### `cache_salt` (instance attribute)

```python
cache_salt: NotRequired[str]
```

Optional cache salt to be used for prefix caching.
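
A sketch of an `EmbedsPrompt` dict; the `prompt_embeds` field is inferred from the `embeds_inputs` helper documented below, and the tensor shape is an arbitrary assumption.

```python
import torch

embeds_prompt = {
    "prompt_embeds": torch.randn(8, 4096),  # (num_tokens, hidden_size) -- assumed shape
    "cache_salt": "my-session",             # optional, scopes prefix caching
}
```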
## `EncoderDecoderInputs`

Bases: `TypedDict`

The inputs in `LLMEngine` before they are passed to the model executor. This specifies the required data for encoder-decoder models.

Source code in `vllm/inputs/data.py`

### `decoder` (instance attribute)

```python
decoder: Union[TokenInputs, MultiModalInputs]
```

The inputs for the decoder portion.

### `encoder` (instance attribute)

```python
encoder: Union[TokenInputs, MultiModalInputs]
```

The inputs for the encoder portion.
## `ExplicitEncoderDecoderPrompt`

Bases: `TypedDict`, `Generic[_T1_co, _T2_co]`

Represents an encoder/decoder model input prompt, comprising an explicit encoder prompt and a decoder prompt.

The encoder and decoder prompts may each be formatted according to any of the `SingletonPrompt` schemas, and are not required to use the same schema.

Only the encoder prompt may have multi-modal data. `mm_processor_kwargs` should be set at the top level of this data structure rather than inside the encoder/decoder prompts, since it is agnostic to the encoder/decoder distinction.

Note that an `ExplicitEncoderDecoderPrompt` may not be used as an input to a decoder-only model, and that the `encoder_prompt` and `decoder_prompt` fields of this data structure must themselves be `SingletonPrompt` instances.

Source code in `vllm/inputs/data.py`
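
A minimal sketch of such a prompt built by hand as a dict; the `build_explicit_enc_dec_prompt` helper documented below constructs the same structure.

```python
# The encoder and decoder prompts may use different SingletonPrompt schemas.
enc_dec_prompt = {
    "encoder_prompt": {"prompt": "Translate to German: Hello, world!"},  # TextPrompt
    "decoder_prompt": {"prompt_token_ids": [0]},  # TokensPrompt (placeholder ID)
    # "mm_processor_kwargs": {...},  # if needed, set here at the top level only
}
```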
## `InputContext` (dataclass)

Contains information about the model which may be used to modify the inputs.

Source code in `vllm/inputs/registry.py`

### `get_hf_config`

Get the HuggingFace configuration (`transformers.PretrainedConfig`) of the model, additionally checking its type.

Raises:

| Type | Description |
|---|---|
| `TypeError` | If the configuration is not of the specified type. |

Source code in `vllm/inputs/registry.py`
### `get_hf_image_processor_config`

### `get_hf_processor`

```python
get_hf_processor(
    typ: Union[
        type[_P], tuple[type[_P], ...]
    ] = ProcessorMixin,
    /,
    **kwargs: object,
) -> _P
```

Get the HuggingFace processor (`transformers.ProcessorMixin`) of the model, additionally checking its type.

Raises:

| Type | Description |
|---|---|
| `TypeError` | If the processor is not of the specified type. |

Source code in `vllm/inputs/registry.py`
### `get_mm_config`

Get the multimodal config of the model.

Raises:

| Type | Description |
|---|---|
| `RuntimeError` | If the model is not a multimodal model. |

Source code in `vllm/inputs/registry.py`

### `init_processor`

Initialize a HuggingFace-like processor class, merging the keyword arguments with those in the model's configuration.

Source code in `vllm/inputs/registry.py`
## `InputProcessingContext` (dataclass)

Bases: `InputContext`

Source code in `vllm/inputs/registry.py`

### `call_hf_processor`

```python
call_hf_processor(
    hf_processor: ProcessorMixin,
    data: Mapping[str, object],
    kwargs: Mapping[str, object] = {},
) -> Union[BatchFeature, JSONTree]
```

Call `hf_processor` on the prompt `data` (text, image, audio...) with configurable options `kwargs`.

Source code in `vllm/inputs/registry.py`
## `InputRegistry`

Note: This is only used in V0.

Source code in `vllm/inputs/registry.py`

### `dummy_data_for_profiling`

```python
dummy_data_for_profiling(
    model_config: ModelConfig,
    seq_len: int,
    mm_registry: MultiModalRegistry,
    is_encoder_data: bool = False,
) -> DummyData
```

Create dummy data for profiling the memory usage of a model. The model is identified by `model_config`.

Source code in `vllm/inputs/registry.py`
## `TextPrompt`

Bases: `TypedDict`

Schema for a text prompt.

Source code in `vllm/inputs/data.py`

### `cache_salt` (instance attribute)

```python
cache_salt: NotRequired[str]
```

Optional cache salt to be used for prefix caching.

### `mm_processor_kwargs` (instance attribute)

```python
mm_processor_kwargs: NotRequired[dict[str, Any]]
```

Optional multi-modal processor kwargs to be forwarded to the multimodal input mapper & processor. Note that if multiple modalities have registered mappers etc. for the model being considered, we attempt to pass the `mm_processor_kwargs` to each of them.

### `multi_modal_data` (instance attribute)

```python
multi_modal_data: NotRequired[MultiModalDataDict]
```

Optional multi-modal data to pass to the model, if the model supports it.
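
A sketch of a `TextPrompt` with image data; the `prompt` key and the `"image"` modality key are assumptions based on common vLLM usage and do not appear in the field list above.

```python
from PIL import Image

text_prompt = {
    "prompt": "USER: <image>\nWhat is in this picture? ASSISTANT:",
    "multi_modal_data": {"image": Image.open("example.jpg")},  # hypothetical local file
    "cache_salt": "demo",  # optional, scopes prefix caching
}
```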
## `TokenInputs`

Bases: `TypedDict`

Represents token-based inputs.

Source code in `vllm/inputs/data.py`

### `cache_salt` (instance attribute)

```python
cache_salt: NotRequired[str]
```

Optional cache salt to be used for prefix caching.

### `prompt` (instance attribute)

```python
prompt: NotRequired[str]
```

The original prompt text corresponding to the token IDs, if available.

### `token_type_ids` (instance attribute)

```python
token_type_ids: NotRequired[list[int]]
```

The token type IDs of the prompt.
## `TokensPrompt`

Bases: `TypedDict`

Schema for a tokenized prompt.

Source code in `vllm/inputs/data.py`

### `cache_salt` (instance attribute)

```python
cache_salt: NotRequired[str]
```

Optional cache salt to be used for prefix caching.

### `mm_processor_kwargs` (instance attribute)

```python
mm_processor_kwargs: NotRequired[dict[str, Any]]
```

Optional multi-modal processor kwargs to be forwarded to the multimodal input mapper & processor. Note that if multiple modalities have registered mappers etc. for the model being considered, we attempt to pass the `mm_processor_kwargs` to each of them.

### `multi_modal_data` (instance attribute)

```python
multi_modal_data: NotRequired[MultiModalDataDict]
```

Optional multi-modal data to pass to the model, if the model supports it.

### `prompt_token_ids` (instance attribute)

A list of token IDs to pass to the model.

### `token_type_ids` (instance attribute)

```python
token_type_ids: NotRequired[list[int]]
```

A list of token type IDs to pass to the cross encoder model.
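
A sketch of a `TokensPrompt` dict; the model name is a placeholder, and any tokenizer that matches your model will do.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")  # placeholder model

tokens_prompt = {
    "prompt_token_ids": tokenizer.encode("The capital of France is"),
    "cache_salt": "session-42",  # optional, scopes prefix-cache reuse
}
```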
## `build_explicit_enc_dec_prompt`

```python
build_explicit_enc_dec_prompt(
    encoder_prompt: _T1,
    decoder_prompt: Optional[_T2],
    mm_processor_kwargs: Optional[dict[str, Any]] = None,
) -> ExplicitEncoderDecoderPrompt[_T1, _T2]
```

Source code in `vllm/inputs/data.py`
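
A usage sketch (the prompt text is a placeholder):

```python
from vllm.inputs import build_explicit_enc_dec_prompt

enc_dec_prompt = build_explicit_enc_dec_prompt(
    encoder_prompt="Translate to German: Hello, world!",
    decoder_prompt=None,  # let the model start from its default decoder start token
)
```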
## `embeds_inputs`

```python
embeds_inputs(
    prompt_embeds: Tensor, cache_salt: Optional[str] = None
) -> EmbedsInputs
```

Construct `EmbedsInputs` from optional values.

Source code in `vllm/inputs/data.py`
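
A usage sketch; the tensor shape is an arbitrary assumption.

```python
import torch

from vllm.inputs import embeds_inputs

inputs = embeds_inputs(
    prompt_embeds=torch.randn(8, 4096),  # (num_tokens, hidden_size) -- assumed shape
    cache_salt="my-session",
)
```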
## `to_enc_dec_tuple_list`

```python
to_enc_dec_tuple_list(
    enc_dec_prompts: Iterable[
        ExplicitEncoderDecoderPrompt[_T1, _T2]
    ],
) -> list[tuple[_T1, Optional[_T2]]]
```

Source code in `vllm/inputs/data.py`
## `token_inputs`

```python
token_inputs(
    prompt_token_ids: list[int],
    token_type_ids: Optional[list[int]] = None,
    prompt: Optional[str] = None,
    cache_salt: Optional[str] = None,
) -> TokenInputs
```

Construct `TokenInputs` from optional values.

Source code in `vllm/inputs/data.py`
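
A usage sketch (token IDs are arbitrary placeholders):

```python
from vllm.inputs import token_inputs

inputs = token_inputs(
    prompt_token_ids=[101, 2023, 2003, 1037, 3231, 102],  # placeholder IDs
    prompt="this is a test",  # optional: the original text, if available
)
```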
## `zip_enc_dec_prompts`

```python
zip_enc_dec_prompts(
    enc_prompts: Iterable[_T1],
    dec_prompts: Iterable[Optional[_T2]],
    mm_processor_kwargs: Optional[
        Union[Iterable[dict[str, Any]], dict[str, Any]]
    ] = None,
) -> list[ExplicitEncoderDecoderPrompt[_T1, _T2]]
```

Zip encoder and decoder prompts together into a list of `ExplicitEncoderDecoderPrompt` instances.

`mm_processor_kwargs` may also be provided; if a dict is passed, the same dictionary will be used for every encoder/decoder prompt. If an iterable is provided, it will be zipped with the encoder/decoder prompts.
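
A usage sketch (prompts are placeholders):

```python
from vllm.inputs import zip_enc_dec_prompts

enc_dec_prompts = zip_enc_dec_prompts(
    ["Hello, my name is", "The capital of France is"],  # encoder prompts
    [None, None],                                       # decoder prompts (use model defaults)
)
# -> a list of ExplicitEncoderDecoderPrompt dicts, one per encoder/decoder pair
```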