vllm.transformers_utils.configs.ultravox
UltravoxConfig
Bases: PretrainedConfig
This configuration class stores the configuration of an `UltravoxForConditionalGeneration` model. It is used to instantiate an Ultravox model according to the specified arguments, which define the model architecture.
Configuration objects inherit from `PretrainedConfig` and can be used to control the model outputs. Read the `PretrainedConfig` documentation for more information.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio_config | `Union[AutoConfig, dict]`, *optional* | Custom audio config or dict | None |
text_config | `Union[AutoConfig, dict]`, *optional* | The config object of the text backbone. Can be any of | None |
ignore_index | `int`, *optional*, defaults to -100 | The ignore index for the loss function. | -100 |
audio_token_index | `int`, *optional*, defaults to 32000 | The audio token index to encode the audio prompt. | 32000 |
stack_factor | `int`, *optional*, defaults to 8 | Audio downsampling factor for the multimodal projector. | 8 |
norm_init | `float`, *optional*, defaults to 0.4 | The initialization value for the layer normalization. | 0.4 |
projector_act | `str`, *optional*, defaults to `"swiglu"` | The activation function used by the multimodal projector. | 'swiglu' |
text_model_lora_config | `LoraConfigSimplified`, *optional* | The LoRA configuration for finetuning the text model. | None |
audio_model_lora_config | `LoraConfigSimplified`, *optional* | The LoRA configuration for finetuning the audio model. | None |
projector_ln_mid | `bool`, *optional*, defaults to `False` | Whether to apply layer normalization at the middle of the projector or at the end. Versions v0.4.1 and below use | False |
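To make the `stack_factor` entry more concrete, here is a minimal sketch of what an audio downsampling factor of this kind could mean. It assumes that downsampling is done by zero-padding the frame sequence to a multiple of `stack_factor` and concatenating each group of consecutive frames along the feature axis; the table above does not confirm this, and the helper names `stacked_length` and `stack_frames` are hypothetical, not part of vLLM.

```python
import math


def stacked_length(num_frames: int, stack_factor: int = 8) -> int:
    """Frames remaining after stacking, assuming padding up to a multiple of stack_factor."""
    return math.ceil(num_frames / stack_factor)


def stack_frames(frames: list[list[float]], stack_factor: int = 8) -> list[list[float]]:
    """Concatenate every `stack_factor` consecutive frames into one wider frame.

    Assumption (not taken from the documentation): the sequence is zero-padded
    at the end so that its length is divisible by `stack_factor`.
    """
    if not frames:
        return []
    dim = len(frames[0])
    pad = (-len(frames)) % stack_factor
    # Build a new list so the caller's input is not modified.
    padded = frames + [[0.0] * dim for _ in range(pad)]
    return [
        sum(padded[i:i + stack_factor], [])  # flatten one group of frames
        for i in range(0, len(padded), stack_factor)
    ]


# Ten frames with two features each, using the documented default factor of 8.
frames = [[float(i), float(-i)] for i in range(10)]
stacked = stack_frames(frames, stack_factor=8)
```

Under these assumptions the sequence becomes `stack_factor` times shorter and each frame `stack_factor` times wider, which is the trade-off a projector input of this kind would have to accommodate.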
Source code in vllm/transformers_utils/configs/ultravox.py
audio_model_lora_config instance-attribute
__init__
__init__(
    audio_config: Optional[dict[str, Any]] = None,
    text_config: Optional[dict[str, Any]] = None,
    audio_model_id: Optional[str] = None,
    text_model_id: Optional[str] = None,
    ignore_index: int = -100,
    audio_token_index: int = 32000,
    hidden_size: int = 4096,
    stack_factor: int = 8,
    norm_init: float = 0.4,
    projector_act: str = "swiglu",
    text_model_lora_config: Optional[dict[str, Any]] = None,
    audio_model_lora_config: Optional[dict[str, Any]] = None,
    projector_ln_mid: bool = False,
    **kwargs,
)
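The keyword arguments and defaults in the signature above can be mirrored in a small stand-in to see which values a caller would actually be overriding. This is only a sketch: `UltravoxArgs` and `overrides` are hypothetical helpers written for illustration, not part of vLLM, and the block deliberately does not import vLLM so that it runs on its own.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class UltravoxArgs:
    """Hypothetical stand-in mirroring the documented __init__ defaults."""

    audio_config: Optional[dict[str, Any]] = None
    text_config: Optional[dict[str, Any]] = None
    audio_model_id: Optional[str] = None
    text_model_id: Optional[str] = None
    ignore_index: int = -100
    audio_token_index: int = 32000
    hidden_size: int = 4096
    stack_factor: int = 8
    norm_init: float = 0.4
    projector_act: str = "swiglu"
    text_model_lora_config: Optional[dict[str, Any]] = None
    audio_model_lora_config: Optional[dict[str, Any]] = None
    projector_ln_mid: bool = False


def overrides(args: UltravoxArgs) -> dict[str, Any]:
    """Return only the fields that differ from the documented defaults."""
    defaults = asdict(UltravoxArgs())
    return {k: v for k, v in asdict(args).items() if v != defaults[k]}


# Override two values and keep every other documented default.
args = UltravoxArgs(stack_factor=4, projector_ln_mid=True)
kwargs = asdict(args)
changed = overrides(args)
```

With vLLM installed, a dictionary like `kwargs` could presumably be unpacked into `UltravoxConfig(**kwargs)`, since its keys match the parameter names in the signature; that call is not exercised here.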