vllm.transformers_utils.configs.mlp_speculator
MLPSpeculatorConfig
Bases: PretrainedConfig
Source code in vllm/transformers_utils/configs/mlp_speculator.py
__init__
__init__(
    vocab_size: int = 32000,
    emb_dim: int = 4096,
    inner_dim: int = 0,
    n_predict: int = 3,
    top_k_tokens_per_head: Optional[list[int]] = None,
    n_candidates: int = 5,
    tie_weights: bool = False,
    scale_input: bool = False,
    **kwargs,
)
Initialize an MLPSpeculatorConfig.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vocab_size | int | The model vocab size. | 32000 |
| emb_dim | int | The model embedding dimension. | 4096 |
| inner_dim | int | The inner dimension of the model. If 0, defaults to emb_dim. | 0 |
| n_predict | int | The number of lookaheads for the speculator. | 3 |
| top_k_tokens_per_head | Optional[list[int]] | Number of tokens to consider from each head when forming the candidate tree. For each candidate branch in the tree, head n produces topk[n] additional sub-branches. NOTE: This parameter is currently unused. | None |
| n_candidates | int | Number of child candidates to create per sequence. | 5 |
| tie_weights | bool | If True, use a single set of weights for every model head/stage after the first. The initial projection from the base model may have a different size, so it remains separate. | False |
| scale_input | bool | If True, scale the initial hidden states from the base model. | False |
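A minimal construction sketch, using the import path shown at the top of this page; the values mirror the documented defaults except where the comments note otherwise:

```python
from vllm.transformers_utils.configs.mlp_speculator import MLPSpeculatorConfig

# Speculator config whose inner dimension falls back to emb_dim (inner_dim=0)
# and which shares one set of weights for every head/stage after the first.
config = MLPSpeculatorConfig(
    vocab_size=32000,   # base model vocab size
    emb_dim=4096,       # base model embedding dimension
    inner_dim=0,        # 0 -> use emb_dim as the inner dimension
    n_predict=3,        # number of lookahead tokens the speculator proposes
    n_candidates=5,     # child candidates created per sequence
    tie_weights=True,   # non-default: reuse weights after the first stage
    scale_input=True,   # non-default: scale initial hidden states from the base model
)
```

Because the class subclasses PretrainedConfig, any additional keyword arguments supplied through **kwargs are handled by the PretrainedConfig machinery, as is typical for Hugging Face config classes.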