vllm.v1.spec_decode.ngram_proposer
NgramProposer ¶
Source code in vllm/v1/spec_decode/ngram_proposer.py
__init__ ¶
__init__(vllm_config: VllmConfig)
Source code in vllm/v1/spec_decode/ngram_proposer.py
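The proposer is constructed from the engine's `VllmConfig` rather than instantiated by users directly. As an illustrative sketch (key names follow the `speculative_config` dict documented for recent vLLM releases and may differ between versions), n-gram speculative decoding is typically enabled at the engine level like this:

```python
# Illustrative sketch; exact config keys may vary between vLLM versions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-6.7b",          # any supported target model
    speculative_config={
        "method": "ngram",              # use the n-gram proposer
        "num_speculative_tokens": 4,    # k: draft tokens proposed per step
        "prompt_lookup_max": 3,         # max_n: longest n-gram to match
        "prompt_lookup_min": 2,         # min_n: shortest n-gram to match
    },
)

outputs = llm.generate(["The quick brown fox"], SamplingParams(max_tokens=64))
```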
load_model ¶
propose ¶
Proposes the next sequence of tokens based on n-gram pattern matching in the context. The function looks for a match of the last n tokens in the earlier context and, if one is found, returns up to k tokens that followed that match.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `context_token_ids` | `ndarray` | Numpy array of token IDs representing the context sequence. | *required* |
Returns:
| Name | Type | Description |
|---|---|---|
| `np.ndarray` | `Optional[ndarray]` | The sequence of tokens that followed the matched n-gram in the context. |
| `None` | `Optional[ndarray]` | If no matching n-gram pattern is found. |
Example
If context_token_ids = [1,2,3,4,2,3], min_n = 2, max_n = 3, and k = 4:

- The last 3 (= max_n) tokens [4,2,3] cannot find a match.
- The last 2 tokens [2,3] are matched against the previous 4 tokens [1,2,3,4].
- Finding a match of [2,3] returns the tokens that followed that pattern. Here we return [4,2,3] because only three tokens follow the match.
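The matching rule above can be sketched in a few lines of NumPy. `propose_ngram_sketch` below is an illustrative toy, not the actual vLLM kernel (which is vectorized and may differ in search order):

```python
from typing import Optional

import numpy as np


def propose_ngram_sketch(
    context_token_ids: np.ndarray, min_n: int, max_n: int, k: int
) -> Optional[np.ndarray]:
    """Toy re-implementation of the matching rule described above."""
    total = len(context_token_ids)
    # Try the longest suffix n-gram first, then progressively shorter ones.
    for n in range(max_n, min_n - 1, -1):
        if total < n + 1:
            continue
        suffix = context_token_ids[-n:]
        # Look for an earlier occurrence of the suffix (first match wins here;
        # the real, vectorized kernel may pick a different occurrence).
        for start in range(total - n):
            window = context_token_ids[start:start + n]
            if np.array_equal(window, suffix):
                begin = start + n
                return context_token_ids[begin:begin + k]
    return None


# Reproduces the docstring example: prints [4 2 3].
print(propose_ngram_sketch(np.array([1, 2, 3, 4, 2, 3]), min_n=2, max_n=3, k=4))
```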
Source code in vllm/v1/spec_decode/ngram_proposer.py
_find_longest_matched_ngram_and_propose_tokens ¶
_find_longest_matched_ngram_and_propose_tokens(
origin_tokens: ndarray,
min_ngram: int,
max_ngram: int,
max_model_len: int,
k: int,
) -> Optional[ndarray]
Find the longest n-gram, with length within [min_ngram, max_ngram] (inclusive), that matches the suffix of the given tokens and also occurs earlier in them.

If found, the k tokens right after the matched n-gram are extracted and returned.
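The extra max_model_len argument is not covered by the sketch shown earlier. A plausible reading (an assumption, not verified against the source listing) is that it caps the draft so the sequence cannot grow past the model's context length, roughly:

```python
import numpy as np

# Illustrative values, not taken from the vLLM source.
origin_tokens = np.array([1, 2, 3, 4, 2, 3])
k, max_model_len = 4, 8

# Assumed clamp: never propose more draft tokens than the context has room for.
k = max(0, min(k, max_model_len - len(origin_tokens)))  # min(4, 8 - 6) == 2

# With k clamped to 2, the matching rule sketched earlier would propose
# [4, 2] instead of [4, 2, 3] for this context.
```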
Source code in vllm/v1/spec_decode/ngram_proposer.py