vllm.model_executor.layers.fused_moe.routing_simulator
Token-to-Expert Routing Simulator
This module provides a framework for simulating and testing token-to-expert routing strategies for Mixture of Experts (MoE) models. It supports custom routing logic and includes example implementations such as uniform random routing.
DistributionBasedRouting ¶
Bases: RoutingStrategy
Distribution-based random routing strategy with configurable distributions.
This routing strategy randomly selects experts for each token according to a configurable probability distribution. It currently supports uniform and normal distributions, which makes it possible to test different routing patterns.
Source code in vllm/model_executor/layers/fused_moe/routing_simulator.py
__init__ ¶
__init__(
distribution: str = "uniform", **distribution_params
)
Initialize distribution-based routing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
distribution | str | Type of distribution to use for sampling: "uniform" (uniform distribution, the default) or "normal" (normal/Gaussian distribution). | 'uniform' |
**distribution_params | | Parameters specific to the chosen distribution. "uniform" needs no additional parameters; "normal" accepts mean (default: 0.0) and std (default: 1.0). | {} |
_generate_weights ¶
Generate weights based on the distribution.
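The source does not show how the weights are computed, so the following is only a minimal sketch of one plausible scheme. It assumes equal weights for "uniform" and normalized absolute Gaussian draws for "normal". The function name `generate_weights`, the `seed` argument, and the use of plain Python lists in place of tensors are all hypothetical choices made so the snippet runs without PyTorch.

```python
import random


def generate_weights(num_tokens, top_k, distribution="uniform",
                     mean=0.0, std=1.0, seed=None):
    """Hypothetical stand-in: one row of top_k weights per token, each row summing to 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_tokens):
        if distribution == "uniform":
            # Assumption: every selected expert gets the same share.
            row = [1.0 / top_k] * top_k
        elif distribution == "normal":
            # Assumption: take magnitudes of Gaussian draws, then normalize.
            raw = [abs(rng.gauss(mean, std)) for _ in range(top_k)]
            total = sum(raw)
            if total == 0.0:
                # Degenerate draw (all zeros): fall back to equal shares.
                row = [1.0 / top_k] * top_k
            else:
                row = [r / total for r in raw]
        else:
            raise ValueError(f"unsupported distribution: {distribution!r}")
        rows.append(row)
    return rows
```

Normalizing each row keeps the combined expert output on the same scale regardless of which distribution produced the weights.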
_normalize_samples ¶
Normalize samples to [0, 1] range.
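To illustrate what normalizing to [0, 1] can look like, here is a small sketch that squashes unbounded samples with a logistic sigmoid and then bins the result into expert indices. The choice of sigmoid, and the helper names `normalize_samples` and `to_expert_ids`, are assumptions for illustration; the module may normalize differently.

```python
import math


def normalize_samples(samples):
    """Map real-valued samples into [0, 1] with a numerically stable sigmoid."""
    out = []
    for x in samples:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            # Rewritten form avoids overflow in exp() for large negative x.
            e = math.exp(x)
            out.append(e / (1.0 + e))
    return out


def to_expert_ids(normalized, num_experts):
    """Scale values in [0, 1] to integer expert indices in [0, num_experts)."""
    # Clamp so that a value of exactly 1.0 maps to the last expert.
    return [min(int(u * num_experts), num_experts - 1) for u in normalized]
```

With this mapping, a sample of 0.0 lands at 0.5 and therefore in the middle of the expert range, so a zero-mean normal distribution would favor the central experts.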
_sample_continuous_distribution ¶
Sample from continuous distributions.
_sample_expert_ids ¶
_sample_expert_ids(
num_tokens: int,
num_experts: int,
top_k: int,
device: device,
indices_type: dtype,
) -> Tensor
Sample expert IDs based on the specified distribution.
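As a rough sketch of the idea, the snippet below samples a `[num_tokens, top_k]` grid of expert IDs and tallies the resulting per-expert load. It is not the module's implementation: the `device` and `indices_type` arguments are omitted, nested lists replace tensors, and the sigmoid-based mapping for "normal" is an assumption. The names `sample_expert_ids` (as a free function), `expert_load`, and `seed` are hypothetical.

```python
import math
import random
from collections import Counter


def sample_expert_ids(num_tokens, num_experts, top_k, distribution="uniform",
                      mean=0.0, std=1.0, seed=None):
    """Hypothetical stand-in: return num_tokens rows of top_k expert IDs."""
    rng = random.Random(seed)

    def draw():
        if distribution == "uniform":
            # Every expert is equally likely.
            return rng.randrange(num_experts)
        if distribution == "normal":
            # Assumption: squash a Gaussian draw into [0, 1] (sigmoid via tanh),
            # then scale it to an expert index.
            u = 0.5 * (1.0 + math.tanh(0.5 * rng.gauss(mean, std)))
            return min(int(u * num_experts), num_experts - 1)
        raise ValueError(f"unsupported distribution: {distribution!r}")

    # Draws are independent, so one token may receive the same expert twice.
    return [[draw() for _ in range(top_k)] for _ in range(num_tokens)]


def expert_load(expert_ids):
    """Count how many token slots were assigned to each expert."""
    return Counter(e for row in expert_ids for e in row)


ids = sample_expert_ids(num_tokens=1000, num_experts=8, top_k=2,
                        distribution="normal", seed=0)
load = expert_load(ids)
```

Comparing `load` for "uniform" and "normal" runs is one simple way to see how a distribution changes expert utilization.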
_validate_distribution_params ¶
Validate distribution type and parameters.
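The exact checks are not shown in the source. The sketch below illustrates the kind of validation such a method might perform, based on the two distributions and the "normal" parameters documented for `__init__`. The `SUPPORTED_DISTRIBUTIONS` constant, the free-function form, the rule that `std` must be positive, and the returned dictionary are all assumptions.

```python
SUPPORTED_DISTRIBUTIONS = ("uniform", "normal")  # taken from the documented options


def validate_distribution_params(distribution, **distribution_params):
    """Hypothetical validator: reject unknown distributions and bad parameters."""
    if distribution not in SUPPORTED_DISTRIBUTIONS:
        raise ValueError(
            f"Unsupported distribution {distribution!r}; "
            f"expected one of {SUPPORTED_DISTRIBUTIONS}"
        )
    if distribution == "normal":
        # Defaults mirror the documented ones: mean=0.0, std=1.0.
        mean = distribution_params.get("mean", 0.0)
        std = distribution_params.get("std", 1.0)
        if std <= 0:
            # Assumption: a non-positive standard deviation is treated as an error.
            raise ValueError(f"std must be positive, got {std}")
        return {"mean": mean, "std": std}
    # "uniform" needs no additional parameters.
    return {}
```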
get_distribution_info ¶
get_distribution_info() -> dict
Get information about the current distribution configuration.
route_tokens ¶
route_tokens(
hidden_states: Tensor,
router_logits: Tensor,
top_k: int,
indices_type: Optional[dtype] = None,
) -> tuple[Tensor, Tensor]
Randomly select experts for each token using the specified distribution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hidden_states | Tensor | Input hidden states [num_tokens, hidden_size] | required |
router_logits | Tensor | Router logits [num_tokens, num_experts] | required |
top_k | int | Number of experts to select per token | required |
indices_type | Optional[dtype] | Data type for expert indices | None |
Returns:
Type | Description |
---|---|
tuple[Tensor, Tensor] | tuple of (topk_weights, topk_ids) |
RoutingSimulator ¶
Token-to-Expert Routing Simulator.
This class provides a framework for testing and comparing different routing strategies for MoE models. It can simulate routing behavior and collect statistics for analysis.
_routing_strategies class-attribute instance-attribute ¶
_routing_strategies: dict[str, RoutingStrategy] = {
"uniform_random": DistributionBasedRouting(
distribution="uniform", mean=0.0, std=1.0
),
"normal_routing": DistributionBasedRouting(
distribution="normal", mean=0.0, std=1.0
),
}
get_available_strategies classmethod ¶
Get list of available routing strategy names.
Returns:
Type | Description |
---|---|
| List of available strategy names |
register_strategy classmethod ¶
register_strategy(name: str, strategy: RoutingStrategy)
Register a custom routing strategy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the strategy | required |
strategy | RoutingStrategy | RoutingStrategy instance | required |
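The registration mechanism can be pictured as a class-level dictionary keyed by strategy name. The sketch below is a simplified stand-in, not the vLLM class: `RoutingSimulatorSketch` and `FirstKRouting` are hypothetical names, nested lists replace tensors, and raising `ValueError` for an unknown name is an assumption.

```python
class RoutingSimulatorSketch:
    """Hypothetical stand-in for a simulator with a name-to-strategy registry."""

    _routing_strategies = {}

    @classmethod
    def register_strategy(cls, name, strategy):
        # Later registrations under the same name replace earlier ones.
        cls._routing_strategies[name] = strategy

    @classmethod
    def get_available_strategies(cls):
        return list(cls._routing_strategies.keys())

    @staticmethod
    def simulate_routing(hidden_states, router_logits, strategy_name, top_k):
        registry = RoutingSimulatorSketch._routing_strategies
        if strategy_name not in registry:
            # Assumption: unknown names are reported as an error.
            raise ValueError(f"Unknown routing strategy: {strategy_name!r}")
        return registry[strategy_name].route_tokens(hidden_states, router_logits, top_k)


class FirstKRouting:
    """Toy strategy: always pick experts 0..top_k-1 with equal weights."""

    def route_tokens(self, hidden_states, router_logits, top_k):
        num_tokens = len(router_logits)
        weights = [[1.0 / top_k] * top_k for _ in range(num_tokens)]
        ids = [list(range(top_k)) for _ in range(num_tokens)]
        return weights, ids


RoutingSimulatorSketch.register_strategy("first_k", FirstKRouting())
```

Keeping the registry on the class means a strategy registered once is visible to every caller that looks it up by name.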
simulate_routing staticmethod ¶
simulate_routing(
hidden_states: Tensor,
router_logits: Tensor,
strategy_name: str,
top_k: int,
indices_type: Optional[dtype] = None,
) -> tuple[Tensor, Tensor]
Simulate token-to-expert routing using the specified strategy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hidden_states | Tensor | Input hidden states [num_tokens, hidden_size] | required |
router_logits | Tensor | Router logits [num_tokens, num_experts] | required |
strategy_name | str | Name of the routing strategy to use | required |
top_k | int | Number of experts to select per token | required |
indices_type | Optional[dtype] | Data type for expert indices | None |
Returns:
Type | Description |
---|---|
tuple[Tensor, Tensor] | tuple of (topk_weights, topk_ids) |
RoutingStrategy ¶
Bases: ABC
Base class for token-to-expert routing strategies.
route_tokens abstractmethod ¶
route_tokens(
hidden_states: Tensor,
router_logits: Tensor,
top_k: int,
indices_type: Optional[dtype] = None,
) -> tuple[Tensor, Tensor]
Route tokens to experts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hidden_states | Tensor | Input hidden states [num_tokens, hidden_size] | required |
router_logits | Tensor | Router logits [num_tokens, num_experts] | required |
top_k | int | Number of experts to select per token | required |
indices_type | Optional[dtype] | Data type for expert indices | None |
Returns:
Type | Description |
---|---|
tuple[Tensor, Tensor] | tuple of (topk_weights, topk_ids) |
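To show how a concrete strategy could satisfy this interface, here is a self-contained sketch that mirrors the abstract base class with the standard library and adds a deterministic top-k softmax strategy. `RoutingStrategySketch` and `TopKSoftmaxRouting` are hypothetical names, the `indices_type` argument is omitted, and nested lists stand in for tensors so the example runs without PyTorch.

```python
import math
from abc import ABC, abstractmethod


class RoutingStrategySketch(ABC):
    """Hypothetical mirror of the abstract routing interface."""

    @abstractmethod
    def route_tokens(self, hidden_states, router_logits, top_k):
        """Return (topk_weights, topk_ids), each with top_k entries per token."""


class TopKSoftmaxRouting(RoutingStrategySketch):
    """Pick the top_k highest-logit experts and softmax their logits."""

    def route_tokens(self, hidden_states, router_logits, top_k):
        all_weights, all_ids = [], []
        for logits in router_logits:
            # Expert indices ordered by descending logit, truncated to top_k.
            ids = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:top_k]
            # Subtract the max before exponentiating for numerical stability.
            peak = max(logits[e] for e in ids)
            exps = [math.exp(logits[e] - peak) for e in ids]
            total = sum(exps)
            all_weights.append([v / total for v in exps])
            all_ids.append(ids)
        return all_weights, all_ids
```

Because `route_tokens` is abstract, the base class cannot be instantiated directly; only subclasses that implement the method can be used as strategies.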