vllm.model_executor.model_loader.tensorizer_loader
BLACKLISTED_TENSORIZER_ARGS module-attribute
TensorizerLoader
Bases: BaseModelLoader
Model loader using CoreWeave's tensorizer library.
Source code in vllm/model_executor/model_loader/tensorizer_loader.py
__init__
__init__(load_config: LoadConfig)
_get_weights_iterator
_load_model_serialized_cpu
_load_model_serialized_cpu(
vllm_config: VllmConfig,
) -> Module
Load a serialized model with tensorizer to the CPU.
This is only necessary when the model isn't vLLM-tensorized (see examples/others/tensorize_vllm_model.py). It should still be faster than default HuggingFace loading, but slower than loading a vLLM-tensorized model.
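The relationship between the two loading paths can be sketched as a simple dispatch. This is a minimal illustration, not vLLM's implementation: `SketchConfig`, its `vllm_tensorized` flag, and `pick_load_path` are hypothetical stand-ins introduced here, and the returned strings only name the documented methods rather than calling them.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a tensorizer configuration.

    This is NOT vLLM's TensorizerConfig; the fields exist only for this sketch.
    """
    tensorizer_uri: str    # where the serialized tensors live (assumed field)
    vllm_tensorized: bool  # whether the artifact was serialized from a vLLM model


def pick_load_path(config: SketchConfig) -> str:
    """Return the name of the documented method that would handle this artifact."""
    if config.vllm_tensorized:
        # Fast path: load_weights expects a vLLM-tensorized model.
        return "load_weights"
    # Fallback: deserialize a non-vLLM-tensorized model to the CPU first.
    return "_load_model_serialized_cpu"


if __name__ == "__main__":
    # The URIs below are placeholders, not real artifacts.
    fast = SketchConfig(tensorizer_uri="s3://example-bucket/model.tensors",
                        vllm_tensorized=True)
    slow = SketchConfig(tensorizer_uri="s3://example-bucket/hf-model.tensors",
                        vllm_tensorized=False)
    print(pick_load_path(fast))
    print(pick_load_path(slow))
```

Under this reading, the CPU path is a fallback rather than the default, which matches the note that it is slower than loading a vLLM-tensorized model.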
_patch_tensorizer_config
_patch_tensorizer_config(
model_config: ModelConfig,
) -> TensorizerConfig
_verify_config
_verify_config(
model_config: ModelConfig,
parallel_config: ParallelConfig,
)
download_model
download_model(model_config: ModelConfig) -> None
load_model
load_model(
vllm_config: VllmConfig, model_config: ModelConfig
) -> Module
load_weights
load_weights(
model: Module, model_config: ModelConfig
) -> None
Load serialized model weights with tensorizer.
Expects a vLLM-tensorized model. See the examples/others/tensorize_vllm_model.py example script for serializing vLLM models.
save_model staticmethod
save_model(
model: Module,
tensorizer_config: Union[TensorizerConfig, dict],
model_config: ModelConfig,
) -> None