vLLM
vllm.transformers_utils.configs.speculators
vllm.model_executor.models.utils
vllm.model_executor.models.vision
vllm.model_executor.models.voxtral
vllm.model_executor.models.whisper
vllm.model_executor.models.zamba2
vllm.model_executor.warmup
vllm.model_executor.warmup
vllm.model_executor.warmup.deep_gemm_warmup
vllm.model_executor.warmup.kernel_warmup
vllm.multimodal
vllm.multimodal
vllm.multimodal.audio
vllm.multimodal.base
vllm.multimodal.cache
vllm.multimodal.hasher
vllm.multimodal.image
vllm.multimodal.inputs
vllm.multimodal.parse
vllm.multimodal.processing
vllm.multimodal.profiling
vllm.multimodal.registry
vllm.multimodal.utils
vllm.multimodal.video
vllm.platforms
vllm.platforms
vllm.platforms.cpu
vllm.platforms.cuda
vllm.platforms.interface
vllm.platforms.neuron
vllm.platforms.rocm
vllm.platforms.tpu
vllm.platforms.xpu
vllm.plugins
vllm.plugins
vllm.plugins.lora_resolvers
vllm.plugins.lora_resolvers
vllm.plugins.lora_resolvers.filesystem_resolver
vllm.profiler
vllm.profiler
vllm.profiler.layerwise_profile
vllm.profiler.utils
vllm.ray
vllm.ray
vllm.ray.lazy_utils
vllm.ray.ray_env
vllm.reasoning
vllm.reasoning
vllm.reasoning.abs_reasoning_parsers
vllm.reasoning.deepseek_r1_reasoning_parser
vllm.reasoning.glm4_moe_reasoning_parser
vllm.reasoning.gptoss_reasoning_parser
vllm.reasoning.granite_reasoning_parser
vllm.reasoning.hunyuan_a13b_reasoning_parser
vllm.reasoning.mistral_reasoning_parser
vllm.reasoning.qwen3_reasoning_parser
vllm.reasoning.step3_reasoning_parser
vllm.transformers_utils
vllm.transformers_utils
vllm.transformers_utils.config
vllm.transformers_utils.detokenizer
vllm.transformers_utils.detokenizer_utils
vllm.transformers_utils.dynamic_module
vllm.transformers_utils.processor
vllm.transformers_utils.s3_utils
vllm.transformers_utils.tokenizer
vllm.transformers_utils.tokenizer_base
vllm.transformers_utils.tokenizer_group
vllm.transformers_utils.utils
vllm.transformers_utils.chat_templates
vllm.transformers_utils.chat_templates
vllm.transformers_utils.chat_templates.registry
vllm.transformers_utils.configs
vllm.transformers_utils.configs
vllm.transformers_utils.configs.arctic
vllm.transformers_utils.configs.chatglm
vllm.transformers_utils.configs.deepseek_vl2
vllm.transformers_utils.configs.eagle
vllm.transformers_utils.configs.falcon
vllm.transformers_utils.configs.jais
vllm.transformers_utils.configs.kimi_vl
vllm.transformers_utils.configs.medusa
vllm.transformers_utils.configs.mistral
vllm.transformers_utils.configs.mlp_speculator
vllm.transformers_utils.configs.moonvit
vllm.transformers_utils.configs.nemotron
vllm.transformers_utils.configs.nemotron_h
vllm.transformers_utils.configs.nemotron_vl
vllm.transformers_utils.configs.ovis
vllm.transformers_utils.configs.step3_vl
vllm.transformers_utils.configs.ultravox
vllm.transformers_utils.configs.speculators
vllm.transformers_utils.configs.speculators
vllm.transformers_utils.configs.speculators.algos
vllm.transformers_utils.configs.speculators.base
vllm.transformers_utils.processors
vllm.transformers_utils.processors
vllm.transformers_utils.processors.deepseek_vl2
vllm.transformers_utils.processors.ovis
vllm.transformers_utils.processors.ovis2_5
vllm.transformers_utils.tokenizers
vllm.transformers_utils.tokenizers
vllm.transformers_utils.tokenizers.mistral
vllm.triton_utils
vllm.triton_utils
vllm.triton_utils.importing
vllm.usage
vllm.usage
vllm.usage.usage_lib
vllm.utils
vllm.utils
vllm.utils.deep_gemm
vllm.utils.flashinfer
vllm.utils.jsontree
vllm.utils.tensor_schema
vllm.v1
vllm.v1
vllm.v1.cudagraph_dispatcher
vllm.v1.kv_cache_interface
vllm.v1.outputs
vllm.v1.request
vllm.v1.serial_utils
vllm.v1.utils
vllm.v1.attention
vllm.v1.attention
vllm.v1.attention.backends
vllm.v1.attention.backends
vllm.v1.attention.backends.cpu_attn
vllm.v1.attention.backends.flash_attn
vllm.v1.attention.backends.flashinfer
vllm.v1.attention.backends.flex_attention
vllm.v1.attention.backends.linear_attn
vllm.v1.attention.backends.mamba1_attn
vllm.v1.attention.backends.mamba2_attn
vllm.v1.attention.backends.mamba_attn
vllm.v1.attention.backends.pallas
vllm.v1.attention.backends.rocm_aiter_fa
vllm.v1.attention.backends.short_conv_attn
vllm.v1.attention.backends.tree_attn
vllm.v1.attention.backends.triton_attn
vllm.v1.attention.backends.utils
vllm.v1.attention.backends.xformers
vllm.v1.attention.backends.mla
vllm.v1.attention.backends.mla
vllm.v1.attention.backends.mla.common
vllm.v1.attention.backends.mla.cutlass_mla
vllm.v1.attention.backends.mla.flashmla
vllm.v1.attention.backends.mla.rocm_aiter_mla
vllm.v1.attention.backends.mla.triton_mla
vllm.v1.core
vllm.v1.core
vllm.v1.core.block_pool
vllm.v1.core.encoder_cache_manager
vllm.v1.core.kv_cache_coordinator
vllm.v1.core.kv_cache_manager
vllm.v1.core.kv_cache_utils
vllm.v1.core.single_type_kv_cache_manager
vllm.v1.core.sched
vllm.v1.core.sched
vllm.v1.core.sched.async_scheduler
vllm.v1.core.sched.interface
vllm.v1.core.sched.output
vllm.v1.core.sched.request_queue
vllm.v1.core.sched.scheduler
vllm.v1.core.sched.utils
vllm.v1.engine
vllm.v1.engine
vllm.v1.engine.async_llm
vllm.v1.engine.coordinator
vllm.v1.engine.core
vllm.v1.engine.core_client
vllm.v1.engine.detokenizer
vllm.v1.engine.exceptions
vllm.v1.engine.llm_engine
vllm.v1.engine.logprobs
vllm.v1.engine.mm_input_cache
vllm.v1.engine.output_processor
vllm.v1.engine.parallel_sampling
vllm.v1.engine.processor
vllm.v1.engine.utils
vllm.v1.executor
vllm.v1.executor
vllm.v1.executor.abstract
vllm.v1.executor.multiproc_executor
vllm.v1.executor.ray_distributed_executor
vllm.v1.metrics
vllm.v1.metrics
vllm.v1.metrics.loggers
vllm.v1.metrics.prometheus
vllm.v1.metrics.ray_wrappers
vllm.v1.metrics.reader
vllm.v1.metrics.stats
vllm.v1.pool
vllm.v1.pool
vllm.v1.pool.metadata
vllm.v1.sample
vllm.v1.sample
vllm.v1.sample.metadata
vllm.v1.sample.rejection_sampler
vllm.v1.sample.sampler
vllm.v1.sample.logits_processor
vllm.v1.sample.logits_processor
vllm.v1.sample.logits_processor.builtin
vllm.v1.sample.logits_processor.interface
vllm.v1.sample.logits_processor.state
vllm.v1.sample.ops
vllm.v1.sample.ops
vllm.v1.sample.ops.bad_words
vllm.v1.sample.ops.logprobs
vllm.v1.sample.ops.penalties
vllm.v1.sample.ops.topk_topp_sampler
vllm.v1.sample.tpu
vllm.v1.sample.tpu
vllm.v1.sample.tpu.metadata
vllm.v1.sample.tpu.sampler
vllm.v1.spec_decode
vllm.v1.spec_decode
vllm.v1.spec_decode.eagle
vllm.v1.spec_decode.medusa
vllm.v1.spec_decode.metadata
vllm.v1.spec_decode.metrics
vllm.v1.spec_decode.ngram_proposer
vllm.v1.spec_decode.utils
vllm.v1.structured_output
vllm.v1.structured_output
vllm.v1.structured_output.backend_guidance
vllm.v1.structured_output.backend_lm_format_enforcer
vllm.v1.structured_output.backend_outlines
vllm.v1.structured_output.backend_types
vllm.v1.structured_output.backend_xgrammar
vllm.v1.structured_output.request
vllm.v1.structured_output.utils
vllm.v1.worker
vllm.v1.worker
vllm.v1.worker.block_table
vllm.v1.worker.cpu_model_runner
vllm.v1.worker.cpu_worker
vllm.v1.worker.gpu_input_batch
vllm.v1.worker.gpu_model_runner
vllm.v1.worker.gpu_worker
vllm.v1.worker.kv_connector_model_runner_mixin
vllm.v1.worker.lora_model_runner_mixin
vllm.v1.worker.tpu_input_batch
vllm.v1.worker.tpu_model_runner
vllm.v1.worker.tpu_worker
vllm.v1.worker.utils
vllm.v1.worker.worker_base
vllm.v1.worker.xpu_model_runner
vllm.v1.worker.xpu_worker
vllm.worker
vllm.worker
vllm.worker.cache_engine
vllm.worker.enc_dec_model_runner
vllm.worker.model_runner
vllm.worker.model_runner_base
vllm.worker.neuron_model_runner
vllm.worker.neuron_worker
vllm.worker.neuronx_distributed_model_runner
vllm.worker.pooling_model_runner
vllm.worker.utils
vllm.worker.worker
vllm.worker.worker_base
CLI Reference
CLI Reference
vllm serve
vllm chat
vllm complete
vllm run-batch
vllm bench
vllm bench
vllm bench latency
vllm bench serve
vllm bench throughput
Community
Community
Contact Us
Meetups
Sponsors
Blog
Forum
Slack
Table of contents
speculators
vllm.transformers_utils.configs.speculators
Modules:
  algos: vllm.transformers_utils.configs.speculators.algos
  base: vllm.transformers_utils.configs.speculators.base