KubeRay

KubeRay provides a Kubernetes-native way to run vLLM workloads on Ray clusters. You declare a Ray cluster in YAML, and the KubeRay operator handles pod scheduling, networking configuration, restarts, and blue-green deployments, all while preserving the familiar Kubernetes experience.

Why KubeRay instead of manual scripts?

| Feature | Manual scripts | KubeRay |
|---|---|---|
| Cluster bootstrap | Manually SSH into every node and run a script | One command to create or update the whole cluster: `kubectl apply -f cluster.yaml` |
| Autoscaling | Manual | Automatically patches the RayCluster custom resource to adjust cluster size |
| Upgrades | Tear down and re-create manually | Blue/green deployment updates supported |
| Declarative config | Bash flags and environment variables | GitOps-friendly YAML custom resources (RayCluster/RayService) |

Using KubeRay reduces the operational burden and simplifies integration of Ray + vLLM with existing Kubernetes workflows (CI/CD, secrets, storage classes, etc.).
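To make the declarative workflow more concrete, the following is a minimal sketch of what a `cluster.yaml` might contain. It is an illustration rather than a tested configuration: the resource name, the container image (`example.registry/ray-vllm:latest`), the group name, and all resource sizes are placeholders, and the `ray.io/v1` API version assumes a reasonably recent KubeRay release. Consult the KubeRay documentation for the fields supported by your installed version.

```yaml
# cluster.yaml: illustrative RayCluster manifest.
# All names, the image, and the sizes below are assumptions, not recommended values.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: vllm-raycluster            # hypothetical cluster name
spec:
  enableInTreeAutoscaling: true    # lets the Ray autoscaler resize worker groups
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            # Placeholder image; it must have both Ray and vLLM installed.
            image: example.registry/ray-vllm:latest
            resources:
              limits:
                cpu: "4"
                memory: 16Gi
  workerGroupSpecs:
    - groupName: gpu-workers       # hypothetical group name
      replicas: 1
      minReplicas: 1               # autoscaler lower bound
      maxReplicas: 4               # autoscaler upper bound
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: example.registry/ray-vllm:latest
              resources:
                limits:
                  cpu: "8"
                  memory: 32Gi
                  nvidia.com/gpu: "1"
```

With a manifest along these lines checked into version control, `kubectl apply -f cluster.yaml` creates the cluster on first use and reconciles it after later edits, which is what makes the GitOps-style workflow possible.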

Learn more

"Serve a Large Language Model using Ray Serve LLM on Kubernetes" - An end-to-end example of how to serve a model using vLLM, KubeRay, and Ray Serve.
KubeRay documentation