Comparing SkyPilot with other systems
SkyPilot is a framework for running AI and batch workloads on any infrastructure. While SkyPilot offers unique capabilities, certain functionalities like job scheduling overlap with existing systems (e.g., Kubernetes, Slurm). That said, SkyPilot can be used in conjunction with them to provide additional benefits.
This page provides a comparison of SkyPilot with other systems, focusing on the unique benefits provided by SkyPilot. We welcome feedback and contributions to this page.
SkyPilot vs Vanilla Kubernetes
Kubernetes is a powerful system for managing containerized applications. Using SkyPilot to access your Kubernetes cluster boosts developer productivity and allows you to scale your infra beyond a single Kubernetes cluster.
Faster developer velocity
SkyPilot provides faster iteration for interactive development. For example, a common workflow for AI engineers is to iteratively develop and train models by tweaking code and hyperparameters and observing the training runs.
With Kubernetes, a single iteration is a multi-step process involving building a Docker image, pushing it to a registry, updating the Kubernetes YAML and then deploying it.
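For instance, a typical loop might look like the following sketch (the image and file names are illustrative):

# Rebuild and push the image after each code change
docker build -t registry.example.com/train:v1 .
docker push registry.example.com/train:v1
# Update the image tag in the job spec, then redeploy
kubectl delete job train; kubectl apply -f train-job.yaml
# Tail logs to watch the run
kubectl logs -f job/train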
With SkyPilot, a single command (sky launch) takes care of everything. Behind the scenes, SkyPilot provisions pods, installs all required dependencies, executes the job, returns logs, and provides SSH and VSCode access for debugging.
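The equivalent loop with SkyPilot looks like the following sketch (task.yaml and the cluster name dev are placeholder names):

# Provision a cluster, install dependencies, and run the task
sky launch -c dev task.yaml
# After tweaking code or hyperparameters, re-run on the same cluster;
# setup is skipped, so iterations are fast
sky exec dev task.yaml
# Debug interactively
ssh dev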
Simpler YAMLs
Consider serving Gemma with vLLM on Kubernetes:
With vanilla Kubernetes, you need over 65 lines of Kubernetes YAML to launch a Gemma model served with vLLM.
With SkyPilot, an easy-to-understand 19-line YAML launches a pod serving Gemma with vLLM.
Here is a side-by-side comparison of the YAMLs for serving Gemma with vLLM on SkyPilot vs Kubernetes:
SkyPilot (19 lines)
envs:
  MODEL_NAME: google/gemma-2b-it
  HF_TOKEN: myhftoken

resources:
  image_id: docker:vllm/vllm-openai:latest
  accelerators: L4:1
  ports: 8000

setup: |
  conda deactivate
  python3 -c "import huggingface_hub; huggingface_hub.login('${HF_TOKEN}')"

run: |
  conda deactivate
  echo 'Starting vllm openai api server...'
  python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_NAME --tokenizer hf-internal-testing/llama-tokenizer \
    --host 0.0.0.0
Kubernetes (65 lines)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-gemma-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gemma-server
  template:
    metadata:
      labels:
        app: gemma-server
        ai.gke.io/model: gemma-1.1-2b-it
        ai.gke.io/inference-server: vllm
        examples.ai.gke.io/source: user-guide
    spec:
      containers:
      - name: inference-server
        image: us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20240527_0916_RC00
        resources:
          requests:
            cpu: "2"
            memory: "10Gi"
            ephemeral-storage: "10Gi"
            nvidia.com/gpu: 1
          limits:
            cpu: "2"
            memory: "10Gi"
            ephemeral-storage: "10Gi"
            nvidia.com/gpu: 1
        command: ["python3", "-m", "vllm.entrypoints.api_server"]
        args:
        - --model=$(MODEL_ID)
        - --tensor-parallel-size=1
        env:
        - name: MODEL_ID
          value: google/gemma-1.1-2b-it
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-secret
              key: hf_api_token
        volumeMounts:
        - mountPath: /dev/shm
          name: dshm
      volumes:
      - name: dshm
        emptyDir:
          medium: Memory
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
---
apiVersion: v1
kind: Service
metadata:
  name: llm-service
spec:
  selector:
    app: gemma-server
  type: ClusterIP
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 8000
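Launching the SkyPilot YAML above is itself a single command (a sketch; serve-gemma.yaml is an illustrative file name, and --env overrides the placeholder token in the YAML):

sky launch -c gemma serve-gemma.yaml --env HF_TOKEN=<your-token>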
Scale beyond a single region/cluster
A Kubernetes cluster is typically constrained to a single region in a single cloud. This is because etcd, the backing store for Kubernetes cluster state, can time out and fail when it experiences higher latencies across regions [1] [2] [3].
Being restricted to a single region/cloud with vanilla Kubernetes has two drawbacks:
1. GPU availability is reduced because you cannot utilize available capacity elsewhere.
2. Costs increase as you are unable to take advantage of cheaper resources in other regions.
SkyPilot is designed to scale across clouds and regions: it allows you to run your tasks on your Kubernetes cluster, and burst to more regions and clouds if needed. In doing so, SkyPilot ensures that your tasks are always running in the most cost-effective region, while maintaining high availability.
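For example, a task can list multiple acceptable locations and let SkyPilot fail over among them. A minimal sketch, assuming a SkyPilot version that supports any_of in resources:

resources:
  accelerators: L4:1
  # SkyPilot picks among these candidates based on cost and availability,
  # e.g., using your Kubernetes cluster when it has capacity and bursting
  # to the clouds when it does not.
  any_of:
    - cloud: kubernetes
    - cloud: gcp
    - cloud: aws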