Welcome to SkyPilot!
Run LLMs and AI on Any Cloud
SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution.
SkyPilot abstracts away cloud infra burdens:
Launch jobs & clusters on any cloud
Easy scale-out: queue and run many jobs, automatically managed
Easy access to object stores (S3, GCS, Azure, R2, IBM)
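As a sketch of this workflow (the cluster name, bucket, and file paths below are illustrative placeholders, not from this page), a task is declared in a short YAML file and launched with one command:

```yaml
# task.yaml — a minimal, hypothetical SkyPilot task (names/paths are illustrative).
resources:
  accelerators: A100:1   # request one A100 GPU on whichever cloud can provide it

file_mounts:
  /data: s3://my-bucket  # mount an object store bucket (bucket name is a placeholder)

setup: |
  pip install -r requirements.txt

run: |
  python train.py --data-dir /data
```

Launching is then `sky launch -c mycluster task.yaml`: SkyPilot provisions a VM, syncs the working directory, runs `setup` once, and executes `run` on the cluster.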
SkyPilot maximizes GPU availability for your jobs:
Provision in all zones/regions/clouds you have access to (the Sky), with automatic failover
SkyPilot cuts your cloud costs:
Managed Spot: 3-6x cost savings using spot VMs, with auto-recovery from preemptions
Optimizer: 2x cost savings by auto-picking the cheapest VM/zone/region/cloud
Autostop: hands-free cleanup of idle clusters
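The cost levers above can be sketched on the CLI as follows (cluster and file names are illustrative; `task.yaml` is assumed to exist):

```console
# Optimizer: auto-pick the cheapest cloud/region/zone that offers the requested GPU
sky launch -c mycluster --gpus A100 task.yaml

# Managed Spot: run on spot VMs with automatic recovery from preemptions
sky spot launch task.yaml

# Autostop: tear down the cluster after 10 idle minutes
sky autostop -i 10 mycluster
```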
SkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes.
Currently supported providers: AWS, GCP, Azure, OCI, Lambda Cloud, RunPod, Fluidstack, Cudo, IBM, Samsung, Cloudflare, VMware vSphere, and any Kubernetes cluster.
More Information
Tutorials: SkyPilot Tutorials
Runnable examples:
LLMs on SkyPilot
Mixtral 8x7B; Mistral 7B (from official Mistral team)
vLLM: Serving LLM 24x Faster On the Cloud (from official vLLM team)
SGLang: Fast and Expressive LLM Serving On the Cloud (from official SGLang team)
Vicuna chatbots: Training & Serving (from official Vicuna team)
Add yours here & see more in llm/!
Framework examples: PyTorch DDP, DeepSpeed, JAX/Flax on TPU, Stable Diffusion, Detectron2, Distributed TensorFlow, NeMo, programmatic grid search, Docker, Cog, Unsloth, Ollama, llm.c and many more.
Read the research:
SkyPilot paper and talk (NSDI 2023)
Sky Computing vision paper (HotOS 2021)