
Serving

  • vLLM
  • SGLang
  • Nvidia Dynamo
  • Ollama
  • Hugging Face TGI
  • LoRAX
  • Cog

© Copyright 2025, SkyPilot Team.