Deploy SkyPilot on existing machines#

This guide will help you deploy SkyPilot on your existing machines — whether they are on-premises or reserved instances on a cloud provider.

Given a list of IP addresses and SSH credentials, SkyPilot will install necessary dependencies on the remote machines and configure itself to run jobs and services on the cluster.


Note

Behind the scenes, SkyPilot deploys a lightweight Kubernetes cluster on the remote machines using k3s.

No Kubernetes knowledge is required to follow this guide. SkyPilot abstracts away the complexity of Kubernetes and provides a simple interface to run your jobs and services.

Prerequisites#

Local machine (typically your laptop):

  • SkyPilot installed

  • kubectl installed

Remote machines (your cluster, optionally with GPUs):

  • Debian-based OS (tested on Debian 11)

  • SSH access from local machine to all remote machines with key-based authentication

  • It’s recommended to use passwordless sudo on all remote machines. If passwordless sudo is not available, all machines must use the same password for the SSH username to run sudo. (A quick way to verify SSH and sudo access is shown after this list.)

  • All machines must use the same SSH key and username

  • All machines must have network access to each other

  • Port 6443 must be accessible on at least one node from your local machine
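
Before deploying, you can sanity-check SSH and sudo access from your local machine. The snippet below is a minimal sketch, assuming your machine IPs are already written to ips.txt (created in the next section) and using placeholder values for the username and key:

SSH_USER=username          # your SSH username
SSH_KEY=path/to/ssh/key    # your private SSH key
while read -r ip; do
  echo "Checking $ip..."
  # BatchMode=yes makes ssh fail instead of prompting for a password,
  # so a success here confirms key-based authentication works.
  # "sudo -n true" succeeds only if passwordless sudo is configured.
  ssh -i "$SSH_KEY" -o BatchMode=yes "$SSH_USER@$ip" "sudo -n true" \
    && echo "  OK: key-based SSH and passwordless sudo" \
    || echo "  FAILED: check SSH keys or sudo configuration on $ip"
done < ips.txt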

Deploying SkyPilot#

  1. Create a file ips.txt containing the IP addresses of your machines, one IP per line. The first node will be used as the head node — this node must have port 6443 accessible from your local machine.

    Here is an example ips.txt file:

    192.168.1.1
    192.168.1.2
    192.168.1.3
    

    In this example, the first node (192.168.1.1) has port 6443 open and will be used as the head node.
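
    To confirm this before deploying, you can probe the port from your local machine. This is an optional check; it assumes nc (netcat) is installed locally and uses the example head node IP from above:

    # Succeeds only if port 6443 on the head node is reachable;
    # times out after 5 seconds otherwise.
    nc -zv -w 5 192.168.1.1 6443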

  2. Run sky local up and pass the ips.txt file, SSH username, and SSH key as arguments:

    IP_FILE=ips.txt
    SSH_USER=username
    SSH_KEY=path/to/ssh/key
    CONTEXT_NAME=mycluster  # Optional, sets the context name in the kubeconfig. Defaults to "default".
    sky local up --ips $IP_FILE --ssh-user $SSH_USER --ssh-key-path $SSH_KEY --context-name $CONTEXT_NAME
    

    Tip

    If your cluster does not have passwordless sudo, specify the sudo password with the --password option:

    PASSWORD=password
    sky local up --ips $IP_FILE --ssh-user $SSH_USER --ssh-key-path $SSH_KEY --context-name $CONTEXT_NAME --password $PASSWORD
    

    SkyPilot will deploy a Kubernetes cluster on the remote machines, set up GPU support, configure Kubernetes credentials on your local machine, and set up SkyPilot to operate with the new cluster.

    Example output of sky local up:

    $ sky local up --ips ips.txt --ssh-user gcpuser --ssh-key-path ~/.ssh/id_rsa --context-name mycluster
    To view detailed progress: tail -n100 -f ~/sky_logs/sky-2024-09-23-18-53-14-165534/local_up.log
    ✔ K3s successfully deployed on head node.
    ✔ K3s successfully deployed on worker node.
    ✔ kubectl configured for the remote cluster.
    ✔ Remote k3s is running.
    ✔ Nvidia GPU Operator installed successfully.
    Cluster deployment done. You can now run tasks on this cluster.
    E.g., run a task with: sky launch --cloud kubernetes -- echo hello world.
    🎉 Remote cluster deployed successfully.
    
  3. To verify that the cluster is running, run:

    sky check kubernetes
    

    You can now use SkyPilot to launch your development clusters and training jobs on your own infrastructure.

    $ sky show-gpus --cloud k8s
    Kubernetes GPUs
    GPU   REQUESTABLE_QTY_PER_NODE  TOTAL_GPUS  TOTAL_FREE_GPUS
    L4    1, 2, 4                   12          12
    H100  1, 2, 4, 8                16          16
    
    Kubernetes per node GPU availability
    NODE_NAME                  GPU_NAME  TOTAL_GPUS  FREE_GPUS
    my-cluster-0               L4        4           4
    my-cluster-1               L4        4           4
    my-cluster-2               L4        2           2
    my-cluster-3               L4        2           2
    my-cluster-4               H100      8           8
    my-cluster-5               H100      8           8
    
    $ sky launch --cloud k8s --gpus H100:1 -- nvidia-smi
    

    Tip

    To enable shared access to a Kubernetes cluster, you can deploy a SkyPilot API server.

What happens behind the scenes?#

When you run sky local up, SkyPilot runs the following operations:

  1. Install and run the k3s Kubernetes distribution as a systemd service on the remote machines.

  2. [If GPUs are present] Install the Nvidia GPU Operator on the newly provisioned k3s cluster. Note that this step does not modify your local NVIDIA driver/CUDA installation; it runs only inside the cluster.

  3. Expose the Kubernetes API server on the head node over port 6443. API calls on this port are secured with a key pair generated by the cluster.

  4. Configure kubectl on your local machine to connect to the remote cluster.
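
Once sky local up finishes, you can observe each of these pieces directly. A brief sketch, assuming the default context name "default" and reusing the example head node IP and SSH variables from earlier:

# List the nodes of the new cluster via the kubeconfig entry
# that sky local up wrote on your local machine.
kubectl --context default get nodes -o wide

# On the head node, k3s runs as a systemd service.
ssh -i "$SSH_KEY" "$SSH_USER@192.168.1.1" "sudo systemctl status k3s --no-pager"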

Cleanup#

To clean up all state created by SkyPilot on your machines, use the --cleanup flag:

IP_FILE=ips.txt
SSH_USER=username
SSH_KEY=path/to/ssh/key
sky local up --ips $IP_FILE --ssh-user $SSH_USER --ssh-key-path $SSH_KEY --cleanup

Tip

If your cluster does not have passwordless sudo, specify the sudo password with the --password option:

PASSWORD=password
sky local up --ips $IP_FILE --ssh-user $SSH_USER --ssh-key-path $SSH_KEY --password $PASSWORD --cleanup

This will stop all Kubernetes services on the remote machines.
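
To confirm the cleanup took effect, you can check that the k3s service is no longer running on a remote machine. A minimal sketch, reusing the example variables from above:

# "systemctl is-active" returns non-zero when the unit is stopped or absent.
ssh -i "$SSH_KEY" "$SSH_USER@192.168.1.1" "systemctl is-active k3s || echo 'k3s is not running'"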

Setting up multiple clusters#

You can set up multiple Kubernetes clusters with SkyPilot by passing a different --context-name value for each cluster:

# Set up first cluster and save the kubeconfig
sky local up --ips cluster1-ips.txt --ssh-user user1 --ssh-key-path key1.pem --context-name cluster1
# Set up second cluster
sky local up --ips cluster2-ips.txt --ssh-user user2 --ssh-key-path key2.pem --context-name cluster2

You can then configure SkyPilot to use multiple Kubernetes clusters by adding them to allowed_contexts in your ~/.sky/config.yaml file:

# ~/.sky/config.yaml
allowed_contexts:
  - cluster1
  - cluster2

Each context is then available as a region of the Kubernetes cloud:

# Run on cluster1
sky launch --cloud k8s --region cluster1 -- echo "Running on cluster 1"

# Run on cluster2
sky launch --cloud k8s --region cluster2 -- echo "Running on cluster 2"

# Let SkyPilot automatically select the cluster with available resources
sky launch --cloud k8s -- echo "Running on SkyPilot selected cluster"

You can view the available clusters and GPUs using:

# List GPUs on cluster1
sky show-gpus --cloud k8s --region cluster1

# List GPUs on cluster2
sky show-gpus --cloud k8s --region cluster2
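
To see which contexts exist in your local kubeconfig (and therefore which names are valid in allowed_contexts), you can ask kubectl directly:

# Lists all contexts known to your kubeconfig; the clusters created by
# sky local up appear under the context names you chose.
kubectl config get-contexts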