Deploying SkyPilot API Server#

The SkyPilot API server is packaged as a Helm chart, which deploys both the API server and a Kubernetes ingress controller.

Tip

This guide is for admins to deploy the API server. If you are a user looking to connect to the API server, refer to Connecting to an API server.

Prerequisites#

  • A Kubernetes cluster with ports 30050 and 30051 available for NodePort services

  • Helm

  • kubectl

Tip

If you do not have a Kubernetes cluster, refer to Kubernetes Deployment Guides to set one up.

You can also deploy the API server on cloud VMs using an existing SkyPilot installation. See Alternative: Deploy on cloud VMs.

Step 1: Create a namespace and add Helm repository#

The API server will be deployed in a namespace of your choice. You can either create the namespace manually:

$ NAMESPACE=skypilot
$ kubectl create namespace $NAMESPACE

Or let Helm create it automatically by adding the --create-namespace flag to the helm install command in Step 3.

Next, add the SkyPilot Helm repository:

$ helm repo add skypilot https://helm.skypilot.co
$ helm repo update

Step 2: Configure cloud accounts#

The following sections describe how to configure credentials for different clouds on the API server. All cloud credentials are stored as Kubernetes secrets.

Kubernetes: by default, SkyPilot will automatically use the same Kubernetes cluster that the API server is deployed in:

  • To disable this behavior, set kubernetesCredentials.useApiServerCluster=false in the Helm chart values.

  • When running in the same cluster, tasks are launched in the same namespace as the API server. To use a different namespace for tasks, set kubernetesCredentials.inclusterNamespace=<namespace> when deploying the API server.
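
For example, a sketch of enabling in-cluster credentials while launching tasks in a dedicated namespace (the namespace name is illustrative):

$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel \
$   --set kubernetesCredentials.useApiServerCluster=true \
$   --set kubernetesCredentials.inclusterNamespace=skypilot-tasks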

To use a kubeconfig file to authenticate to other clusters, first create a Kubernetes secret with the kubeconfig file:

$ NAMESPACE=skypilot
$ kubectl create secret generic kube-credentials \
$   -n $NAMESPACE \
$   --from-file=config=~/.kube/config

Once the secret is created, set kubernetesCredentials.useKubeconfig=true and kubernetesCredentials.kubeconfigSecretName in the Helm chart values to use the kubeconfig file for authentication:

$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel \
$   --set kubernetesCredentials.useKubeconfig=true \
$   --set kubernetesCredentials.kubeconfigSecretName=kube-credentials \
$   --set kubernetesCredentials.useApiServerCluster=true

Tip

If you are using a kubeconfig file that contains exec-based authentication (e.g., GKE’s default gke-gcloud-auth-plugin based authentication), you will need to strip the path information from the command field in the exec configuration. You can use the exec_kubeconfig_converter.py script to do this.

$ python -m sky.utils.kubernetes.exec_kubeconfig_converter --input ~/.kube/config --output ~/.kube/config.converted

Then create the Kubernetes secret with the converted kubeconfig file ~/.kube/config.converted.

Tip

To use multiple Kubernetes clusters from the config file, you will need to add the context names to allowed_contexts in the SkyPilot config file. See Setting the SkyPilot Config on how to set the config file.
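
For example, the config file might include (the context names are illustrative):

kubernetes:
  allowed_contexts:
    - my-context
    - my-other-context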

You can also set both useKubeconfig and useApiServerCluster at the same time to configure the API server to use an external Kubernetes cluster in addition to the API server’s own cluster.

AWS: make sure you have your AWS access key ID and secret access key.

Create a Kubernetes secret with your AWS credentials:

$ NAMESPACE=skypilot
$ kubectl create secret generic aws-credentials \
$   -n $NAMESPACE \
$   --from-literal=aws_access_key_id=YOUR_ACCESS_KEY_ID \
$   --from-literal=aws_secret_access_key=YOUR_SECRET_ACCESS_KEY

Replace YOUR_ACCESS_KEY_ID and YOUR_SECRET_ACCESS_KEY with your actual AWS credentials.

When installing or upgrading the Helm chart, enable AWS credentials by setting awsCredentials.enabled=true.

$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel --set awsCredentials.enabled=true

GCP: we use service accounts to authenticate with GCP. Refer to the GCP service account guide on how to set up a service account.

Once you have the JSON key for your service account, create a Kubernetes secret to store it:

$ NAMESPACE=skypilot
$ kubectl create secret generic gcp-credentials \
$   -n $NAMESPACE \
$   --from-file=gcp-cred.json=YOUR_SERVICE_ACCOUNT_JSON_KEY.json

When installing or upgrading the Helm chart, enable GCP credentials by setting gcpCredentials.enabled=true and gcpCredentials.projectId to your project ID:

$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel \
$   --set gcpCredentials.enabled=true \
$   --set gcpCredentials.projectId=YOUR_PROJECT_ID

Replace YOUR_PROJECT_ID with your actual GCP project ID.

Other clouds: you can manually configure credentials for other clouds by running kubectl exec into the API server pod after it is deployed and then running the relevant installation commands.

Note that manually configured credentials will not be persisted across API server restarts.
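
For example, a rough sketch of this manual approach (the pod name will vary; list the pods first to find it):

$ kubectl get pods -n $NAMESPACE      # find the API server pod name
$ kubectl exec -it <api-server-pod> -n $NAMESPACE -- bash

Inside the pod, run the relevant cloud's CLI installation and credential setup commands.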

Support for configuring other clouds through secrets is coming soon!

Step 3: Deploy the API server Helm chart#

Install the SkyPilot Helm chart with the following command:

$ NAMESPACE=skypilot
$ WEB_USERNAME=skypilot
$ WEB_PASSWORD=yourpassword
$ AUTH_STRING=$(htpasswd -nb $WEB_USERNAME $WEB_PASSWORD)
$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel \
$   --namespace $NAMESPACE \
$   --create-namespace \
$   --set ingress.authCredentials=$AUTH_STRING

The --namespace flag specifies which namespace to deploy the API server in, and --create-namespace will create the namespace if it doesn’t exist.

To install a specific version, pass the --version flag to the helm upgrade command (e.g., --version 0.1.0).

If you configured any cloud credentials in the previous step, make sure to enable them by adding the relevant flags (e.g., --set awsCredentials.enabled=true) to the command.
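
For example, to deploy with both AWS and GCP credentials enabled (the credential flags are from Step 2):

$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel \
$   --namespace $NAMESPACE \
$   --create-namespace \
$   --set ingress.authCredentials=$AUTH_STRING \
$   --set awsCredentials.enabled=true \
$   --set gcpCredentials.enabled=true \
$   --set gcpCredentials.projectId=YOUR_PROJECT_ID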

Tip

You can configure the password for the API server with the WEB_PASSWORD variable.

Tip

If you already have a Kubernetes secret containing basic auth credentials, you can use it directly by setting ingress.authSecret instead of ingress.authCredentials:

$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel \
$   --namespace $NAMESPACE \
$   --create-namespace \
$   --set ingress.authSecret=my-existing-auth-secret

The secret must be in the same namespace as the API server and must contain a key named auth with the basic auth credentials in htpasswd format.
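
For reference, one way to create such a secret with htpasswd (the secret name is illustrative):

$ htpasswd -bc auth $WEB_USERNAME $WEB_PASSWORD    # writes the credentials to a file named "auth"
$ kubectl create secret generic my-existing-auth-secret -n $NAMESPACE --from-file=auth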

Step 4: Get the API server URL#

Once the API server is deployed, we can fetch the API server URL. We use an nginx ingress to expose the API server.

Our default of a NodePort service is the recommended way to expose the API server, because some cloud load balancers (e.g., GKE's) do not support the WebSocket connections required for SkyPilot's Kubernetes SSH tunneling.

  1. Make sure ports 30050 and 30051 are open on your nodes.

  2. Fetch the ingress controller URL with:

$ RELEASE_NAME=skypilot  # This should match the name used in helm install/upgrade
$ NODE_PORT=$(kubectl get svc ${RELEASE_NAME}-ingress-controller-np -n $NAMESPACE -o jsonpath='{.spec.ports[?(@.name=="http")].nodePort}')
$ NODE_IP=$(kubectl get nodes -o jsonpath='{ $.items[0].status.addresses[?(@.type=="ExternalIP")].address }')
$ ENDPOINT=http://${WEB_USERNAME}:${WEB_PASSWORD}@${NODE_IP}:${NODE_PORT}
$ echo $ENDPOINT
http://skypilot:yourpassword@1.2.3.4:30050

Tip

You can customize the node ports by passing --set ingress.httpNodePort=<port> --set ingress.httpsNodePort=<port> to the helm upgrade command.

If set to null, Kubernetes will assign random ports in the NodePort range (default 30000-32767). Make sure to open these ports on your nodes.
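
For example (the port values are illustrative and must fall within the NodePort range):

$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel \
$   --set ingress.httpNodePort=30080 \
$   --set ingress.httpsNodePort=30443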

Tip

To avoid frequent IP address changes on nodes by your cloud provider, you can attach a static IP address to your nodes (instructions for GKE) and use it as the NODE_IP in the command above.

Alternatively, you can expose the API server through a LoadBalancer service instead of the default NodePort.

Warning

The LoadBalancer service type may not support SSH access to SkyPilot clusters. Only use this option if you do not need SSH access.

  1. Deploy the API server with LoadBalancer configuration:

$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel \
$   --set ingress.httpNodePort=null \
$   --set ingress.httpsNodePort=null \
$   --set ingress-nginx.controller.service.type=LoadBalancer

  2. Fetch the ingress controller URL:

$ RELEASE_NAME=skypilot  # This should match the name used in helm install/upgrade
$ ENDPOINT=$(kubectl get svc ${RELEASE_NAME}-ingress-nginx-controller -n $NAMESPACE -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}')
$ echo $ENDPOINT
http://1.1.1.1

Step 5: Test the API server#

Test the API server by curling the health endpoint:

$ curl ${ENDPOINT}/api/health
{"status":"healthy","api_version":"1","commit":"ba7542c6dcd08484d83145d3e63ec9966d5909f3-dirty","version":"1.0.0-dev0"}

If all looks good, you can now start using the API server. Refer to Connecting to an API server to connect your local SkyPilot client to the API server.

Updating the API server#

To update the API server, update your repositories with helm repo update and run the same helm upgrade command as in the installation step.
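
For example, re-using the flags from Step 3:

$ helm repo update
$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel \
$   --namespace $NAMESPACE \
$   --set ingress.authCredentials=$AUTH_STRING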

Uninstall#

To uninstall the API server, run:

$ helm uninstall skypilot -n skypilot

This will delete the API server and all associated resources.

Other Notes#

Fault Tolerance and State Persistence#

The SkyPilot API server is designed to be fault tolerant. If the API server pod is terminated, Kubernetes will automatically create a new pod to replace it.

To retain state across pod terminations, the API server uses a PersistentVolumeClaim, which is backed by a PersistentVolume created by the Helm chart.

You can customize the storage settings by creating a values.yaml file with the following values:

storage:
  # Enable/disable persistent storage
  enabled: true
  # Storage class name - leave empty to use cluster default
  storageClassName: ""
  # Access modes - ReadWriteOnce or ReadWriteMany depending on storage class support
  accessMode: ReadWriteOnce
  # Storage size
  size: 10Gi
  # Optional selector for matching specific PVs
  selector: {}
    # matchLabels:
    #   environment: prod
  # Optional volume name for binding to specific PV
  volumeName: ""
  # Optional annotations
  annotations: {}

For example, to use a specific storage class and increase the storage size:

# values.yaml
storage:
  enabled: true
  storageClassName: "standard"
  size: 20Gi

Apply the configuration using:

$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel -f values.yaml

Additional setup for EKS#

To support persistent storage for the API server’s state, we need a storage class that supports persistent volumes. If you already have a storage class that supports persistent volumes, you can skip the following steps.

We will use the Amazon EBS CSI driver to create a storage class that supports persistent volumes backed by Amazon EBS. You can also use other storage classes that support persistent volumes, such as EFS.

The steps below are based on the official documentation. Please follow the official documentation to adapt the steps to your cluster.

  1. Make sure OIDC is enabled for your cluster. Follow the steps here.

    1. You will need to create and bind an IAM role which has permissions to create EBS volumes. See instructions here.

  2. Install the Amazon EBS CSI driver. The recommended method is through creating an EKS add-on.

Once the EBS CSI driver is installed, the default gp2 storage class will be backed by EBS volumes.
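
As a rough sketch, the steps above with eksctl might look like the following (the cluster name, role name, and account ID are placeholders; follow the official AWS documentation for the authoritative commands):

$ eksctl utils associate-iam-oidc-provider --cluster my-eks-cluster --approve
$ eksctl create iamserviceaccount \
$   --name ebs-csi-controller-sa \
$   --namespace kube-system \
$   --cluster my-eks-cluster \
$   --role-name AmazonEKS_EBS_CSI_DriverRole \
$   --role-only \
$   --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
$   --approve
$ eksctl create addon --name aws-ebs-csi-driver --cluster my-eks-cluster \
$   --service-account-role-arn arn:aws:iam::<account-id>:role/AmazonEKS_EBS_CSI_DriverRole --force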

Setting the SkyPilot Config#

The Helm chart supports setting the global SkyPilot config YAML file on the API server. The config file is mounted as ~/.sky/config.yaml in the API server container.

To set the config file, pass --set-file apiService.config=path/to/your/config.yaml to the helm command:

# Create the config.yaml file
$ cat <<EOF > config.yaml
$ admin_policy: admin_policy_examples.AddLabelsPolicy
$
$ jobs:
$   controller:
$     resources:
$         cpus: 2+
$
$ allowed_clouds:
$   - aws
$   - kubernetes
$
$ kubernetes:
$   allowed_contexts:
$     - my-context
$     - my-other-context
$ EOF

# Install the API server with the config file
$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel \
$   --set-file apiService.config=config.yaml

You can also directly set config values in the values.yaml file.
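
For example, a values.yaml sketch equivalent to passing the file with --set-file:

# values.yaml
apiService:
  config: |
    allowed_clouds:
      - aws
      - kubernetes

$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel -f values.yaml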

Setting an Admin Policy#

The Helm chart supports installing an admin policy before the API server starts.

To do so, set apiService.preDeployHook to the commands you want to run. For example, to install an admin policy, create a values.yaml file with the following:

# values.yaml
apiService:
  preDeployHook: |
   echo "Installing admin policy"
   pip install git+https://github.com/michaelvll/admin-policy-examples

  config: |
    admin_policy: admin_policy_examples.AddLabelsPolicy

Then apply the values.yaml file using the -f flag when running the helm upgrade command:

$ helm upgrade --install skypilot skypilot/skypilot-nightly --devel -f values.yaml

Alternative: Deploy on cloud VMs#

You can also deploy the API server directly on cloud VMs using an existing SkyPilot installation.

Step 1: Use SkyPilot to deploy the API server on a cloud VM#

Write the SkyPilot API server YAML file and use sky launch to deploy the API server:

# Write the YAML to a file
$ cat <<EOF > skypilot-api-server.yaml
$ resources:
$     cpus: 8+
$     memory: 16+
$     ports: 46580
$     image_id: docker:berkeleyskypilot/skypilot-nightly:latest

$ run: |
$   sky api start --deploy
$ EOF

# Deploy the API server
$ sky launch -c api-server skypilot-api-server.yaml

Step 2: Get the API server URL#

Once the API server is deployed, you can fetch the API server URL with:

$ sky status --endpoint 46580 api-server
http://a.b.c.d:46580

Test the API server by curling the health endpoint:

$ curl ${ENDPOINT}/health
SkyPilot API Server: Healthy

If all looks good, you can now start using the API server. Refer to Connecting to an API server to connect your local SkyPilot client to the API server.

Note

API server deployment using the above YAML does not have any authentication by default. We recommend adding an authentication layer (e.g., an nginx reverse proxy) or using the SkyPilot Helm chart on a Kubernetes cluster for a more secure deployment.

Tip

If you are installing the SkyPilot API client in the same environment, we recommend using a separate Python environment (venv, conda, etc.) to avoid conflicts with the SkyPilot installation used to deploy the API server.
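
For example, a minimal sketch using venv (the client is installed from the nightly package to match the server image above):

$ python -m venv ~/sky-client-env
$ source ~/sky-client-env/bin/activate
$ pip install -U skypilot-nightly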