Kubernetes

When running outside your Kubernetes cluster, SkyPilot uses your local ~/.kube/config file for authentication and creating resources on your Kubernetes cluster.

When running inside your Kubernetes cluster (e.g., as a remote API server, Job controller, or Serve controller), SkyPilot can authenticate using any of the following three methods:

  1. Automatically create a service account: SkyPilot can automatically create the service account and roles for itself to manage resources in the Kubernetes cluster. This is the default method when running inside the cluster, and no additional configuration is required.

    For details on the permissions that are granted to the service account, refer to the Minimum Permissions Required for SkyPilot section below.

  2. Using a custom service account: If you have a custom service account with the necessary permissions, you can configure SkyPilot to use it by adding this to your ~/.sky/config.yaml file:

    kubernetes:
      remote_identity: your-service-account-name
    
  3. Using your local kubeconfig file: In this case, SkyPilot will copy your local ~/.kube/config file to the controller pod and use it for authentication. To use this method, add remote_identity: LOCAL_CREDENTIALS to the kubernetes section of your ~/.sky/config.yaml file:

    kubernetes:
      remote_identity: LOCAL_CREDENTIALS
    

    Note

    If your ~/.kube/config file uses exec-based authentication (e.g., GKE uses exec auth by default), SkyPilot may not be able to authenticate using this method. In this case, consider using one of the service account methods above.
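To check whether a kubeconfig relies on exec-based authentication before choosing LOCAL_CREDENTIALS, a rough heuristic is to look for an exec: stanza in the file. The helper below is a minimal sketch: the function name kubeconfig_uses_exec is hypothetical, and it inspects every user entry in the file, not only the one used by the current context.

```shell
# Heuristic check: succeeds (exit code 0) if the kubeconfig contains an
# `exec:` stanza, which indicates exec-based authentication for some user.
# kubeconfig_uses_exec is a hypothetical helper name, not part of SkyPilot.
kubeconfig_uses_exec() {
  # $1: path to the kubeconfig (defaults to ~/.kube/config)
  config="${1:-$HOME/.kube/config}"
  grep -Eq '^[[:space:]]*exec:[[:space:]]*$' "$config"
}
```

For example, `kubeconfig_uses_exec && echo "exec-based auth detected"` prints a message only when such a stanza is present in ~/.kube/config.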

Note

Service account-based authentication applies only when the remote SkyPilot cluster (including the spot and serve controllers) is launched inside the Kubernetes cluster. When running outside the cluster (e.g., on AWS), SkyPilot will use the local ~/.kube/config file for authentication.

Below are the permissions required by SkyPilot and an example service account YAML that you can use to create a service account with the necessary permissions.

Minimum permissions required for SkyPilot

SkyPilot requires permissions equivalent to the following roles to be able to manage the resources in the Kubernetes cluster:

# Namespaced role for the service account
# Required for creating pods, services and other necessary resources in the namespace.
# Note these permissions only apply in the namespace where SkyPilot is deployed, and the namespace can be changed below.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sky-sa-role  # Can be changed if needed
  namespace: default  # Change to your namespace if using a different one.
rules:
  # Required for managing pods and their lifecycle
  - apiGroups: [ "" ]
    resources: [ "pods", "pods/status", "pods/exec", "pods/portforward" ]
    verbs: [ "*" ]
  # Required for managing services for SkyPilot Pods
  - apiGroups: [ "" ]
    resources: [ "services" ]
    verbs: [ "*" ]
  # Required for managing SSH keys
  - apiGroups: [ "" ]
    resources: [ "secrets" ]
    verbs: [ "*" ]
  # Required for retrieving reason when Pod scheduling fails.
  - apiGroups: [ "" ]
    resources: [ "events" ]
    verbs: [ "get", "list", "watch" ]
---
# ClusterRole for accessing cluster-wide resources. Details for each resource below:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sky-sa-cluster-role  # Can be changed if needed
  namespace: default  # Change to your namespace if using a different one.
  labels:
    parent: skypilot
rules:
  # Required for getting node resources.
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  # Required for autodetecting the runtime class of the nodes.
  - apiGroups: ["node.k8s.io"]
    resources: ["runtimeclasses"]
    verbs: ["get", "list", "watch"]
  # Required for accessing storage classes.
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]

Tip

If you are using a different namespace than default, make sure to change the namespace in the above manifests.

These roles must apply to both the user account configured in the kubeconfig file and the service account used by SkyPilot (if configured).
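One way to confirm that a service account actually holds the permissions listed above is kubectl auth can-i with impersonation. The sketch below assumes your own account is allowed to impersonate service accounts (needed for --as); the function names sa_subject and check_sky_permissions are hypothetical, and only a representative subset of verbs is checked.

```shell
# Build the impersonation subject string for a service account.
sa_subject() {
  # $1: namespace, $2: service account name
  printf 'system:serviceaccount:%s:%s' "$1" "$2"
}

# Print yes/no for a sample of the permissions in the Role and ClusterRole above.
check_sky_permissions() {
  ns="$1"
  subject="$(sa_subject "$ns" "$2")"
  # Namespaced permissions from the Role.
  for resource in pods services secrets; do
    printf 'create %s: ' "$resource"
    kubectl auth can-i create "$resource" --namespace "$ns" --as "$subject"
  done
  printf 'list events: '
  kubectl auth can-i list events --namespace "$ns" --as "$subject"
  # Cluster-wide permissions from the ClusterRole.
  for resource in nodes runtimeclasses.node.k8s.io storageclasses.storage.k8s.io; do
    printf 'list %s: ' "$resource"
    kubectl auth can-i list "$resource" --as "$subject"
  done
}
```

Running `check_sky_permissions default sky-sa` prints one yes or no per check. To check the user account configured in your kubeconfig instead, run the same kubectl auth can-i commands without --as.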

If you need to view real-time GPU availability with sky show-gpus, or if your tasks use object store mounting or require access to ingress resources, you will need to grant additional permissions as described below.

Permissions for sky show-gpus

sky show-gpus needs to list all pods across all namespaces to calculate GPU availability. To do this, SkyPilot needs the get and list permissions for pods in a ClusterRole:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sky-sa-cluster-role-pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]

Tip

If this role is not granted to the service account, sky show-gpus will still work but it will only show the total GPUs on the nodes, not the number of free GPUs.
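The ClusterRole above only takes effect once it is bound to the service account. The sketch below shows one way to create that binding with kubectl create clusterrolebinding. The binding name sky-sa-pod-reader-binding and the helper names run and bind_pod_reader are hypothetical. By default the helper only prints the command so that it can be reviewed first; set DRY_RUN=0 to execute it.

```shell
# Echo the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Bind the pod-reader ClusterRole to a service account.
bind_pod_reader() {
  # $1: namespace of the service account, $2: service account name
  # The binding name below is an illustrative choice, not a SkyPilot requirement.
  run kubectl create clusterrolebinding sky-sa-pod-reader-binding \
    --clusterrole=sky-sa-cluster-role-pod-reader \
    --serviceaccount="$1:$2"
}
```

For example, `bind_pod_reader default sky-sa` prints the command for a service account named sky-sa in the default namespace, and `DRY_RUN=0 bind_pod_reader default sky-sa` runs it.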

Permissions for object store mounting

If your tasks use object store mounting (e.g., S3, GCS, etc.), SkyPilot will need to run a DaemonSet to expose the FUSE device as a Kubernetes resource to SkyPilot pods.

To allow this, you will also need to create a skypilot-system namespace, in which the DaemonSet will run, and grant the service account the necessary permissions in that namespace.

# Required only if using object store mounting
# Create namespace for SkyPilot system
apiVersion: v1
kind: Namespace
metadata:
  name: skypilot-system  # Do not change this
  labels:
    parent: skypilot
---
# Role for the skypilot-system namespace to create fusermount-server and
# any other system components required by SkyPilot.
# This role must be bound in the skypilot-system namespace to the service account used for SkyPilot.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: skypilot-system-service-account-role  # Can be changed if needed
  namespace: skypilot-system  # Do not change this namespace
  labels:
    parent: skypilot
rules:
  - apiGroups: [ "apps" ]
    resources: [ "daemonsets" ]
    verbs: [ "*" ]

Permissions for using Ingress

If your tasks use Ingress for exposing ports, you will need to grant the necessary permissions to the service account in the ingress-nginx namespace.

# Required only if using ingresses
# Role for accessing ingress service IP
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: ingress-nginx  # Do not change this
  name: sky-sa-role-ingress-nginx  # Can be changed if needed
rules:
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["list", "get"]

Example using custom service account

To create a service account that has all necessary permissions for SkyPilot (including for accessing object stores), you can use the following YAML.

Tip

In this example, the service account is named sky-sa and is created in the default namespace. Change the namespace and service account name as needed.

  1 # create-sky-sa.yaml
  2 kind: ServiceAccount
  3 apiVersion: v1
  4 metadata:
  5   name: sky-sa  # Change to your service account name
  6   namespace: default  # Change to your namespace if using a different one.
  7   labels:
  8     parent: skypilot
  9 ---
 10 # Role for the service account
 11 kind: Role
 12 apiVersion: rbac.authorization.k8s.io/v1
 13 metadata:
 14   name: sky-sa-role  # Can be changed if needed
 15   namespace: default  # Change to your namespace if using a different one.
 16   labels:
 17     parent: skypilot
 18 rules:
 19   # Required for managing pods and their lifecycle
 20   - apiGroups: [ "" ]
 21     resources: [ "pods", "pods/status", "pods/exec", "pods/portforward" ]
 22     verbs: [ "*" ]
 23   # Required for managing services for SkyPilot Pods
 24   - apiGroups: [ "" ]
 25     resources: [ "services" ]
 26     verbs: [ "*" ]
 27   # Required for managing SSH keys
 28   - apiGroups: [ "" ]
 29     resources: [ "secrets" ]
 30     verbs: [ "*" ]
 31   # Required for retrieving reason when Pod scheduling fails.
 32   - apiGroups: [ "" ]
 33     resources: [ "events" ]
 34     verbs: [ "get", "list", "watch" ]
 35 ---
 36 # RoleBinding for the service account
 37 kind: RoleBinding
 38 apiVersion: rbac.authorization.k8s.io/v1
 39 metadata:
 40   name: sky-sa-rb  # Can be changed if needed
 41   namespace: default  # Change to your namespace if using a different one.
 42   labels:
 43     parent: skypilot
 44 subjects:
 45   - kind: ServiceAccount
 46     name: sky-sa  # Change to your service account name
 47 roleRef:
 48   kind: Role
 49   name: sky-sa-role  # Use the same name as the role at line 14
 50   apiGroup: rbac.authorization.k8s.io
 51 ---
 52 # ClusterRole for the service account
 53 kind: ClusterRole
 54 apiVersion: rbac.authorization.k8s.io/v1
 55 metadata:
 56   name: sky-sa-cluster-role  # Can be changed if needed
 57   namespace: default  # Change to your namespace if using a different one.
 58   labels:
 59     parent: skypilot
 60 rules:
 61   - apiGroups: [""]
 62     resources: ["nodes"]  # Required for getting node resources.
 63     verbs: ["get", "list", "watch"]
 64   - apiGroups: ["node.k8s.io"]
 65     resources: ["runtimeclasses"]   # Required for autodetecting the runtime class of the nodes.
 66     verbs: ["get", "list", "watch"]
 67   - apiGroups: ["networking.k8s.io"]   # Required for exposing services through ingresses
 68     resources: ["ingressclasses"]
 69     verbs: ["get", "list", "watch"]
 70   - apiGroups: [""]                 # Required for `sky show-gpus` command
 71     resources: ["pods"]
 72     verbs: ["get", "list"]
 73   - apiGroups: ["storage.k8s.io"]   # Required for using volumes
 74     resources: ["storageclasses"]
 75     verbs: ["get", "list", "watch"]
 76 ---
 77 # ClusterRoleBinding for the service account
 78 apiVersion: rbac.authorization.k8s.io/v1
 79 kind: ClusterRoleBinding
 80 metadata:
 81   name: sky-sa-cluster-role-binding  # Can be changed if needed
 82   namespace: default  # Change to your namespace if using a different one.
 83   labels:
 84     parent: skypilot
 85 subjects:
 86   - kind: ServiceAccount
 87     name: sky-sa  # Change to your service account name
 88     namespace: default  # Change to your namespace if using a different one.
 89 roleRef:
 90   kind: ClusterRole
 91   name: sky-sa-cluster-role  # Use the same name as the cluster role at line 56
 92   apiGroup: rbac.authorization.k8s.io
 93 ---
 94 # Optional: If using object store mounting, create the skypilot-system namespace
 95 apiVersion: v1
 96 kind: Namespace
 97 metadata:
 98   name: skypilot-system  # Do not change this
 99   labels:
100     parent: skypilot
101 ---
102 # Optional: If using object store mounting, create role in the skypilot-system
103 # namespace to create fusermount-server.
104 kind: Role
105 apiVersion: rbac.authorization.k8s.io/v1
106 metadata:
107   name: skypilot-system-service-account-role  # Can be changed if needed
108   namespace: skypilot-system  # Do not change this namespace
109   labels:
110     parent: skypilot
111 rules:
112   - apiGroups: [ "apps" ]
113     resources: [ "daemonsets" ]
114     verbs: [ "*" ]
115 ---
116 # Optional: If using object store mounting, create rolebinding in the skypilot-system
117 # namespace to create fusermount-server.
118 apiVersion: rbac.authorization.k8s.io/v1
119 kind: RoleBinding
120 metadata:
121   name: sky-sa-skypilot-system-role-binding
122   namespace: skypilot-system  # Do not change this namespace
123   labels:
124     parent: skypilot
125 subjects:
126   - kind: ServiceAccount
127     name: sky-sa  # Change to your service account name
128     namespace: default  # Change this to the namespace where the service account is created
129 roleRef:
130   kind: Role
131   name: skypilot-system-service-account-role  # Use the same name as the role above
132   apiGroup: rbac.authorization.k8s.io
133 ---
134 # Optional: Role for accessing ingress resources
135 apiVersion: rbac.authorization.k8s.io/v1
136 kind: Role
137 metadata:
138   name: sky-sa-role-ingress-nginx  # Can be changed if needed
139   namespace: ingress-nginx  # Do not change this namespace
140   labels:
141     parent: skypilot
142 rules:
143   - apiGroups: [""]
144     resources: ["services"]
145     verbs: ["list", "get", "watch"]
146 ---
147 # Optional: RoleBinding for accessing ingress resources
148 apiVersion: rbac.authorization.k8s.io/v1
149 kind: RoleBinding
150 metadata:
151   name: sky-sa-rolebinding-ingress-nginx  # Can be changed if needed
152   namespace: ingress-nginx  # Do not change this namespace
153   labels:
154     parent: skypilot
155 subjects:
156   - kind: ServiceAccount
157     name: sky-sa  # Change to your service account name
158     namespace: default  # Change this to the namespace where the service account is created
159 roleRef:
160   kind: Role
161   name: sky-sa-role-ingress-nginx  # Use the same name as the role above
162   apiGroup: rbac.authorization.k8s.io

Create the service account using the following command:

$ kubectl apply -f create-sky-sa.yaml

After creating the service account, the cluster admin may distribute kubeconfigs with the sky-sa service account to users who need to access the cluster.

Users should also configure SkyPilot to use the sky-sa service account through ~/.sky/config.yaml:

# ~/.sky/config.yaml
kubernetes:
  remote_identity: sky-sa   # Or your service account name