Kubernetes#
When running outside your Kubernetes cluster, SkyPilot uses your local ~/.kube/config file for authentication and creating resources on your Kubernetes cluster.
When running inside your Kubernetes cluster (e.g., as a Spot controller or Serve controller), SkyPilot can operate using any of the following three authentication methods:
Automatically create a service account: SkyPilot can automatically create the service account and roles for itself to manage resources in the Kubernetes cluster. This is the default method when running inside the cluster, and no additional configuration is required.
For details on the permissions that are granted to the service account, refer to the Minimum Permissions Required for SkyPilot section below.
Using a custom service account: If you have a custom service account with the necessary permissions, you can configure SkyPilot to use it by adding this to your ~/.sky/config.yaml file:
kubernetes:
  remote_identity: your-service-account-name
Using your local kubeconfig file: In this case, SkyPilot will copy your local ~/.kube/config file to the controller pod and use it for authentication. To use this method, set remote_identity: LOCAL_CREDENTIALS under the kubernetes key in your ~/.sky/config.yaml file:
kubernetes:
  remote_identity: LOCAL_CREDENTIALS
Note
If your cluster uses exec based authentication in your ~/.kube/config file (e.g., GKE uses exec auth by default), SkyPilot may not be able to authenticate using this method. In this case, consider using one of the service account methods instead.
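Whether a kubeconfig relies on exec based authentication can be checked before choosing LOCAL_CREDENTIALS. The following is a minimal sketch using kubectl config view; the function name uses_exec_auth is made up for this example, and it inspects only the current context of the kubeconfig.

```shell
# Hypothetical helper (not part of SkyPilot): reports whether the current
# kubeconfig context authenticates through an exec plugin.
uses_exec_auth() {
  local exec_cmd
  # --minify limits the output to the current context. The jsonpath result
  # is empty when the user entry has no exec section.
  exec_cmd="$(kubectl config view --minify -o jsonpath='{.users[0].user.exec.command}')"
  if [ -n "$exec_cmd" ]; then
    echo "exec auth detected (plugin: $exec_cmd)"
  else
    echo "no exec auth detected"
  fi
}
```

Running uses_exec_auth against a kubeconfig that reports an exec plugin suggests that one of the service account methods is the safer choice.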
Note
Service account based authentication applies only when the remote SkyPilot
cluster (including spot and serve controller) is launched inside the
Kubernetes cluster. When running outside the cluster (e.g., on AWS),
SkyPilot will use the local ~/.kube/config file for authentication.
Below are the permissions required by SkyPilot and an example service account YAML that you can use to create a service account with the necessary permissions.
Minimum Permissions Required for SkyPilot#
SkyPilot requires permissions equivalent to the following roles to be able to manage the resources in the Kubernetes cluster:
# Namespaced role for the service account
# Required for creating pods, services and other necessary resources in the namespace.
# Note these permissions only apply in the namespace where SkyPilot is deployed, and the namespace can be changed below.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sky-sa-role # Can be changed if needed
  namespace: default # Change to your namespace if using a different one.
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
# ClusterRole for accessing cluster-wide resources. Details for each resource below:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sky-sa-cluster-role # Can be changed if needed
  namespace: default # Change to your namespace if using a different one.
  labels:
    parent: skypilot
rules:
  - apiGroups: [""]
    resources: ["nodes"] # Required for getting node resources.
    verbs: ["get", "list", "watch"]
  - apiGroups: ["node.k8s.io"]
    resources: ["runtimeclasses"] # Required for autodetecting the runtime class of the nodes.
    verbs: ["get", "list", "watch"]
Tip
If you are using a namespace other than default, make sure to change the namespace in the above manifests.
These roles must apply to both the user account configured in the kubeconfig file and the service account used by SkyPilot (if configured).
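One way to confirm that a service account actually holds these permissions is kubectl auth can-i with impersonation. The sketch below assumes the caller is allowed to impersonate service accounts (cluster admins usually are). The function name check_sky_min_perms is made up for this example, and the checks are a sample of the rules above rather than an exhaustive list.

```shell
# Hypothetical helper (not part of SkyPilot): spot-checks the minimum
# permissions for a service account. Each check prints "yes" or "no".
# Usage: check_sky_min_perms <namespace> <service-account>
check_sky_min_perms() {
  local ns="$1"
  local sa="$2"
  local identity="system:serviceaccount:${ns}:${sa}"

  # Namespaced permissions, granted by the Role.
  kubectl auth can-i create pods --namespace "$ns" --as "$identity"
  kubectl auth can-i create services --namespace "$ns" --as "$identity"

  # Cluster-wide permissions, granted by the ClusterRole.
  kubectl auth can-i list nodes --as "$identity"
  kubectl auth can-i list runtimeclasses.node.k8s.io --as "$identity"
}
```

For example, check_sky_min_perms default sky-sa checks a service account named sky-sa in the default namespace. To check the user account from the kubeconfig instead, run the same kubectl auth can-i commands without --as.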
If you need to view real-time GPU availability with sky show-gpus, or if your tasks use object store mounting or require access to ingress resources, you will need to grant additional permissions as described below.
Permissions for sky show-gpus#
sky show-gpus needs to list all pods across all namespaces to calculate GPU availability. To do this, SkyPilot needs the get and list permissions for pods in a ClusterRole:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sky-sa-cluster-role-pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
Tip
If this role is not granted to the service account, sky show-gpus will still work, but it will only show the total GPUs on the nodes, not the number of free GPUs.
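The ClusterRole above has no effect until it is bound to the service account. The following sketch binds it with kubectl create clusterrolebinding and then checks the result. The binding name sky-sa-pod-reader-binding and the function name are assumptions made for this example, and the final check relies on the caller being allowed to impersonate the service account.

```shell
# Hypothetical helper (not part of SkyPilot): binds the pod-reader
# ClusterRole to a service account and verifies the permission.
# Usage: grant_pod_reader <namespace> <service-account>
grant_pod_reader() {
  local ns="$1"
  local sa="$2"

  # The binding name is an arbitrary choice for this example.
  kubectl create clusterrolebinding sky-sa-pod-reader-binding \
    --clusterrole=sky-sa-cluster-role-pod-reader \
    --serviceaccount="${ns}:${sa}"

  # Prints "yes" once the service account can list pods in all namespaces.
  kubectl auth can-i list pods --all-namespaces \
    --as "system:serviceaccount:${ns}:${sa}"
}
```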
Permissions for Object Store Mounting#
If your tasks use object store mounting (e.g., S3, GCS, etc.), SkyPilot will need to run a DaemonSet to expose the FUSE device as a Kubernetes resource to SkyPilot pods.
To allow this, you will also need to create a skypilot-system namespace, which will run the DaemonSet, and grant the necessary permissions to the service account in that namespace.
# Required only if using object store mounting
# Create namespace for SkyPilot system
apiVersion: v1
kind: Namespace
metadata:
  name: skypilot-system # Do not change this
  labels:
    parent: skypilot
---
# Role for the skypilot-system namespace to create FUSE device manager and
# any other system components required by SkyPilot.
# This role must be bound in the skypilot-system namespace to the service account used for SkyPilot.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: skypilot-system-service-account-role # Can be changed if needed
  namespace: skypilot-system # Do not change this namespace
  labels:
    parent: skypilot
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
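As the comment in the manifest notes, this Role must be bound to the SkyPilot service account inside the skypilot-system namespace. A minimal sketch of that binding with kubectl create rolebinding follows; the function name is made up for this example, and the binding name matches the one used in the full example manifest later on this page.

```shell
# Hypothetical helper (not part of SkyPilot): binds the skypilot-system Role
# to a service account that lives in another namespace, then lists the
# DaemonSets in skypilot-system so the FUSE device manager can be inspected.
# Usage: bind_skypilot_system_role <sa-namespace> <service-account>
bind_skypilot_system_role() {
  local ns="$1"
  local sa="$2"

  kubectl create rolebinding sky-sa-skypilot-system-role-binding \
    --role=skypilot-system-service-account-role \
    --serviceaccount="${ns}:${sa}" \
    --namespace=skypilot-system

  # The list may be empty until a task with object store mounting has run.
  kubectl get daemonsets --namespace skypilot-system
}
```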
Permissions for using Ingress#
If your tasks use Ingress for exposing ports, you will need to grant the necessary permissions to the service account in the ingress-nginx namespace.
# Required only if using ingresses
# Role for accessing ingress service IP
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: ingress-nginx # Do not change this
  name: sky-sa-role-ingress-nginx # Can be changed if needed
rules:
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["list", "get"]
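To check that the ingress permissions took effect once this Role has been bound to the service account, kubectl auth can-i can be pointed at the ingress-nginx namespace. This is a sketch only: the function name is made up for this example, and it assumes the caller is allowed to impersonate the service account.

```shell
# Hypothetical helper (not part of SkyPilot): verifies that a service account
# can read services in the ingress-nginx namespace. Prints "yes" or "no"
# for each verb.
# Usage: check_ingress_perms <sa-namespace> <service-account>
check_ingress_perms() {
  local identity="system:serviceaccount:$1:$2"
  kubectl auth can-i list services --namespace ingress-nginx --as "$identity"
  kubectl auth can-i get services --namespace ingress-nginx --as "$identity"
}
```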
Example using Custom Service Account#
To create a service account that has all necessary permissions for SkyPilot (including for accessing object stores), you can use the following YAML.
Tip
In this example, the service account is named sky-sa and is created in the default namespace. Change the namespace and service account name as needed.
  1 # create-sky-sa.yaml
  2 kind: ServiceAccount
  3 apiVersion: v1
  4 metadata:
  5   name: sky-sa # Change to your service account name
  6   namespace: default # Change to your namespace if using a different one.
  7   labels:
  8     parent: skypilot
  9 ---
 10 # Role for the service account
 11 kind: Role
 12 apiVersion: rbac.authorization.k8s.io/v1
 13 metadata:
 14   name: sky-sa-role # Can be changed if needed
 15   namespace: default # Change to your namespace if using a different one.
 16   labels:
 17     parent: skypilot
 18 rules:
 19   - apiGroups: ["*"] # Required for creating pods, services, secrets and other necessary resources in the namespace.
 20     resources: ["*"]
 21     verbs: ["*"]
 22 ---
 23 # RoleBinding for the service account
 24 kind: RoleBinding
 25 apiVersion: rbac.authorization.k8s.io/v1
 26 metadata:
 27   name: sky-sa-rb # Can be changed if needed
 28   namespace: default # Change to your namespace if using a different one.
 29   labels:
 30     parent: skypilot
 31 subjects:
 32   - kind: ServiceAccount
 33     name: sky-sa # Change to your service account name
 34 roleRef:
 35   kind: Role
 36   name: sky-sa-role # Use the same name as the role at line 14
 37   apiGroup: rbac.authorization.k8s.io
 38 ---
 39 # ClusterRole for the service account
 40 kind: ClusterRole
 41 apiVersion: rbac.authorization.k8s.io/v1
 42 metadata:
 43   name: sky-sa-cluster-role # Can be changed if needed
 44   namespace: default # Change to your namespace if using a different one.
 45   labels:
 46     parent: skypilot
 47 rules:
 48   - apiGroups: [""]
 49     resources: ["nodes"] # Required for getting node resources.
 50     verbs: ["get", "list", "watch"]
 51   - apiGroups: ["node.k8s.io"]
 52     resources: ["runtimeclasses"] # Required for autodetecting the runtime class of the nodes.
 53     verbs: ["get", "list", "watch"]
 54   - apiGroups: ["networking.k8s.io"] # Required for exposing services through ingresses
 55     resources: ["ingressclasses"]
 56     verbs: ["get", "list", "watch"]
 57   - apiGroups: [""] # Required for `sky show-gpus` command
 58     resources: ["pods"]
 59     verbs: ["get", "list"]
 60 ---
 61 # ClusterRoleBinding for the service account
 62 apiVersion: rbac.authorization.k8s.io/v1
 63 kind: ClusterRoleBinding
 64 metadata:
 65   name: sky-sa-cluster-role-binding # Can be changed if needed
 66   namespace: default # Change to your namespace if using a different one.
 67   labels:
 68     parent: skypilot
 69 subjects:
 70   - kind: ServiceAccount
 71     name: sky-sa # Change to your service account name
 72     namespace: default # Change to your namespace if using a different one.
 73 roleRef:
 74   kind: ClusterRole
 75   name: sky-sa-cluster-role # Use the same name as the cluster role at line 43
 76   apiGroup: rbac.authorization.k8s.io
 77 ---
 78 # Optional: If using object store mounting, create the skypilot-system namespace
 79 apiVersion: v1
 80 kind: Namespace
 81 metadata:
 82   name: skypilot-system # Do not change this
 83   labels:
 84     parent: skypilot
 85 ---
 86 # Optional: If using object store mounting, create role in the skypilot-system
 87 # namespace to create FUSE device manager.
 88 kind: Role
 89 apiVersion: rbac.authorization.k8s.io/v1
 90 metadata:
 91   name: skypilot-system-service-account-role # Can be changed if needed
 92   namespace: skypilot-system # Do not change this namespace
 93   labels:
 94     parent: skypilot
 95 rules:
 96   - apiGroups: ["*"]
 97     resources: ["*"]
 98     verbs: ["*"]
 99 ---
100 # Optional: If using object store mounting, create rolebinding in the skypilot-system
101 # namespace to create FUSE device manager.
102 apiVersion: rbac.authorization.k8s.io/v1
103 kind: RoleBinding
104 metadata:
105   name: sky-sa-skypilot-system-role-binding
106   namespace: skypilot-system # Do not change this namespace
107   labels:
108     parent: skypilot
109 subjects:
110   - kind: ServiceAccount
111     name: sky-sa # Change to your service account name
112     namespace: default # Change this to the namespace where the service account is created
113 roleRef:
114   kind: Role
115   name: skypilot-system-service-account-role # Use the same name as the role at line 91
116   apiGroup: rbac.authorization.k8s.io
117 ---
118 # Optional: Role for accessing ingress resources
119 apiVersion: rbac.authorization.k8s.io/v1
120 kind: Role
121 metadata:
122   name: sky-sa-role-ingress-nginx # Can be changed if needed
123   namespace: ingress-nginx # Do not change this namespace
124   labels:
125     parent: skypilot
126 rules:
127   - apiGroups: [""]
128     resources: ["services"]
129     verbs: ["list", "get", "watch"]
130   - apiGroups: ["rbac.authorization.k8s.io"]
131     resources: ["roles", "rolebindings"]
132     verbs: ["list", "get", "watch"]
133 ---
134 # Optional: RoleBinding for accessing ingress resources
135 apiVersion: rbac.authorization.k8s.io/v1
136 kind: RoleBinding
137 metadata:
138   name: sky-sa-rolebinding-ingress-nginx # Can be changed if needed
139   namespace: ingress-nginx # Do not change this namespace
140   labels:
141     parent: skypilot
142 subjects:
143   - kind: ServiceAccount
144     name: sky-sa # Change to your service account name
145     namespace: default # Change this to the namespace where the service account is created
146 roleRef:
147   kind: Role
148   name: sky-sa-role-ingress-nginx # Use the same name as the role at line 122
149   apiGroup: rbac.authorization.k8s.io
Create the service account using the following command:
$ kubectl apply -f create-sky-sa.yaml
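Since every object in the example manifest carries the label parent: skypilot, the result of the apply can be reviewed with a label selector. The following is a small sketch; the function name list_skypilot_rbac is made up for this example.

```shell
# Hypothetical helper (not part of SkyPilot): lists the objects labeled
# parent=skypilot, as created by create-sky-sa.yaml.
# Usage: list_skypilot_rbac <namespace>
list_skypilot_rbac() {
  local ns="$1"
  # Namespaced objects: the service account, its Role and RoleBinding.
  kubectl get serviceaccounts,roles,rolebindings --namespace "$ns" -l parent=skypilot
  # Cluster-scoped objects: the ClusterRole and ClusterRoleBinding.
  kubectl get clusterroles,clusterrolebindings -l parent=skypilot
}
```

If object store mounting or ingress is in use, the same call with skypilot-system or ingress-nginx as the namespace shows the optional Roles and RoleBindings.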
After creating the service account, the cluster admin may distribute kubeconfigs with the sky-sa service account to users who need to access the cluster.
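How such a kubeconfig is produced depends on the cluster. One possible approach, sketched below, issues a time-limited token with kubectl create token (available in kubectl 1.24 and later) and writes a standalone kubeconfig around it. The function name, the sky-cluster and sky-context entry names, and the 24h duration are all assumptions for this example. The sketch also assumes the admin's kubeconfig embeds the cluster CA as certificate-authority-data.

```shell
# Hypothetical helper (not part of SkyPilot): writes a kubeconfig that
# authenticates as a service account using a time-limited token.
# Usage: make_sa_kubeconfig <namespace> <service-account> <output-file>
make_sa_kubeconfig() {
  local ns="$1"
  local sa="$2"
  local out="$3"
  local server ca_file token

  # API server URL of the admin's current context.
  server="$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')"

  # Cluster CA certificate, assumed to be embedded in the admin kubeconfig.
  ca_file="$(mktemp)"
  kubectl config view --minify --raw \
    -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' \
    | base64 --decode > "$ca_file"

  # Time-limited token; the API server may cap the requested duration.
  token="$(kubectl create token "$sa" --namespace "$ns" --duration=24h)"

  # Assemble the new kubeconfig file entry by entry.
  kubectl config set-cluster sky-cluster --server="$server" \
    --certificate-authority="$ca_file" --embed-certs=true --kubeconfig="$out"
  kubectl config set-credentials "$sa" --token="$token" --kubeconfig="$out"
  kubectl config set-context sky-context --cluster=sky-cluster --user="$sa" \
    --namespace="$ns" --kubeconfig="$out"
  kubectl config use-context sky-context --kubeconfig="$out"

  rm -f "$ca_file"
}
```

For example, make_sa_kubeconfig default sky-sa sky-sa.kubeconfig writes the file sky-sa.kubeconfig. Because the token expires, the file has to be regenerated periodically.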
Users should also configure SkyPilot to use the sky-sa service account through ~/.sky/config.yaml:
# ~/.sky/config.yaml
kubernetes:
  remote_identity: sky-sa # Or your service account name