Deploying a SkyPilot API Server on GKE backed by GCP Cloud SQL#
In this example, a SkyPilot API server is deployed on a GKE cluster with a persistent database backed by GCP Cloud SQL.
SkyPilot API server deployed on k8s clusters can use either password authentication or IAM authentication to access the Cloud SQL instance.
Password authentication is the less secure option between the two, but it works with any k8s cluster (not necessarily GKE).
IAM authentication is a more secure authentication method that allows the GKE cluster to access the database using a GCP service account. However, it requires a GKE cluster with Workload Identity enabled.
Note
IAM authentication is recommended for production deployments on GKE clusters.
Prerequisites#
Access to GCP console
Create a GKE cluster#
Create a GKE cluster with Workload Identity enabled.
CLI:
Add
--enable-workload-identity
flag togcloud container clusters create
command as shown:gcloud container clusters create <cluster-name> \ ... --enable-workload-identityWeb Console:
When creating a standard GKE cluster, go to the
Security
tab and checkEnable Workload Identity
.Note
Retroactively enabling Workload Identity on an existing cluster is complicated and is not recommended.
Create a GKE cluster as you normally would.
Create a GCP service account to use with the API server#
In this step, a GCP service account is created to use with the API server.
Go to the Service Accounts page of the
IAM and Admin
consoleClick on “Create Service Account”
Set the service account name and ID to
skypilot-cloud-sql-access
Click on “Create and Continue” to move to the Permissions page.
Add
Cloud SQL Client
andCloud SQL Instance User
roles to the service account.Click on “Continue”, then “Done” to create the service account.
Go to the Service Accounts page of the
IAM and Admin
consoleClick on “Create Service Account”
Set the service account name and ID to
skypilot-cloud-sql-access
Click on “Create and Continue” to move to the Permissions page.
Add
Cloud SQL Client
role to the service account.Click on “Continue”, then “Done” to create the service account.
Create a cloud SQL instance#
Go to the Cloud SQL console
Click on “Create instance”
Select “PostgreSQL” as the database engine
Set the instance ID to
cloud-sql-skypilot-instance
Set the password for the
postgres
user.Select the region (and zone if applicable) where you want to create the instance. The region / zone of the database should match that of the GKE cluster.
Click on “Create Instance”
Configure the cloud SQL instance#
Once the instance is created, we need to configure the instance to create a user and a database for SkyPilot API server.
To create a database, use gcloud CLI to run the following command:
DB_NAME=skypilot-db
DB_INSTANCE_NAME=cloud-sql-skypilot-instance
gcloud sql databases create ${DB_NAME} --instance ${DB_INSTANCE_NAME}
To create a user, use gcloud CLI to run the following command:
GCP_PROJECT_ID=<your gcp project id> GCP_SERVICE_ACCOUNT=skypilot-cloud-sql-access DB_INSTANCE_NAME=cloud-sql-skypilot-instance gcloud sql users create ${GCP_SERVICE_ACCOUNT}@${GCP_PROJECT_ID}.iam \ --instance=${DB_INSTANCE_NAME} \ --type=cloud_iam_service_accountSince the service account user is not granted any privileges in the database by default, we need to grant the user the necessary privileges.
Go to the Cloud SQL console
Click on
cloud-sql-skypilot-instance
Click on
Cloud SQL Studio
tab on the side bar.Authenticate to
skypilot-db
database using thepostgres
user.Run the following SQL command to grant the user the necessary privileges:
GRANT "cloudsqlsuperuser" TO "skypilot-cloud-sql-access@<gcp-project-id>.iam"
DB_USER=skypilot DB_PASSWORD=<create a password> DB_INSTANCE_NAME=cloud-sql-skypilot-instance gcloud sql users create ${DB_USER} --instance ${DB_INSTANCE_NAME} --password ${DB_PASSWORD}
Create the database connection secret#
In this step, we create a secret to store the database connection information to be used by the API server.
NAMESPACE=skypilot DB_NAME=skypilot-db GCP_PROJECT_ID=<your gcp project id> kubectl create secret generic skypilot-db-connection-uri \ --namespace ${NAMESPACE} \ --from-literal connection_string="postgresql://localhost/${DB_NAME}?user=skypilot-cloud-sql-access%40${GCP_PROJECT_ID}.iam"
NAMESPACE=skypilot DB_USER=skypilot DB_PASSWORD=<password for the 'skypilot' user> DB_NAME=skypilot-db kubectl create secret generic skypilot-db-connection-uri \ --namespace ${NAMESPACE} \ --from-literal connection_string=postgresql://${DB_USER}:${DB_PASSWORD}@localhost/${DB_NAME}
Deploy the SkyPilot API server#
Replace <GCP_PROJECT_ID>
and <REGION>
in the following values.yaml
with the corresponding values.
values.yaml
:
apiService: dbConnectionSecretName: skypilot-db-connection-uri # config must be null when using an external database. # To set the config, use the web dashboard once the API server is deployed. config: null rbac: serviceAccountName: "skypilot-api-sa" serviceAccountAnnotations: # TODO: fill in <GCP_PROJECT_ID> iam.gke.io/gcp-service-account: skypilot-cloud-sql-access@<GCP_PROJECT_ID>.iam.gserviceaccount.com # Extra init containers to run before the api container extraInitContainers: - name: cloud-sql-proxy restartPolicy: Always # It is recommended to use the latest version of the Cloud SQL Auth Proxy # Make sure to update on a regular schedule! image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.14.1 args: # If connecting from a VPC-native GKE cluster, you can use the # following flag to have the proxy connect over private IP # - "--private-ip" # If you are not connecting with Automatic IAM, you can delete # the following flag. - "--auto-iam-authn" # Enable structured logging with LogEntry format: - "--structured-logs" # Replace DB_PORT with the port the proxy should listen on - "--port=5432" # TODO: fill in <GCP_PROJECT_ID> and <REGION> - "<GCP_PROJECT_ID>:<REGION>:cloud-sql-skypilot-instance" securityContext: # The default Cloud SQL Auth Proxy image runs as the # "nonroot" user and group (uid: 65532) by default. runAsNonRoot: true # You should use resource requests/limits as a best practice to prevent # pods from consuming too many resources and affecting the execution of # other pods. You should adjust the following values based on what your # application needs. For details, see # https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ resources: requests: # The proxy's memory use scales linearly with the number of active # connections. Fewer open connections will use less memory. Adjust # this value based on your application's requirements. memory: "2Gi" # The proxy's CPU use scales linearly with the amount of IO between # the database and the application. Adjust this value based on your # application's requirements. cpu: "1"
apiService: extraVolumes: - name: cloud-sql-credentials secret: secretName: cloud-sql-credentials dbConnectionSecretName: skypilot-db-connection-uri # config must be null when using an external database. # To set the config, use the web dashboard once the API server is deployed. config: null # Extra init containers to run before the api container extraInitContainers: - name: cloud-sql-proxy restartPolicy: Always # It is recommended to use the latest version of the Cloud SQL Auth Proxy # Make sure to update on a regular schedule! image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.14.1 args: # If connecting from a VPC-native GKE cluster, you can use the # following flag to have the proxy connect over private IP # - "--private-ip" # Use service account key file for authentication - "--credentials-file=/var/secrets/google/service-account-key.json" # Enable structured logging with LogEntry format: - "--structured-logs" # Replace DB_PORT with the port the proxy should listen on - "--port=5432" # TODO: fill in <GCP_PROJECT_ID> and <REGION> - "<GCP_PROJECT_ID>:<REGION>:cloud-sql-skypilot-instance" securityContext: # The default Cloud SQL Auth Proxy image runs as the # "nonroot" user and group (uid: 65532) by default. runAsNonRoot: true # You should use resource requests/limits as a best practice to prevent # pods from consuming too many resources and affecting the execution of # other pods. You should adjust the following values based on what your # application needs. For details, see # https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ resources: requests: # The proxy's memory use scales linearly with the number of active # connections. Fewer open connections will use less memory. Adjust # this value based on your application's requirements. memory: "2Gi" # The proxy's CPU use scales linearly with the amount of IO between # the database and the application. Adjust this value based on your # application's requirements. cpu: "1" volumeMounts: - name: cloud-sql-credentials mountPath: /var/secrets/google readOnly: true
Then run the following command to deploy the API server using helm:
NAMESPACE=skypilot
RELEASE_NAME=skypilot
WEB_USERNAME=skypilot
WEB_PASSWORD=<create a password>
AUTH_STRING=$(htpasswd -nb $WEB_USERNAME $WEB_PASSWORD)
helm upgrade --install $RELEASE_NAME skypilot/skypilot-nightly --devel \
--namespace $NAMESPACE \
-f values.yaml \
--set ingress.authCredentials=$AUTH_STRING