GCP#
Note
By default, SkyPilot will pick up the credentials you have set up. In most cases, following the installation instructions is all you need to do. The steps below are optional advanced configuration options, aimed primarily at cloud admins and advanced users.
Service account#
GCP service accounts are supported.
Tip
A service account on your local machine can avoid the periodic “google.auth.exceptions.RefreshError: Reauthentication is needed. Please run `gcloud auth application-default login` to reauthenticate.” error. A service account is long-lived, as it does not have an expiry time.
Set up a service account as follows:
Follow the instructions to create a service account with the appropriate roles/permissions.
In the “Service Accounts” tab in the IAM & Admin console, click on the service account to go to its detailed page. Click on the KEYS tab, then click on ADD KEY to add a JSON key. The key will be downloaded automatically.
Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the key file, and configure the gcloud CLI tool:
$ export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
$ gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
$ gcloud config set project your-project-id
You may want to add the export statement to your shell profile (e.g., ~/.bashrc, ~/.zshrc) so that it is set automatically in all new terminal sessions.
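As a quick sanity check after the steps above, the following sketch verifies that a file looks like a service account key before you point GOOGLE_APPLICATION_CREDENTIALS at it. The helper name `check_service_account_key` is hypothetical and not part of SkyPilot or gcloud; it only assumes that a downloaded JSON key carries a `type` field set to `service_account` and a `client_email` field.

```python
import json


def check_service_account_key(path):
    """Return the service account email if `path` looks like a JSON key file.

    Hypothetical helper for illustration; not part of SkyPilot or gcloud.
    """
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    # Downloaded service account keys identify themselves via the "type" field.
    if key.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key")
    if "client_email" not in key:
        raise ValueError(f"{path} has no client_email field")
    return key["client_email"]
```

For example, `check_service_account_key(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])` (after `import os`) returns the account email when the variable points at a valid key, and raises otherwise.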
Setting permissions#
Generally, the administrator can choose among three “levels” of permissions, from the most permissive with the least setup effort to the least permissive with the most setup effort:
Default: no setup, give users Owner-level permissions (i.e., you do not need to follow the instructions in this section)
Medium: easy setup, with a medium set of permissions
Minimal: more setup, with the minimal set of permissions
Medium permissions#
The easiest way to grant a user access to your GCP project without the Owner role is to add the following roles to the user principals:
roles/browser
roles/compute.admin
roles/iam.serviceAccountAdmin
roles/iam.serviceAccountUser
roles/serviceusage.serviceUsageConsumer
roles/storage.admin
roles/iam.securityAdmin
Note
If the roles/iam.securityAdmin role is undesirable, you can do the following. First, include the role and have any user (e.g., the admin) run sky launch --cloud gcp successfully once. This creates the necessary service account. Then, replace the role roles/iam.securityAdmin with roles/iam.roleViewer in the list above.
Optionally, to use TPUs, add the following role:
roles/tpu.admin
You can grant these roles via GCP’s IAM & Admin console.
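The roles listed above can also be granted from the command line with `gcloud projects add-iam-policy-binding`. The sketch below is a dry run: it only prints one command per role so that an admin can review them before running anything. The function name `grant_commands`, the project ID, and the email address are placeholders chosen for illustration.

```python
import shlex

# Roles copied from the "Medium permissions" list above.
MEDIUM_ROLES = [
    "roles/browser",
    "roles/compute.admin",
    "roles/iam.serviceAccountAdmin",
    "roles/iam.serviceAccountUser",
    "roles/serviceusage.serviceUsageConsumer",
    "roles/storage.admin",
    "roles/iam.securityAdmin",
]
TPU_ROLE = "roles/tpu.admin"  # optional, only needed for TPUs


def grant_commands(project_id, user_email, use_tpu=False):
    """Build one `gcloud projects add-iam-policy-binding` command per role."""
    roles = MEDIUM_ROLES + ([TPU_ROLE] if use_tpu else [])
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member=user:{user_email}",
            f"--role={role}",
        ])
        for role in roles
    ]


# Placeholder project ID and user; replace with your own values.
for cmd in grant_commands("your-project-id", "alice@example.com"):
    print(cmd)
```

Printing instead of executing keeps the sketch side-effect free; the printed lines can be pasted into a shell once they look right.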
Minimal permissions#
The Medium permissions level assigns admin permissions for some GCP services to the user. If you would like to grant finer-grained, more minimal permissions to the users in your organization or project, you can create a custom role by following the steps below:
Go to GCP’s IAM & Admin console and click on Create Role.

Give the role a descriptive name, such as minimal-skypilot-role.
Click Add Permissions and search for the following permissions and add them to the role:
compute.disks.create
compute.disks.list
compute.firewalls.create
compute.firewalls.delete
compute.firewalls.get
compute.instances.create
compute.instances.delete
compute.instances.get
compute.instances.list
compute.instances.setLabels
compute.instances.setMetadata
compute.instances.setServiceAccount
compute.instances.start
compute.instances.stop
compute.networks.get
compute.networks.list
compute.networks.getEffectiveFirewalls
compute.globalOperations.get
compute.subnetworks.use
compute.subnetworks.list
compute.subnetworks.useExternalIp
compute.projects.get
compute.zoneOperations.get
iam.roles.get
iam.serviceAccounts.actAs
iam.serviceAccounts.get
serviceusage.services.enable
serviceusage.services.list
serviceusage.services.use
resourcemanager.projects.get
resourcemanager.projects.getIamPolicy
Note
For custom VPC users (with gcp.vpc_name specified in ~/.sky/config.yaml; see Using a specific VPC), compute.firewalls.create and compute.firewalls.delete are not necessary unless ports need to be opened via resources.ports in the task YAML.
Note
(Advanced) To further limit the iam.serviceAccounts.actAs permission to access SkyPilot’s service account only, you can remove the permission from the list above and additionally grant your organization’s users the ability to use the service account skypilot-v1 created by the admin (see Service Account). This can be done by going to IAM & Admin console -> Service Accounts -> skypilot-v1 -> Permissions -> GRANT ACCESS and adding the users with role roles/iam.serviceAccountUser. This permits the users to use the skypilot-v1 service account required by SkyPilot.
Optional: If the user needs to access GCS buckets, you can additionally add the following permissions:
storage.buckets.create
storage.buckets.get
storage.buckets.delete
storage.objects.create
storage.objects.update
storage.objects.delete
storage.objects.get
storage.objects.list
Optional: If the user needs to access TPU VMs, you can additionally add the following permissions (this list may not be exhaustive; please file an issue if you find any missing permissions):
tpu.nodes.create
tpu.nodes.delete
tpu.nodes.list
tpu.nodes.get
tpu.nodes.update
tpu.operations.get
Optional: To enable sky launch --clone-disk-from, the role needs the following permissions as well:
compute.disks.useReadOnly
compute.images.create
compute.images.get
compute.images.delete
Optional: To enable opening ports on a GCP cluster, the role needs the following permissions as well:
compute.instances.setTags
compute.firewalls.list
compute.firewalls.update
Optional: If the user needs to use custom machine images with sky launch --image-id, you can additionally add the following permissions:
compute.disks.get
compute.disks.resize
compute.images.get
compute.images.useReadOnly
Optional: If your organization sets gcp.prioritize_reservations or gcp.specific_reservations in ~/.sky/config.yaml, you can additionally add the following permissions:
compute.reservations.list
Click Create to create the role.
Go back to the “IAM” tab and click on GRANT ACCESS.
Fill in the email address of the user in the “Add principals” section, and select minimal-skypilot-role in the “Assign roles” section. Click Save.

The user should receive an invitation to the project and should be able to set up SkyPilot by following the instructions in Installation.
Note
A user granted only the minimal permissions above will not be able to create service accounts to be assigned to SkyPilot instances.
The admin needs to follow the instructions below to create a service account to be shared by all users in the project.
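The custom role described in this section can also be created with `gcloud iam roles create` instead of the console. The sketch below assembles the permission list (base permissions plus a few of the optional groups) and prints the command without running it. The helper name `role_create_command` and the group keys (`gcs`, `tpu`, `ports`) are hypothetical. The role ID is written with underscores on the assumption that custom role IDs do not accept hyphens; the human-readable title keeps the name used in this guide.

```python
import shlex

# Base permissions copied from the list in this section.
BASE_PERMISSIONS = [
    "compute.disks.create", "compute.disks.list",
    "compute.firewalls.create", "compute.firewalls.delete",
    "compute.firewalls.get",
    "compute.instances.create", "compute.instances.delete",
    "compute.instances.get", "compute.instances.list",
    "compute.instances.setLabels", "compute.instances.setMetadata",
    "compute.instances.setServiceAccount",
    "compute.instances.start", "compute.instances.stop",
    "compute.networks.get", "compute.networks.list",
    "compute.networks.getEffectiveFirewalls",
    "compute.globalOperations.get",
    "compute.subnetworks.use", "compute.subnetworks.list",
    "compute.subnetworks.useExternalIp",
    "compute.projects.get", "compute.zoneOperations.get",
    "iam.roles.get", "iam.serviceAccounts.actAs", "iam.serviceAccounts.get",
    "serviceusage.services.enable", "serviceusage.services.list",
    "serviceusage.services.use",
    "resourcemanager.projects.get", "resourcemanager.projects.getIamPolicy",
]

# Optional groups from this section; the keys are illustrative names.
OPTIONAL_PERMISSIONS = {
    "gcs": [
        "storage.buckets.create", "storage.buckets.get",
        "storage.buckets.delete", "storage.objects.create",
        "storage.objects.update", "storage.objects.delete",
        "storage.objects.get", "storage.objects.list",
    ],
    "tpu": [
        "tpu.nodes.create", "tpu.nodes.delete", "tpu.nodes.list",
        "tpu.nodes.get", "tpu.nodes.update", "tpu.operations.get",
    ],
    "ports": [
        "compute.instances.setTags", "compute.firewalls.list",
        "compute.firewalls.update",
    ],
}


def role_create_command(project_id, role_id, title=None, extras=()):
    """Build a `gcloud iam roles create` command for the custom role."""
    permissions = list(BASE_PERMISSIONS)
    for group in extras:
        permissions += OPTIONAL_PERMISSIONS[group]
    # Drop duplicates while keeping the original order.
    permissions = list(dict.fromkeys(permissions))
    return shlex.join([
        "gcloud", "iam", "roles", "create", role_id,
        f"--project={project_id}",
        f"--title={title or role_id}",
        f"--permissions={','.join(permissions)}",
    ])


# Placeholder project ID; the role ID uses underscores (see the note above).
print(role_create_command("your-project-id", "minimal_skypilot_role",
                          title="minimal-skypilot-role", extras=["gcs"]))
```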
Service account for SkyPilot system#
Note
If you already have a service account under the “Service Accounts” tab with an email starting with skypilot-v1@, it was likely created by SkyPilot automatically, and you can skip this section.
Click the “Service Accounts” tab in the IAM & Admin console, and click on CREATE SERVICE ACCOUNT.

Set the service account id to skypilot-v1 and click CREATE AND CONTINUE.

Select the minimal-skypilot-role (or the name you set) created in the last section and click on DONE. You can also choose to use the Default or Medium Permissions roles as described in the previous sections.

Firewall rules#
By default, users do not need to set up any special firewall rules to start using SkyPilot. If the default VPC does not satisfy the minimal required rules, a new VPC skypilot-vpc with sufficient rules will be automatically created and used.
However, if you manually set up and instruct SkyPilot to use a custom VPC (see below), ensure it has the following required firewall rules:
# Allow internal connections between SkyPilot VMs:
#
#   controller -> head node of a cluster
#   head node of a cluster <-> worker node(s) of a cluster
#
# NOTE: these ports are more relaxed than absolute minimum, but the
# sourceRanges restrict the traffic to internal IPs.
{
    "direction": "INGRESS",
    "allowed": [
        {"IPProtocol": "tcp", "ports": ["0-65535"]},
        {"IPProtocol": "udp", "ports": ["0-65535"]},
    ],
    "sourceRanges": ["10.128.0.0/9"],
},
# Allow SSH connections from user machine(s)
#
# NOTE: This can be satisfied using the following relaxed sourceRanges
# (0.0.0.0/0), but you can customize it if you want to restrict to certain
# known public IPs (useful when using internal VPN or proxy solutions).
{
    "direction": "INGRESS",
    "allowed": [
        {"IPProtocol": "tcp", "ports": ["22"]},
    ],
    "sourceRanges": ["0.0.0.0/0"],
},
You can inspect and manage firewall rules at https://console.cloud.google.com/net-security/firewall-manager/firewall-policies/list?project=<your-project-id> or using any of GCP’s SDKs.
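To see whether a custom VPC covers the required rules, you can export its rules (for example with `gcloud compute firewall-rules list --format=json`) and compare them offline. The sketch below is a simplified check written for this guide, not SkyPilot's own validation logic: it ignores deny rules, priorities, and target tags, and it requires a single rule to cover each required port range. The function names are hypothetical.

```python
import ipaddress

# Required rules, taken from the listing in this section.
REQUIRED_RULES = [
    {
        "direction": "INGRESS",
        "allowed": [
            {"IPProtocol": "tcp", "ports": ["0-65535"]},
            {"IPProtocol": "udp", "ports": ["0-65535"]},
        ],
        "sourceRanges": ["10.128.0.0/9"],
    },
    {
        "direction": "INGRESS",
        "allowed": [{"IPProtocol": "tcp", "ports": ["22"]}],
        "sourceRanges": ["0.0.0.0/0"],
    },
]


def _port_range(spec):
    """Parse "22" or "0-65535" into an inclusive (low, high) tuple."""
    low, _, high = spec.partition("-")
    return int(low), int(high or low)


def _allows(rule, protocol, ports, source):
    """True if this single rule admits `protocol`/`ports` from `source`."""
    if rule.get("direction", "INGRESS") != "INGRESS":
        return False
    needed = ipaddress.ip_network(source)
    ranges = [ipaddress.ip_network(r) for r in rule.get("sourceRanges", [])]
    if not any(r.version == needed.version and needed.subnet_of(r)
               for r in ranges):
        return False
    low, high = _port_range(ports)
    for entry in rule.get("allowed", []):
        if entry["IPProtocol"] not in (protocol, "all"):
            continue
        # An entry without "ports" allows every port of that protocol.
        allowed = [_port_range(p) for p in entry.get("ports", ["0-65535"])]
        if any(lo <= low and high <= hi for lo, hi in allowed):
            return True
    return False


def missing_requirements(vpc_rules):
    """Return (protocol, ports, source) triples that no VPC rule covers."""
    missing = []
    for required in REQUIRED_RULES:
        for source in required["sourceRanges"]:
            for entry in required["allowed"]:
                for ports in entry["ports"]:
                    if not any(_allows(r, entry["IPProtocol"], ports, source)
                               for r in vpc_rules):
                        missing.append((entry["IPProtocol"], ports, source))
    return missing
```

An empty result from `missing_requirements(rules)` suggests the VPC allows the traffic SkyPilot needs; any returned triple names a protocol, port range, and source range that still needs a rule.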
Using a specific VPC#
By default, SkyPilot uses the following behavior to get a VPC to use for all GCP instances:
First, all existing VPCs in the project are checked against the minimal recommended firewall rules for SkyPilot to function. If any VPC satisfies these rules, it is used.
Otherwise, a new VPC named skypilot-vpc is automatically created with the minimal recommended firewall rules and will be used. It is an auto mode VPC that automatically starts with one subnet per region.
To instruct SkyPilot to use a specific VPC, you can use SkyPilot’s global config file ~/.sky/config.yaml to specify the VPC name in the gcp.vpc_name field:
gcp:
  vpc_name: my-vpc-name
See details in Advanced Configuration. Example use cases include using a private VPC or a VPC with fine-grained constraints, typically created via Terraform or manually.
The custom VPC should contain the required firewall rules.
Using internal IPs#
For security reasons, users may want to use only internal IPs for SkyPilot instances.
To do so, you can use SkyPilot’s global config file ~/.sky/config.yaml to specify the gcp.use_internal_ips and gcp.ssh_proxy_command fields (for the detailed syntax, see Advanced Configuration):
gcp:
  use_internal_ips: true
  # VPC with NAT setup, see below
  vpc_name: my-vpc-name
  ssh_proxy_command: ssh -W %h:%p -o StrictHostKeyChecking=no <user>@<jump-host>
The gcp.ssh_proxy_command field is optional. If SkyPilot is run on a machine that can directly access the internal IPs of the instances, it can be omitted. Otherwise, it should be set to a command that can be used to proxy SSH connections to the internal IPs of the instances.
Here are two ways to give SkyPilot access to those private hosts:
Option 1 – SSH proxy / bastion host
Option 2 – IAP tunneling (Identity‑Aware Proxy). This is often simpler because you do not need to run or secure your own bastion; Google handles the tunnel at the edge.
Choose whichever fits your environment.
Option 1: Simple single‑proxy example#
If you have a “jump host” that’s directly accessible from the machine running SkyPilot, tell SkyPilot how to pipe SSH through it:
gcp:
  use_internal_ips: true
  vpc_name: my-vpc-name
  ssh_proxy_command: ssh -W %h:%p -o StrictHostKeyChecking=no <user>@<jump-host>
When multiple regions are involved you can map each region to its own proxy (just as before):
gcp:
  use_internal_ips: true
  vpc_name: my-vpc-name
  ssh_proxy_command:
    us-west1: ssh -W %h:%p -o StrictHostKeyChecking=no <user>@<us-west1-jump-host>
    us-east1: ssh -W %h:%p -o StrictHostKeyChecking=no <user>@<us-east1-jump-host>
Option 2: IAP tunneling#
IAP TCP forwarding creates an authenticated tunnel from your workstation to any private IP inside a GCP VPC—no public IP, VPN, or self-managed bastion required. SkyPilot implements the IP-based (destination-group) mode; the per-instance mode is not supported yet.
Setup#
Enable the API once per project:
$ gcloud services enable iap.googleapis.com
This method requires IAP tunnel permissions. Adding the built-in role roles/iap.tunnelResourceAccessor on the destination group (or project-wide) is the simplest approach, in addition to the other permissions required by SkyPilot.
A Cloud NAT gateway must exist in every region where you launch internal-only VMs so they can reach the internet for package installs. See Cloud NAT setup.
The following script can be used to set up IAP tunneling in a specific VPC region:
Automated setup script (per region)
#!/usr/bin/env bash
# Set your region and VPC.
REGION=us-east1   # e.g. us-west1, europe-west4, ...
VPC=default       # change if using a custom VPC

#######################################################################
# 1. Cloud NAT (if none exists)                                       #
#######################################################################
nat_exists=false
for r in $(gcloud compute routers list --regions="$REGION" \
             --format='value(name)'); do
  if gcloud compute routers nats list --router="$r" --region="$REGION" \
       --format='value(name)' | grep -q .; then
    nat_exists=true
  fi
done

if [ "$nat_exists" = false ]; then
  gcloud compute routers create nat-router --network="$VPC" --region="$REGION"
  gcloud compute routers nats create nat-config \
    --router=nat-router --router-region="$REGION" \
    --nat-all-subnet-ip-ranges --auto-allocate-nat-external-ips
fi

#######################################################################
# 2. IAP destination group covering the subnet CIDR                   #
#######################################################################
CIDR=$(gcloud compute networks subnets list --network="$VPC" \
         --regions="$REGION" --format='value(ipCidrRange)')
gcloud iap tcp dest-groups create "internal-vpc-$VPC" \
  --region="$REGION" --ip-range-list="$CIDR"

#######################################################################
# 3. Show the SkyPilot config snippet                                 #
#######################################################################
echo "Update your config.yaml to add the new gcp.ssh_proxy_command.$REGION value:"
echo
cat <<EOF
gcp:
  use_internal_ips: true
  ssh_proxy_command:
    $REGION: gcloud compute start-iap-tunnel %h %p --listen-on-stdin --region=$REGION --network=$VPC --dest-group=internal-vpc-$VPC
EOF

# Optionally update the config file in place
read -p 'Automatically update ~/.sky/config.yaml? [y/N] ' choice
if echo "$choice" | grep -xqE '[yY]'; then
  yq -Yi ".gcp.use_internal_ips=true | .gcp.ssh_proxy_command[\"$REGION\"]=\"gcloud compute start-iap-tunnel %h %p --listen-on-stdin --region=$REGION --network=$VPC --dest-group=internal-vpc-$VPC\"" ~/.sky/config.yaml
else
  echo 'Aborting.'
fi
Paste the printed snippet into your ~/.sky/config.yaml (or let the script modify it automatically).
Once the destination group is created, make sure to set the ssh_proxy_command in the SkyPilot config. For instance, if you have
destination group: internal-vpc-default
region: us-east1
VPC: default
you could use this configuration:
gcp:
  use_internal_ips: true
  ssh_proxy_command:
    us-east1: gcloud compute start-iap-tunnel %h %p --listen-on-stdin --region=us-east1 --network=default --dest-group=internal-vpc-default
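When several regions are involved, the per-region entries follow the same pattern and can be generated rather than typed by hand. The sketch below assumes each destination group is named internal-vpc-<VPC>, as in the example above; the function names `iap_proxy_command` and `iap_config_yaml` are hypothetical helpers, not part of SkyPilot.

```python
def iap_proxy_command(region, vpc):
    """Build the IAP tunnel proxy command for one region.

    Assumes the destination group is named internal-vpc-<vpc>.
    """
    return (
        "gcloud compute start-iap-tunnel %h %p --listen-on-stdin "
        f"--region={region} --network={vpc} --dest-group=internal-vpc-{vpc}"
    )


def iap_config_yaml(regions, vpc):
    """Render the gcp section of ~/.sky/config.yaml for the given regions."""
    lines = ["gcp:", "  use_internal_ips: true", "  ssh_proxy_command:"]
    for region in regions:
        lines.append(f"    {region}: {iap_proxy_command(region, vpc)}")
    return "\n".join(lines)


# Example regions; each one still needs its own destination group and Cloud NAT.
print(iap_config_yaml(["us-east1", "us-west1"], "default"))
```

The generated text is meant to be reviewed and merged into an existing config by hand; it does not preserve other keys you may already have under gcp.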
How it works#
sky launch creates VMs without external IPs.
When connecting to the instance, ssh uses the IAP proxy command shown above.
gcloud compute start-iap-tunnel transparently forwards the connection through Google’s edge to the VM’s private address.
SSH packets travel over the authenticated IAP tunnel; no public ingress is ever exposed.
Troubleshooting#
Permission denied / 403 when opening the tunnel → the caller is missing roles/iap.tunnelResourceAccessor (or an equivalent custom role) on the destination group.
VM cannot reach the Internet → ensure the Cloud NAT gateway is in the same region as the subnet and that --nat-all-subnet-ip-ranges is enabled.
Multiple subnets or VPCs → create one destination group per region × CIDR and map each region to the correct group in ssh_proxy_command.
Cloud NAT setup#
Instances created with internal IPs only on GCP cannot access the public internet by default. To make sure SkyPilot can install dependencies correctly on the instances, Cloud NAT needs to be set up for the VPC (see GCP’s documentation for details).
Cloud NAT is a regional resource, so it will need to be created in each region that SkyPilot will be used in.

To limit SkyPilot to specific regions, you can set gcp.ssh_proxy_command to a dict mapping each region to the SSH proxy command for that region (see Advanced Configuration for details):
gcp:
  use_internal_ips: true
  vpc_name: my-vpc-name
  ssh_proxy_command:
    us-west1: ssh -W %h:%p -o StrictHostKeyChecking=no <user>@<us-west1-jump-host>
    us-east1: ssh -W %h:%p -o StrictHostKeyChecking=no <user>@<us-east1-jump-host>
If a proxy is not needed but the regions need to be limited, you can set gcp.ssh_proxy_command to a dict mapping each region to null:
gcp:
  use_internal_ips: true
  vpc_name: my-vpc-name
  ssh_proxy_command:
    us-west1: null
    us-east1: null
Force enable external IPs#
If instances need to access the public internet but SkyPilot communicates with them via their internal IPs inside a VPC, an alternative to setting up Cloud NAT is to force them to be created with external IP addresses:
gcp:
  use_internal_ips: true
  vpc_name: my-vpc-name
  force_enable_external_ips: true