Frequently Asked Questions

Can I clone private GitHub repositories in a task’s setup commands?

Yes, provided you have set up SSH agent forwarding. For example, run the following on your laptop:

eval $(ssh-agent -s)
ssh-add ~/.ssh/id_rsa

Then, any SkyPilot cluster launched from this machine will be able to clone private GitHub repositories. For example:

# your_task.yaml
setup: |
  git clone git@github.com:your-proj/your-repo.git

Note: cloning private repositories in the run commands is not yet supported.
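
Before launching, it can help to confirm that the agent actually holds a key, since an agent with no keys has nothing to forward. The following is a minimal sketch, assuming a POSIX-like shell with OpenSSH's ssh-add available; the function name agent_ready is hypothetical and not part of SkyPilot:

```shell
# Returns success only if an ssh-agent is reachable and holds at least one key.
# `agent_ready` is a hypothetical helper name, not part of SkyPilot.
agent_ready() {
  [ -n "${SSH_AUTH_SOCK:-}" ] && ssh-add -l >/dev/null 2>&1
}

if agent_ready; then
  echo "ssh-agent has at least one key loaded"
else
  echo "no usable ssh-agent; rerun the ssh-agent and ssh-add commands above"
fi
```

ssh-add -l exits with a non-zero status both when the agent has no identities and when no agent can be reached, so either case takes the else branch.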

How to mount additional files into a cloned repository?

If you mount additional files into a path that is later used as the target of git clone (in either setup or run), the clone will fail and complain that the target path is not empty:

file_mounts:
  ~/code-repo/tmp.txt: ~/tmp.txt
setup: |
  # Fail! Git will complain the target dir is not empty:
  #    fatal: destination path 'code-repo' already exists and is not an empty directory.
  # This is because file_mounts are processed before `setup`.
  git clone git@github.com:your-id/your-repo.git ~/code-repo/

To get around this, mount the files to a different path, then symlink to them. For example:

file_mounts:
  /tmp/tmp.txt: ~/tmp.txt
setup: |
  git clone git@github.com:your-id/your-repo.git ~/code-repo/
  ln -s /tmp/tmp.txt ~/code-repo/
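
The symlink step can be tried locally without a cluster. This is only an illustrative sketch: a scratch directory stands in for the cluster's filesystem, an empty code-repo directory stands in for the fresh clone, and all paths are hypothetical:

```shell
# Scratch directory standing in for the cluster's filesystem (hypothetical paths).
demo=$(mktemp -d)
mkdir -p "$demo/mounts" "$demo/code-repo"   # code-repo stands in for the fresh clone
echo "hello" > "$demo/mounts/tmp.txt"       # the file delivered by file_mounts

# With a directory as the target, ln -s keeps the source file name.
ln -s "$demo/mounts/tmp.txt" "$demo/code-repo/"

# The link inside the repo resolves to the mounted file's contents.
cat "$demo/code-repo/tmp.txt"

# Remove the scratch directory when done: rm -rf "$demo"
```

Because the link is created only after the clone, git never sees a non-empty target directory.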

How to make SkyPilot clusters use my Weights & Biases credentials?

Install the wandb library on your laptop and log in to your account via wandb login. Then, add the following lines to your task YAML file:

file_mounts:
  ~/.netrc: ~/.netrc
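
To check that there is something to mount, you can look for a W&B entry in the local netrc file. This sketch assumes that wandb login stores the API key in ~/.netrc under the host api.wandb.ai (the hosted W&B service; a self-hosted server would use a different host). The helper name has_wandb_login is hypothetical:

```shell
# Succeeds if the given netrc file contains an entry for the W&B API host.
# Assumption: `wandb login` writes its key under "machine api.wandb.ai".
has_wandb_login() {
  # $1: path to a netrc file; defaults to ~/.netrc.
  grep -q 'api\.wandb\.ai' "${1:-$HOME/.netrc}" 2>/dev/null
}

if has_wandb_login; then
  echo "W&B credentials found in ~/.netrc"
else
  echo "no W&B entry in ~/.netrc; run 'wandb login' first"
fi
```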

How to update an existing cluster’s file_mounts without rerunning setup?

If you have edited the file_mounts section (e.g., by adding some files) and want the change reflected on an existing cluster, running sky launch -c <cluster> .. works, but it also reruns the setup commands.

To avoid rerunning the setup commands, pass the --no-setup flag to sky launch.

How can I launch a VS Code tunnel using a SkyPilot task definition?

You can use the following task definition:

setup: |
  sudo snap install --classic code
  # if `snap` is not available, you can try the following commands instead:
  # wget "https://go.microsoft.com/fwlink/?LinkID=760868" -O vscode.deb
  # sudo apt install ./vscode.deb -y
  # rm vscode.deb
run: |
  code tunnel --accept-server-license-terms

Note that you’ll be prompted to authenticate with your GitHub account to launch a VS Code tunnel.

How to launch VMs in a subset of regions only (e.g., Europe only)?

When defining a task, you can use the resources.any_of field to specify a set of regions you want to launch VMs in.

For example, to launch VMs in Europe only (which can help with GDPR compliance), you can use the following task definition:

resources:
  # SkyPilot will perform cost optimization among the specified regions.
  any_of:
    # AWS:
    - region: eu-central-1
    - region: eu-west-1
    - region: eu-west-2
    - region: eu-west-3
    - region: eu-north-1
    # GCP:
    - region: europe-central2
    - region: europe-north1
    - region: europe-southwest1
    - region: europe-west1
    - region: europe-west10
    - region: europe-west12
    - region: europe-west2
    - region: europe-west3
    - region: europe-west4
    - region: europe-west6
    - region: europe-west8
    - region: europe-west9
    # Or put in other clouds' Europe regions.

For more details, see the documentation for the resources.any_of field.

PyTorch 2.2.0 failed on SkyPilot clusters. What should I do?

PyTorch 2.2.0 has a version conflict with the default cuDNN version on SkyPilot clusters, which may cause a segmentation fault when you run the job.

To fix this, you can choose one of the following solutions:

  1. Use an older version of PyTorch (such as 2.1.0) instead of 2.2.0, e.g., pip install "torch<2.2";

  2. Remove cuDNN from the cluster’s LD_LIBRARY_PATH by adding the following line to your task:

run: |
  export LD_LIBRARY_PATH=$(echo $LD_LIBRARY_PATH | sed 's|:/usr/local/cuda/lib64||g; s|/usr/local/cuda/lib64:||g; s|/usr/local/cuda/lib64||g')
  # Other commands using PyTorch 2.2.0
  ...
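
To see what the sed expression in option 2 does, it can be run against sample values. In this sketch the function name strip_cuda_lib and the sample paths are made up for illustration; the sed expression itself is the one from the task above:

```shell
# Same sed expression as in the task above, wrapped in a function so it can be
# tried on sample values. `strip_cuda_lib` is a hypothetical helper name.
strip_cuda_lib() {
  echo "$1" | sed 's|:/usr/local/cuda/lib64||g; s|/usr/local/cuda/lib64:||g; s|/usr/local/cuda/lib64||g'
}

strip_cuda_lib "/usr/lib:/usr/local/cuda/lib64:/opt/lib"   # prints /usr/lib:/opt/lib
strip_cuda_lib "/usr/local/cuda/lib64:/usr/lib"            # prints /usr/lib
strip_cuda_lib "/usr/local/cuda/lib64"                     # prints an empty line
```

The three substitutions cover the entry appearing after another path, before another path, and on its own, so no stray colon is left behind in any of these cases.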

(Advanced) How to make SkyPilot use all global regions?

By default, SkyPilot supports most global regions on AWS but only the US regions on GCP and Azure. To use all global regions, run the data fetchers below with the --all-regions flag; the other invocations show how to fetch a narrower set of regions or zones:

version=$(python -c 'import sky; print(sky.clouds.service_catalog.constants.CATALOG_SCHEMA_VERSION)')
mkdir -p ~/.sky/catalogs/${version}
cd ~/.sky/catalogs/${version}
# GCP
pip install lxml
# Fetch U.S. regions for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp
# Fetch the specified zones for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --zones northamerica-northeast1-a us-east1-b us-east1-c
# Fetch U.S. zones for GCP, excluding the specified zones
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --exclude us-east1-a us-east1-b
# Fetch all regions for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --all-regions
# Run in single-threaded mode. This is useful when multiple processes don't work well with the GCP client due to SSL issues.
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --single-threaded

# Azure
# Fetch U.S. regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure
# Fetch all regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --all-regions
# Run in single-threaded mode. This is useful when multiple processes don't work well with the Azure client due to SSL issues.
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --single-threaded
# Fetch the specified regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --regions japaneast australiaeast uksouth
# Fetch U.S. regions for Azure, excluding the specified regions
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --exclude centralus eastus

To make your managed spot jobs potentially use all global regions, please log into the spot controller with ssh sky-spot-controller-<hash> (the full name can be found in sky status), and run the commands above.

(Advanced) How to edit or update the regions or pricing information used by SkyPilot?

SkyPilot stores region and pricing information for different cloud resource types in CSV files known as “service catalogs”. These catalogs are cached in the ~/.sky/catalogs/<schema-version>/ directory. To find your schema version, run the following command:

python -c "from sky.clouds import service_catalog; print(service_catalog.CATALOG_SCHEMA_VERSION)"

You can customize the catalog files to your needs. For example, if you have access to special regions of GCP, add the data to ~/.sky/catalogs/<schema-version>/gcp.csv. Also, you can update the catalog for a specific cloud by deleting the CSV file (e.g., rm ~/.sky/catalogs/<schema-version>/gcp.csv). SkyPilot will automatically download the latest catalog in the next run.
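
Before editing or deleting a catalog, it can be useful to see which CSV files are currently cached. A minimal sketch, assuming the ~/.sky/catalogs/<schema-version>/ layout described above; list_catalogs is a hypothetical helper name, not a SkyPilot command:

```shell
# Print every cached catalog CSV under a catalog root (default: ~/.sky/catalogs).
# `list_catalogs` is a hypothetical helper name, not a SkyPilot command.
list_catalogs() {
  root="${1:-$HOME/.sky/catalogs}"
  [ -d "$root" ] || return 0   # nothing has been cached yet
  find "$root" -type f -name '*.csv' | sort
}

list_catalogs
```

Searching from the catalog root rather than a single version directory also shows files left over from older schema versions.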