Frequently Asked Questions

Git and GitHub

How to clone private GitHub repositories in a task’s setup commands?

This is possible provided you have set up SSH agent forwarding. For example, run the following on your laptop:

eval $(ssh-agent -s)
ssh-add ~/.ssh/id_rsa
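
Before launching, it can help to confirm that the agent actually holds a key. The following is a minimal sketch and not part of SkyPilot: describe_agent_status is a hypothetical helper that interprets the documented exit status of ssh-add -l (0 on success, 1 if the command fails, which is what happens when the agent holds no identities, and 2 if no agent can be contacted).

```shell
# Hypothetical helper (not a SkyPilot command): translate the exit status of
# `ssh-add -l` into a human-readable message.
describe_agent_status() {
  case "$1" in
    0) echo "agent is running and holds at least one key" ;;
    1) echo "agent is running but holds no keys" ;;
    2) echo "no agent reachable" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# List the keys held by the agent, discarding the listing itself and keeping
# only the exit status.
ssh-add -l >/dev/null 2>&1
describe_agent_status "$?"
```

If the message reports no agent or no keys, rerun the two commands above in the same shell before launching.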

Any SkyPilot cluster launched from this machine can then clone private GitHub repositories in its setup commands. For example:

# your_task.yaml
setup: |
  git clone git@github.com:your-proj/your-repo.git

Note: cloning private repositories in the run commands is not yet supported.

How to ensure my workdir’s .git is synced up for managed spot jobs?

Currently, there is a difference in whether .git is synced up depending on the command used:

  • For regular sky launch, the workdir’s .git is synced up by default.

  • For managed spot jobs (sky spot launch), the workdir’s .git is excluded by default.

To sync the workdir’s .git for managed spot jobs as well, add an explicit file mount:

workdir: .
file_mounts:
  ~/sky_workdir/.git: .git

This can be useful if your jobs use certain experiment tracking tools that depend on the .git directory to track code changes.
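
If a job depends on the .git directory, a guard at the top of the run commands can make a missing sync fail fast instead of surfacing later inside the tracking tool. This is a minimal sketch, assuming the workdir lands in ~/sky_workdir as in the file mount above; require_git_dir is a hypothetical helper name, not a SkyPilot feature.

```shell
# Hypothetical guard: succeed only if <dir>/.git exists.
require_git_dir() {
  if [ -e "$1/.git" ]; then
    echo "found $1/.git"
  else
    echo "missing $1/.git; was the file mount added?" >&2
    return 1
  fi
}

# In a task this would be called as:
#   require_git_dir ~/sky_workdir || exit 1
# Local demonstration with scratch directories:
with_git=$(mktemp -d)
mkdir "$with_git/.git"
without_git=$(mktemp -d)
require_git_dir "$with_git"
require_git_dir "$without_git" || echo "guard tripped as expected"
```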

File mounting (file_mounts)

How to make SkyPilot clusters use my Weights & Biases credentials?

Install the wandb library on your laptop and log in to your account via wandb login, which stores your credentials in ~/.netrc. Then, add the following lines to your task YAML file so that the file is copied to the cluster:

file_mounts:
  ~/.netrc: ~/.netrc

How to mount additional files into a cloned repository?

If you mount additional files into a path that is later the target of git clone (in either setup or run), the clone fails because the target path is no longer empty:

file_mounts:
  ~/code-repo/tmp.txt: ~/tmp.txt
setup: |
  # Fail! Git will complain the target dir is not empty:
  #    fatal: destination path 'code-repo' already exists and is not an empty directory.
  # This is because file_mounts are processed before `setup`.
  git clone git@github.com:your-id/your-repo.git ~/code-repo/

To get around this, mount the files to a different path, then symlink to them. For example:

file_mounts:
  /tmp/tmp.txt: ~/tmp.txt
setup: |
  git clone git@github.com:your-id/your-repo.git ~/code-repo/
  ln -s /tmp/tmp.txt ~/code-repo/
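
Both the failure and the workaround can be reproduced locally without a cluster. The sketch below is illustrative only: it works in a scratch directory, uses an empty local bare repository as a stand-in for the remote, and all directory names are made up.

```shell
# Scratch area; nothing here touches a real cluster or remote.
demo=$(mktemp -d)
git init --quiet --bare "$demo/origin.git"   # stand-in for the remote repo

# 1. A "file mount" lands inside the future clone target first,
#    so the clone finds a non-empty directory and refuses to proceed.
mkdir -p "$demo/code-repo"
echo "data" > "$demo/code-repo/tmp.txt"
if git clone --quiet "$demo/origin.git" "$demo/code-repo" 2>/dev/null; then
  clone_into_nonempty=ok
else
  clone_into_nonempty=failed
fi
echo "clone into non-empty dir: $clone_into_nonempty"

# 2. Workaround: keep the file elsewhere, clone into a fresh path,
#    then symlink the file into the clone.
rm -rf "$demo/code-repo"
mkdir -p "$demo/mnt"
echo "data" > "$demo/mnt/tmp.txt"
git clone --quiet "$demo/origin.git" "$demo/code-repo" 2>/dev/null
ln -s "$demo/mnt/tmp.txt" "$demo/code-repo/"
ls -l "$demo/code-repo/tmp.txt"
```

Because the link target ends with a slash, ln -s creates the link inside the directory under the source file's name (tmp.txt), which is the same form the setup commands above rely on.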

How to update an existing cluster’s file_mounts without rerunning setup?

If you have edited the file_mounts section (e.g., by adding files) and want the change reflected on an existing cluster, running sky launch -c <cluster> .. works, but it also reruns the setup commands.

To avoid rerunning the setup commands, pass the --no-setup flag to sky launch.

Region settings

How to launch VMs in a subset of regions only (e.g., Europe only)?

When defining a task, you can use the resources.any_of field to specify a set of regions you want to launch VMs in.

For example, to launch VMs in Europe only (which can help with GDPR compliance), you can use the following task definition:

resources:
  # SkyPilot will perform cost optimization among the specified regions.
  any_of:
    # AWS:
    - region: eu-central-1
    - region: eu-west-1
    - region: eu-west-2
    - region: eu-west-3
    - region: eu-north-1
    # GCP:
    - region: europe-central2
    - region: europe-north1
    - region: europe-southwest1
    - region: europe-west1
    - region: europe-west10
    - region: europe-west12
    - region: europe-west2
    - region: europe-west3
    - region: europe-west4
    - region: europe-west6
    - region: europe-west8
    - region: europe-west9
    # Or put in other clouds' Europe regions.

See the documentation of the resources.any_of field for more details.

(Advanced) How to make SkyPilot use all global regions?

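By default, SkyPilot supports most global regions on AWS but only the U.S. regions on GCP and Azure. To use all global regions, regenerate the catalogs with the commands below. The fetch commands listed for each cloud are alternatives: --all-regions fetches every region, while the other variants fetch a narrower set of regions or zones.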

version=$(python -c 'import sky; print(sky.clouds.service_catalog.constants.CATALOG_SCHEMA_VERSION)')
mkdir -p ~/.sky/catalogs/${version}
cd ~/.sky/catalogs/${version}
# GCP
pip install lxml
# Fetch U.S. regions for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp
# Fetch the specified zones for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --zones northamerica-northeast1-a us-east1-b us-east1-c
# Fetch U.S. zones for GCP, excluding the specified zones
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --exclude us-east1-a us-east1-b
# Fetch all regions for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --all-regions
# Run in single-threaded mode. This is useful when multiple processes don't work well with the GCP client due to SSL issues.
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --single-threaded

# Azure
# Fetch U.S. regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure
# Fetch all regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --all-regions
# Run in single-threaded mode. This is useful when multiple processes don't work well with the Azure client due to SSL issues.
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --single-threaded
# Fetch the specified regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --regions japaneast australiaeast uksouth
# Fetch U.S. regions for Azure, excluding the specified regions
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --exclude centralus eastus

To let managed spot jobs use all global regions as well, log in to the spot controller with ssh sky-spot-controller-<hash> (the full name is shown in sky status) and run the commands above there.

(Advanced) How to edit or update the regions or pricing information used by SkyPilot?

SkyPilot stores region and pricing information for different cloud resource types in CSV files known as “service catalogs”. These catalogs are cached in the ~/.sky/catalogs/<schema-version>/ directory. To find your schema version, run the following command:

python -c "from sky.clouds import service_catalog; print(service_catalog.CATALOG_SCHEMA_VERSION)"

You can customize the catalog files to your needs. For example, if you have access to special regions of GCP, add the data to ~/.sky/catalogs/<schema-version>/gcp.csv. Also, you can update the catalog for a specific cloud by deleting the CSV file (e.g., rm ~/.sky/catalogs/<schema-version>/gcp.csv). SkyPilot will automatically download the latest catalog in the next run.
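
If a catalog file contains local edits, deleting it outright loses them. The following is a minimal sketch of a safer variant, not a SkyPilot command: reset_catalog is a hypothetical helper that moves the cached CSV into a fresh temporary directory, so the next run downloads the latest catalog while the edited copy stays available. The optional third argument (an alternative catalog root) exists only so the demonstration can run against a scratch directory.

```shell
# Hypothetical helper (not part of SkyPilot).
# Usage: reset_catalog <schema-version> <cloud> [catalog-root]
reset_catalog() {
  rc_root="${3:-$HOME/.sky/catalogs}"
  rc_csv="$rc_root/$1/$2.csv"
  if [ -f "$rc_csv" ]; then
    # Keep the old file outside the catalog directory.
    rc_backup="$(mktemp -d)/$2.csv"
    mv "$rc_csv" "$rc_backup"
    echo "moved $rc_csv to $rc_backup"
  else
    echo "nothing cached at $rc_csv"
  fi
}

# Demonstration against a scratch directory instead of ~/.sky/catalogs;
# "v0" is a placeholder, not a real schema version.
scratch=$(mktemp -d)
mkdir -p "$scratch/v0"
echo "placeholder" > "$scratch/v0/gcp.csv"
reset_catalog v0 gcp "$scratch"
```

Against the real cache, the call would take the schema version printed by the command above and omit the third argument.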

Miscellaneous

How can I launch a VS Code tunnel using a SkyPilot task definition?

Use a task definition such as the following:

setup: |
  sudo snap install --classic code
  # if `snap` is not available, you can try the following commands instead:
  # wget "https://go.microsoft.com/fwlink/?LinkID=760868" -O vscode.deb
  # sudo apt install ./vscode.deb -y
  # rm vscode.deb
run: |
  code tunnel --accept-server-license-terms

Note that you’ll be prompted to authenticate with your GitHub account to launch a VS Code tunnel.

PyTorch 2.2.0 failed on SkyPilot clusters. What should I do?

PyTorch 2.2.0 has a version conflict with the default cuDNN version on SkyPilot clusters, which may cause a segmentation fault when you run the job.

To fix this, you can choose one of the following solutions:

  1. Use an older version of PyTorch (such as 2.1.0) instead of 2.2.0, e.g., pip install "torch<2.2";

  2. Remove cuDNN from the cluster’s LD_LIBRARY_PATH by adding the following line to your task:

run: |
  export LD_LIBRARY_PATH=$(echo $LD_LIBRARY_PATH | sed 's|:/usr/local/cuda/lib64||g; s|/usr/local/cuda/lib64:||g; s|/usr/local/cuda/lib64||g')
  # Other commands using PyTorch 2.2.0
  ...
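
To see what the sed expression does, it can be applied to sample values locally. This is a small demonstration, not an additional step: it wraps the same expression in a function (the name strip_cuda_lib64 is made up) and covers the entry appearing in the middle, at the front, and at the end of the path list. The sample paths are illustrative.

```shell
# Same sed expression as in the run command above, wrapped for testing.
strip_cuda_lib64() {
  echo "$1" | sed 's|:/usr/local/cuda/lib64||g; s|/usr/local/cuda/lib64:||g; s|/usr/local/cuda/lib64||g'
}

strip_cuda_lib64 "/usr/lib:/usr/local/cuda/lib64:/opt/lib"   # prints /usr/lib:/opt/lib
strip_cuda_lib64 "/usr/local/cuda/lib64:/opt/lib"            # prints /opt/lib
strip_cuda_lib64 "/opt/lib:/usr/local/cuda/lib64"            # prints /opt/lib
```

The three substitutions handle a leading colon, a trailing colon, and the bare entry in turn. Note that the match is textual, so an entry that merely starts with /usr/local/cuda/lib64 (for example a subdirectory of it) would be truncated as well.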