Frequently Asked Questions
Git and GitHub
How to clone private GitHub repositories in a task’s setup commands?
This is possible provided you have set up SSH agent forwarding. For example, run the following on your laptop:
eval $(ssh-agent -s)
ssh-add ~/.ssh/id_rsa
Any SkyPilot cluster launched from this machine can then clone private GitHub repositories. For example:
# your_task.yaml
setup: |
  git clone git@github.com:your-proj/your-repo.git
Note: cloning private repositories in the run commands is not yet supported.
How to ensure my workdir’s .git is synced up for managed spot jobs?
Currently, whether .git is synced up depends on the command used:
For regular sky launch, the workdir’s .git is synced up by default.
For managed spot jobs (sky spot launch), the workdir’s .git is excluded by default.
In the second case, to ensure the workdir’s .git is synced up for managed spot jobs, you can explicitly add a file mount to sync it up:
workdir: .
file_mounts:
  ~/sky_workdir/.git: .git
This can be useful if your jobs use experiment tracking tools that depend on the .git directory to track code changes.
File mounting (file_mounts)
How to make SkyPilot clusters use my Weights & Biases credentials?
Install the wandb library on your laptop and log in to your account via wandb login.
Then, add the following lines to your task YAML file:
file_mounts:
  ~/.netrc: ~/.netrc
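Before launching, it can help to confirm that the local ~/.netrc actually holds a Weights & Biases entry, since the mount only copies whatever is in that file. The sketch below is a hypothetical helper, not part of SkyPilot. It assumes that wandb login stores the key under the default host api.wandb.ai (a self-hosted W&B server would use a different host name), and it demonstrates the check on a throwaway file with a placeholder key.

```shell
# Hypothetical helper: succeeds if the given netrc file has a W&B entry.
# Assumption: `wandb login` writes an entry for "machine api.wandb.ai".
has_wandb_entry() {
  [ -f "$1" ] && grep -q "api.wandb.ai" "$1"
}

# Demonstration on a throwaway file with a placeholder key.
sample_netrc=$(mktemp)
printf 'machine api.wandb.ai\n  login user\n  password PLACEHOLDER_KEY\n' > "$sample_netrc"

if has_wandb_entry "$sample_netrc"; then
  echo "W&B credentials found"
else
  echo "W&B credentials missing; run 'wandb login' first"
fi
```

On your laptop, you would call has_wandb_entry ~/.netrc instead of passing the sample file.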
How to mount additional files into a cloned repository?
If you mount additional files into a path that is later used as the target of git clone (either in setup or run), cloning will fail and complain that the target path is not empty:
file_mounts:
  ~/code-repo/tmp.txt: ~/tmp.txt
setup: |
  # Fail! Git will complain the target dir is not empty:
  #   fatal: destination path 'code-repo' already exists and is not an empty directory.
  # This is because file_mounts are processed before `setup`.
  git clone git@github.com:your-id/your-repo.git ~/code-repo/
To get around this, mount the files to a different path, then symlink to them. For example:
file_mounts:
  /tmp/tmp.txt: ~/tmp.txt
setup: |
  git clone git@github.com:your-id/your-repo.git ~/code-repo/
  ln -s /tmp/tmp.txt ~/code-repo/
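Because setup can be rerun on an existing cluster, the clone-then-symlink steps may execute more than once, and a plain ln -s fails when the link already exists. The following is a minimal sketch of an idempotent variant, under the assumption that rerunning setup is a concern for your task. It uses temporary stand-in paths (hypothetical) instead of ~/code-repo and /tmp/tmp.txt, and replaces git clone with mkdir so it runs anywhere.

```shell
# Hypothetical stand-ins for ~/code-repo and the mounted file /tmp/tmp.txt.
work_dir=$(mktemp -d)
repo_dir="$work_dir/code-repo"
mounted_file="$work_dir/tmp.txt"
echo "mounted content" > "$mounted_file"

setup_step() {
  # Stand-in for `git clone`: only create the repo directory if it is missing.
  [ -d "$repo_dir" ] || mkdir -p "$repo_dir"
  # -f replaces an existing link, so a second run does not fail with "File exists".
  ln -sf "$mounted_file" "$repo_dir/"
}

setup_step
setup_step   # simulate setup being rerun on an existing cluster
cat "$repo_dir/tmp.txt"
```

In a real task, the guard would wrap the actual git clone command, for example by testing for the repository's .git directory before cloning.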
How to update an existing cluster’s file_mounts without rerunning setup?
If you have edited the file_mounts section (e.g., by adding some files) and would like to have it reflected on an existing cluster, running sky launch -c <cluster> .. would work, but it would rerun the setup commands.
To avoid rerunning the setup commands, pass the --no-setup flag to sky launch.
Region settings
How to launch VMs in a subset of regions only (e.g., Europe only)?
When defining a task, you can use the resources.any_of field to specify a set of regions you want to launch VMs in.
For example, to launch VMs in Europe only (which can help with GDPR compliance), you can use the following task definition:
resources:
  # SkyPilot will perform cost optimization among the specified regions.
  any_of:
    # AWS:
    - region: eu-central-1
    - region: eu-west-1
    - region: eu-west-2
    - region: eu-west-3
    - region: eu-north-1
    # GCP:
    - region: europe-central2
    - region: europe-north1
    - region: europe-southwest1
    - region: europe-west1
    - region: europe-west10
    - region: europe-west12
    - region: europe-west2
    - region: europe-west3
    - region: europe-west4
    - region: europe-west6
    - region: europe-west8
    - region: europe-west9
    # Or put in other clouds' Europe regions.
See the SkyPilot documentation for more details about the resources.any_of field.
(Advanced) How to make SkyPilot use all global regions?
By default, SkyPilot supports most global regions on AWS but only the US regions on GCP and Azure. To utilize all global regions, run the following commands (the fetcher invocations are alternatives; pick the ones that match the regions you need):
version=$(python -c 'import sky; print(sky.clouds.service_catalog.constants.CATALOG_SCHEMA_VERSION)')
mkdir -p ~/.sky/catalogs/${version}
cd ~/.sky/catalogs/${version}
# GCP
pip install lxml
# Fetch U.S. regions for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp
# Fetch the specified zones for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --zones northamerica-northeast1-a us-east1-b us-east1-c
# Fetch U.S. zones for GCP, excluding the specified zones
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --exclude us-east1-a us-east1-b
# Fetch all regions for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --all-regions
# Run in single-threaded mode. This is useful when multiple processes don't work well with the GCP client due to SSL issues.
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --single-threaded
# Azure
# Fetch U.S. regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure
# Fetch all regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --all-regions
# Run in single-threaded mode. This is useful when multiple processes don't work well with the Azure client due to SSL issues.
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --single-threaded
# Fetch the specified regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --regions japaneast australiaeast uksouth
# Fetch U.S. regions for Azure, excluding the specified regions
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --exclude centralus eastus
To make your managed spot jobs potentially use all global regions, log in to the spot controller with ssh sky-spot-controller-<hash> (the full name can be found in sky status), and run the commands above.
(Advanced) How to edit or update the regions or pricing information used by SkyPilot?
SkyPilot stores region and pricing information for different cloud resource types in CSV files known as “service catalogs”.
These catalogs are cached in the ~/.sky/catalogs/<schema-version>/ directory.
Check your schema version by running the following command:
python -c "from sky.clouds import service_catalog; print(service_catalog.CATALOG_SCHEMA_VERSION)"
You can customize the catalog files to your needs.
For example, if you have access to special regions of GCP, add the data to ~/.sky/catalogs/<schema-version>/gcp.csv.
Also, you can update the catalog for a specific cloud by deleting the CSV file (e.g., rm ~/.sky/catalogs/<schema-version>/gcp.csv).
SkyPilot will automatically download the latest catalog in the next run.
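Deleting a catalog file also discards any manual customizations in it. As a minimal sketch, assuming you want to keep a copy of your edits before forcing a refresh, the hypothetical helper below moves the CSV into a separate backup directory instead of removing it outright. The temporary directories stand in for ~/.sky/catalogs/<schema-version>/ and for a backup location of your choice; neither the helper nor the backup directory is part of SkyPilot.

```shell
# Hypothetical stand-ins for ~/.sky/catalogs/<schema-version>/ and a backup location.
catalog_dir=$(mktemp -d)
backup_dir=$(mktemp -d)
echo "hypothetical,customized,row" > "$catalog_dir/gcp.csv"

refresh_catalog() {
  # $1: catalog directory, $2: backup directory, $3: cloud name (e.g., gcp)
  csv="$1/$3.csv"
  if [ -f "$csv" ]; then
    # Moving the file out has the same effect as `rm` for the catalog
    # directory, while keeping the customized copy recoverable.
    mv "$csv" "$2/$3.csv"
  fi
}

refresh_catalog "$catalog_dir" "$backup_dir" gcp
ls "$backup_dir"
```

After the file is moved out of the catalog directory, the behavior described above applies: the latest catalog is downloaded on the next run.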
Miscellaneous
How can I launch a VS Code tunnel using a SkyPilot task definition?
To launch a VS Code tunnel, you can use the following task definition:
setup: |
  sudo snap install --classic code
  # if `snap` is not available, you can try the following commands instead:
  # wget https://go.microsoft.com/fwlink/?LinkID=760868 -O vscode.deb
  # sudo apt install ./vscode.deb -y
  # rm vscode.deb
run: |
  code tunnel --accept-server-license-terms
Note that you’ll be prompted to authenticate with your GitHub account to launch a VS Code tunnel.
PyTorch 2.2.0 failed on SkyPilot clusters. What should I do?
PyTorch 2.2.0 has a version conflict with the default cuDNN version on SkyPilot clusters, which may cause a segmentation fault when you run the job.
To fix this, you can choose one of the following solutions:
Use an older version of PyTorch (such as 2.1.0) instead of 2.2.0, i.e., pip install "torch<2.2";
Remove cuDNN from the cluster’s LD_LIBRARY_PATH by adding the following line to your task:
run: |
  export LD_LIBRARY_PATH=$(echo $LD_LIBRARY_PATH | sed 's|:/usr/local/cuda/lib64||g; s|/usr/local/cuda/lib64:||g; s|/usr/local/cuda/lib64||g')
  # Other commands using PyTorch 2.2.0
  ...
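To see what the sed expression above does, the following sketch applies it to a sample value. The sample path list is hypothetical; the actual LD_LIBRARY_PATH on a cluster will differ. The three substitutions cover the CUDA directory appearing after a colon, before a colon, or as the only entry.

```shell
# Hypothetical sample value; the real LD_LIBRARY_PATH on a cluster will differ.
sample="/opt/lib:/usr/local/cuda/lib64:/usr/lib"

# Same sed expression as in the task above, applied to the sample value.
cleaned=$(echo "$sample" | sed 's|:/usr/local/cuda/lib64||g; s|/usr/local/cuda/lib64:||g; s|/usr/local/cuda/lib64||g')

echo "$cleaned"
# prints /opt/lib:/usr/lib
```

Running echo $LD_LIBRARY_PATH on the cluster before and after the export is a quick way to confirm the entry was removed.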