Frequently Asked Questions#
Git and GitHub#
How to clone private GitHub repositories in a task’s setup commands?#
This is possible provided you have set up SSH agent forwarding. For example, run the following on your laptop:
eval $(ssh-agent -s)
ssh-add ~/.ssh/id_rsa
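To verify that the key was added, you can list the identities currently loaded in the agent (standard ssh-add behavior):
ssh-add -l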
Then, any SkyPilot clusters launched from this machine will be able to clone private GitHub repositories. For example:
# your_task.yaml
setup: |
  git clone [email protected]:your-proj/your-repo.git
Note: cloning private repositories in the run commands is not currently supported.
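As a workaround, you can clone the repository during setup (where agent forwarding works) and then use it in run. A minimal sketch, where main.py is a hypothetical entry point in your repository:
# your_task.yaml
setup: |
  git clone [email protected]:your-proj/your-repo.git
run: |
  cd your-repo
  python main.py  # hypothetical entry point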
How to ensure my workdir’s .git is synced up for managed spot jobs?#
Currently, whether .git is synced up differs depending on the command used:
- For regular sky launch, the workdir’s .git is synced up by default.
- For managed jobs (sky jobs launch), the workdir’s .git is excluded by default.
In the latter case, to ensure the workdir’s .git is synced up for managed spot jobs, you can explicitly add a file mount for it:
workdir: .
file_mounts:
  ~/sky_workdir/.git: .git
This can be useful if your jobs use experiment tracking tools that depend on the .git directory to track code changes.
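With the mount above in place, launching the managed job is unchanged. For example, where exp-1 is a hypothetical job name:
sky jobs launch -n exp-1 your_task.yaml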
File mounting (file_mounts)#
How to make SkyPilot clusters use my Weights & Biases credentials?#
Install the wandb library on your laptop and log in to your account via wandb login.
Then, add the following lines to your task YAML file:
file_mounts:
  ~/.netrc: ~/.netrc
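For context, a fuller sketch of how this fits into a task; train.py is a hypothetical training script, and wandb reads the credentials from the mounted ~/.netrc:
file_mounts:
  ~/.netrc: ~/.netrc
setup: |
  pip install wandb
run: |
  # wandb picks up credentials from ~/.netrc automatically
  python train.py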
How to mount additional files into a cloned repository?#
If you want to mount additional files into a path that will be git clone-ed (either in setup or run), cloning will fail and complain that the target path is not empty:
file_mounts:
  ~/code-repo/tmp.txt: ~/tmp.txt
setup: |
  # Fail! Git will complain the target dir is not empty:
  # fatal: destination path 'code-repo' already exists and is not an empty directory.
  # This is because file_mounts are processed before `setup`.
  git clone [email protected]:your-id/your-repo.git ~/code-repo/
To get around this, mount the files to a different path, then symlink to them. For example:
file_mounts:
  /tmp/tmp.txt: ~/tmp.txt
setup: |
  git clone [email protected]:your-id/your-repo.git ~/code-repo/
  ln -s /tmp/tmp.txt ~/code-repo/
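This works because file_mounts are processed before setup: mounting to /tmp keeps the clone target empty, and the symlink is only created after the clone completes.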
How to update an existing cluster’s file_mounts without rerunning setup?#
If you have edited the file_mounts section (e.g., by adding some files) and would like to have it reflected on an existing cluster, running sky launch -c <cluster> .. would work, but it would rerun the setup commands.
To avoid rerunning the setup commands, pass the --no-setup flag to sky launch.
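For example, where mycluster is a hypothetical cluster name:
sky launch -c mycluster --no-setup your_task.yaml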
Region settings#
How to launch VMs in a subset of regions only (e.g., Europe only)?#
When defining a task, you can use the resources.any_of field to specify a set of regions you want to launch VMs in.
For example, to launch VMs in Europe only (which can help with GDPR compliance), you can use the following task definition:
resources:
  # SkyPilot will perform cost optimization among the specified regions.
  any_of:
    # AWS:
    - region: eu-central-1
    - region: eu-west-1
    - region: eu-west-2
    - region: eu-west-3
    - region: eu-north-1
    # GCP:
    - region: europe-central2
    - region: europe-north1
    - region: europe-southwest1
    - region: europe-west1
    - region: europe-west10
    - region: europe-west12
    - region: europe-west2
    - region: europe-west3
    - region: europe-west4
    - region: europe-west6
    - region: europe-west8
    - region: europe-west9
    # Or put in other clouds' Europe regions.
See more details about the resources.any_of field here.
(Advanced) How to make SkyPilot use all global regions?#
By default, SkyPilot supports most global regions on AWS and only supports the US regions on GCP and Azure. If you want to utilize all global regions, please run the following commands:
version=$(python -c 'import sky; print(sky.clouds.service_catalog.constants.CATALOG_SCHEMA_VERSION)')
mkdir -p ~/.sky/catalogs/${version}
cd ~/.sky/catalogs/${version}
# GCP
pip install lxml
# Fetch U.S. regions for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp
# Fetch the specified zones for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --zones northamerica-northeast1-a us-east1-b us-east1-c
# Fetch U.S. zones for GCP, excluding the specified zones
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --exclude us-east1-a us-east1-b
# Fetch all regions for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --all-regions
# Run in single-threaded mode. This is useful when multiple processes don't work well with the GCP client due to SSL issues.
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --single-threaded
# Azure
# Fetch U.S. regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure
# Fetch all regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --all-regions
# Run in single-threaded mode. This is useful when multiple processes don't work well with the Azure client due to SSL issues.
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --single-threaded
# Fetch the specified regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --regions japaneast australiaeast uksouth
# Fetch U.S. regions for Azure, excluding the specified regions
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --exclude centralus eastus
To make your managed spot jobs potentially use all global regions, log into the spot controller with ssh sky-spot-controller-<hash> (the full name can be found in sky status), and run the commands above.
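For example:
sky status  # find the full controller name
ssh sky-spot-controller-<hash>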
(Advanced) How to edit or update the regions or pricing information used by SkyPilot?#
SkyPilot stores regions and pricing information for different cloud resource types in CSV files known as “service catalogs”.
These catalogs are cached in the ~/.sky/catalogs/<schema-version>/ directory.
Check your schema version by running the following command:
python -c "from sky.clouds import service_catalog; print(service_catalog.CATALOG_SCHEMA_VERSION)"
You can customize the catalog files to your needs.
For example, if you have access to special regions of GCP, add the data to ~/.sky/catalogs/<schema-version>/gcp.csv.
You can also force an update of the catalog for a specific cloud by deleting its CSV file (e.g., rm ~/.sky/catalogs/<schema-version>/gcp.csv); SkyPilot will automatically download the latest catalog on the next run.
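Before editing a catalog by hand, you can inspect the existing file’s header to see the expected column layout:
head -n 1 ~/.sky/catalogs/<schema-version>/gcp.csv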
Package Installation#
Unable to import PyTorch in a SkyPilot task.#
If you are using the default SkyPilot images (i.e., not passing in --image-id), pip install torch should work.
However, if you use your own image that has an older NVIDIA driver (535.161.08 or lower) and install the default PyTorch, you may encounter the following error:
ImportError: /home/azureuser/miniconda3/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12
You will need to install a PyTorch version that is compatible with your NVIDIA driver, e.g., pip install torch --index-url https://download.pytorch.org/whl/cu121.
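To check which driver your image ships and whether the installed PyTorch can use the GPU, you can run:
nvidia-smi  # the driver version is shown in the header line
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"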
Miscellaneous#
How can I launch a VS Code tunnel using a SkyPilot task definition?#
To launch a VS Code tunnel, you can use the following task definition:
setup: |
  sudo snap install --classic code
  # If `snap` is not available, you can try the following commands instead:
  # wget https://go.microsoft.com/fwlink/?LinkID=760868 -O vscode.deb
  # sudo apt install ./vscode.deb -y
  # rm vscode.deb
run: |
  code tunnel --accept-server-license-terms
Note that you’ll be prompted to authenticate with your GitHub account to launch a VS Code tunnel.
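For example, you could launch the task and then tail its logs to find the authentication prompt; vscode-tunnel is a hypothetical cluster name:
sky launch -c vscode-tunnel task.yaml
sky logs vscode-tunnel  # the authentication prompt and tunnel URL appear here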