Secrets and Environment Variables#
Environment variables are a powerful way to pass configuration and secrets to your tasks. There are two types of environment variables in SkyPilot:
User-specified environment variables: Passed by users to tasks, useful for secrets and configurations.
SkyPilot environment variables: Predefined by SkyPilot with information about the current cluster and task.
User-specified environment variables#
User-specified environment variables are useful for passing secrets and any arguments or configurations needed for your tasks. They are made available in file_mounts, setup, and run.
You can specify environment variables to be made available to a task in two ways:
envs field (dict) in a task YAML:
envs:
  MYVAR: val
--env flag in sky launch/exec CLI (takes precedence over the above)
Tip
To mark an environment variable as required and make SkyPilot forcefully check its existence (errors out if not specified), set it to an empty string or null in the task YAML. For example, WANDB_API_KEY and HF_TOKEN in the following task YAML are marked as required:
envs:
  WANDB_API_KEY:
  HF_TOKEN: null
  MYVAR: val
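The precedence and "required" rules described here can be pictured with a small Python sketch. This is only an illustration of the stated behavior, not SkyPilot's implementation; the function name resolve_envs and the token value hf_example are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def resolve_envs(
    yaml_envs: Dict[str, Optional[str]],
    cli_envs: Dict[str, str],
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Merge YAML `envs` with `--env` values and list unmet required variables."""
    merged: Dict[str, Optional[str]] = dict(yaml_envs)
    merged.update(cli_envs)  # --env takes precedence over the YAML values.
    # A value of None (YAML null or empty) or "" marks a variable as required.
    missing = sorted(k for k, v in merged.items() if v is None or v == "")
    return merged, missing


# Mirrors the task YAML above: two required variables and one default.
yaml_envs = {"WANDB_API_KEY": None, "HF_TOKEN": None, "MYVAR": "val"}
merged, missing = resolve_envs(yaml_envs, {"HF_TOKEN": "hf_example"})
print(missing)  # ['WANDB_API_KEY']
```

In this sketch, only WANDB_API_KEY is left unset, so a launcher following the rule above would error out until it is also supplied.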
The file_mounts, setup, and run sections of a task YAML can access the variables via the ${MYVAR} syntax.
Passing secrets#
We recommend passing secrets to any node(s) executing your task by first making them available in your current shell, then using --env SECRET to pass them to SkyPilot:
$ sky launch -c mycluster --env HF_TOKEN --env WANDB_API_KEY task.yaml
$ sky exec mycluster --env WANDB_API_KEY task.yaml
Tip
You do not need to pass the value directly such as --env WANDB_API_KEY=1234. When the value is not specified (e.g., --env WANDB_API_KEY), SkyPilot reads it from local environment variables.
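As a rough illustration of the two forms of --env, the following Python sketch resolves a single flag value. It is an assumption-laden sketch rather than SkyPilot's code: the helper name parse_env_flag is hypothetical, and raising KeyError when the variable is not set locally is this sketch's own choice.

```python
import os
from typing import Mapping, Tuple


def parse_env_flag(arg: str, environ: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Resolve one `--env` argument to a (name, value) pair."""
    if "=" in arg:
        # Explicit form: --env WANDB_API_KEY=1234
        key, value = arg.split("=", 1)
        return key, value
    # Bare form: --env WANDB_API_KEY, looked up in the local environment.
    # Raises KeyError if the variable is not set (a choice made for this sketch).
    return arg, environ[arg]


print(parse_env_flag("WANDB_API_KEY=1234"))  # ('WANDB_API_KEY', '1234')
```

The bare form keeps the secret out of the command line and shell history, which is why the text above recommends it.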
Using in file_mounts#
# Sets default values for some variables; can be overridden by --env.
envs:
  MY_BUCKET: skypilot-temp-gcs-test
  MY_LOCAL_PATH: tmp-workdir
  MODEL_SIZE: 13b
file_mounts:
  /mydir:
    name: ${MY_BUCKET}  # Name of the bucket.
    mode: MOUNT
  /another-dir2:
    name: ${MY_BUCKET}-2
    source: ["~/${MY_LOCAL_PATH}"]
  /checkpoint/${MODEL_SIZE}: ~/${MY_LOCAL_PATH}
The values of these variables are filled in by SkyPilot at task YAML parse time.
Read more at examples/using_file_mounts_with_env_vars.yaml.
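To see what filling in the values means for the example above, here is a minimal Python sketch that performs the same kind of ${VAR} substitution with the standard library's string.Template. It only mimics the observable result; how SkyPilot implements the substitution is not stated here, and the helper name fill is hypothetical.

```python
from string import Template
from typing import Mapping


def fill(value: str, envs: Mapping[str, str]) -> str:
    """Substitute ${VAR} placeholders in one string using the given variables."""
    return Template(value).substitute(envs)


# Default values from the envs section of the example above.
envs = {
    "MY_BUCKET": "skypilot-temp-gcs-test",
    "MY_LOCAL_PATH": "tmp-workdir",
    "MODEL_SIZE": "13b",
}

print(fill("${MY_BUCKET}-2", envs))             # skypilot-temp-gcs-test-2
print(fill("/checkpoint/${MODEL_SIZE}", envs))  # /checkpoint/13b
print(fill("~/${MY_LOCAL_PATH}", envs))         # ~/tmp-workdir
```

Overriding a variable with --env would simply change the corresponding entry in envs before the substitution runs.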
Using in setup and run#
All user-specified environment variables are exported to a task's setup and run commands (i.e., accessible when they are being run).
For example, this is useful for passing secrets (see Passing secrets above) or passing configurations:
# Sets default values for some variables; can be overridden by --env.
envs:
  MODEL_NAME: decapoda-research/llama-65b-hf
run: |
  python train.py --model_name ${MODEL_NAME} <other args>
$ sky launch --env MODEL_NAME=decapoda-research/llama-7b-hf task.yaml  # Override.
See complete examples at llm/vllm/serve.yaml and llm/vicuna/train.yaml.
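On the program side, a script such as train.py could accept the value either from the command line or from the exported variable. The following is a hypothetical sketch of such a script's argument handling, not the contents of the linked examples; falling back to the MODEL_NAME environment variable is an assumption made for illustration.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse --model_name, defaulting to the exported MODEL_NAME variable."""
    parser = argparse.ArgumentParser()
    # The run command passes --model_name ${MODEL_NAME}; the environment
    # variable is used as a fallback when the flag is omitted.
    parser.add_argument("--model_name", default=os.environ.get("MODEL_NAME"))
    return parser.parse_args(argv)


args = parse_args(["--model_name", "decapoda-research/llama-7b-hf"])
print(args.model_name)  # decapoda-research/llama-7b-hf
```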
SkyPilot environment variables#
SkyPilot exports several predefined environment variables made available during a task’s execution. These variables contain information about the current cluster or task, which can be useful for distributed frameworks such as torch.distributed, OpenMPI, etc. See examples in Distributed Multi-Node Jobs and Managed Jobs.
The values of these variables are filled in by SkyPilot at task execution time. You can access these variables in the following ways:
In the task YAML's setup/run commands (a Bash script), access them using the ${MYVAR} syntax;
In the program(s) launched in setup/run, access them using the language's standard method (e.g., os.environ for Python).
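For the Python route, a minimal sketch of reading one of these variables is shown below. It uses SKYPILOT_CLUSTER_INFO, the JSON-valued variable described in the tables that follow; the helper name cluster_cloud is hypothetical, and returning None when the variable is unset (for example, when the script runs outside SkyPilot) is a choice made for this sketch.

```python
import json
import os
from typing import Mapping, Optional


def cluster_cloud(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the cloud name from SKYPILOT_CLUSTER_INFO, or None if unset."""
    raw = environ.get("SKYPILOT_CLUSTER_INFO")
    if raw is None:
        return None  # Not running under SkyPilot (assumption of this sketch).
    return json.loads(raw).get("cloud")


example = {
    "SKYPILOT_CLUSTER_INFO": '{"cluster_name": "my-cluster-name", "cloud": "GCP", '
    '"region": "us-central1", "zone": "us-central1-a"}'
}
print(cluster_cloud(example))  # GCP
```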
The setup and run stages can access different sets of SkyPilot environment variables:
Environment variables for setup#
| Name | Definition | Example |
|---|---|---|
| | Rank (an integer ID from 0 to …) | 0 |
| | A string of IP addresses of the nodes in the cluster with the same order as the node ranks, where each line contains one IP address. Note that this is not necessarily the same as the nodes in … | 1.2.3.4 and 3.4.5.6 (one per line) |
| | Number of nodes in the cluster. Same value as … | 2 |
| | A unique ID assigned to each task. This environment variable is available only when the task is submitted with … Refer to the description in the environment variables for run. | sky-2023-07-06-21-18-31-563597_myclus_1. For managed spot jobs: sky-managed-2023-07-06-21-18-31-563597_my-job-name_1-0 |
| SKYPILOT_CLUSTER_INFO | A JSON string containing information about the cluster. To access the information, you could parse the JSON string in bash (…) or in Python: import json, os; json.loads(os.environ['SKYPILOT_CLUSTER_INFO'])['cloud'] | {"cluster_name": "my-cluster-name", "cloud": "GCP", "region": "us-central1", "zone": "us-central1-a"} |
| | The ID of a replica within the service (starting from 1). Available only for a service's replica task. | 1 |
Since setup commands always run on all nodes of a cluster, SkyPilot ensures both of these environment variables (the ranks and the IP list) never change across multiple setups on the same cluster.
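Because the IP list is a single string with one address per line, a program usually needs to split it before use. The Python sketch below shows one way to do that on a raw value; the helper name parse_node_ips is hypothetical, and treating the first address as the head node is a common convention assumed here rather than something this page states.

```python
from typing import List


def parse_node_ips(raw: str) -> List[str]:
    """Split a newline-separated IP string into a list, in rank order."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


# Example value from the table above: two nodes, one IP per line.
ips = parse_node_ips("1.2.3.4\n3.4.5.6\n")
head_ip = ips[0]      # Assumed convention: the rank-0 node acts as the head.
num_listed = len(ips)
print(ips, head_ip, num_listed)
```

The raw string would come from the IP-list environment variable of the relevant stage, read with os.environ as described earlier.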
Environment variables for run#
| Name | Definition | Example |
|---|---|---|
| | Rank (an integer ID from 0 to …) | 0 |
| | A string of IP addresses of the nodes reserved to execute the task, where each line contains one IP address. Read more here. | 1.2.3.4 |
| | Number of nodes assigned to execute the current task. Same value as … | 1 |
| | Number of GPUs reserved on each node to execute the task; the same as the count in … | 0 |
| | A unique ID assigned to each task in the format "sky-<timestamp>_<cluster-name>_<task-id>". Useful for logging purposes: e.g., use a unique output path on the cluster; pass to Weights & Biases; etc. Each task's logs are stored on the cluster at … If a task is run as a managed spot job, then all recoveries of that job will have the same ID value. The ID is in the format "sky-managed-<timestamp>_<job-name>(_<task-name>)_<job-id>-<task-id>", where … | sky-2023-07-06-21-18-31-563597_myclus_1. For managed spot jobs: sky-managed-2023-07-06-21-18-31-563597_my-job-name_1-0 |
| SKYPILOT_CLUSTER_INFO | A JSON string containing information about the cluster. To access the information, you could parse the JSON string in bash (…) or in Python: import json, os; json.loads(os.environ['SKYPILOT_CLUSTER_INFO'])['cloud'] | {"cluster_name": "my-cluster-name", "cloud": "GCP", "region": "us-central1", "zone": "us-central1-a"} |
| | The ID of a replica within the service (starting from 1). Available only for a service's replica task. | 1 |
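The task ID format "sky-<timestamp>_<cluster-name>_<task-id>" described in the table can be split into its parts, for example to build a unique output path. The following Python sketch handles only that non-managed format; the names TaskId and parse_task_id are hypothetical, and rejecting the "sky-managed-" form is a simplification made here.

```python
from typing import NamedTuple


class TaskId(NamedTuple):
    timestamp: str
    cluster_name: str
    task_id: str


def parse_task_id(value: str) -> TaskId:
    """Parse 'sky-<timestamp>_<cluster-name>_<task-id>' into its fields."""
    prefix = "sky-"
    if not value.startswith(prefix) or value.startswith("sky-managed-"):
        raise ValueError(f"unsupported task ID format: {value!r}")
    # The timestamp contains no underscores, so split once from the left;
    # the task ID is the last underscore-separated field.
    timestamp, rest = value[len(prefix):].split("_", 1)
    cluster_name, task_id = rest.rsplit("_", 1)
    return TaskId(timestamp, cluster_name, task_id)


parsed = parse_task_id("sky-2023-07-06-21-18-31-563597_myclus_1")
print(parsed.cluster_name, parsed.task_id)  # myclus 1
```

For logging, using the whole ID string as a directory or run name is usually enough; parsing is only needed when the individual fields matter.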