Using Docker Containers#
SkyPilot can run a container either as a task, or as the runtime environment of a cluster.
If the container image is invocable / has an entrypoint: run it as a task.
If the container image is to be used as a runtime environment (e.g., ubuntu, nvcr.io/nvidia/pytorch:23.10-py3, etc.) and you have extra commands to run inside the container: run it as a runtime environment.
Note
Running Docker containers is not supported on RunPod. To use RunPod, either use your Docker image as a runtime environment (the username should be root for RunPod), or use setup and run to configure your environment. See the GitHub issue for more.
Running Containers as Tasks#
Note
On Kubernetes, running the Docker runtime inside a pod is not recommended. Instead, use your container as a runtime environment.
SkyPilot can run containerized applications directly as regular tasks. The default VM images provided by SkyPilot already have the Docker runtime pre-configured.
To launch a containerized application, you can directly invoke docker run in the run section of your task.
For example, to run a HuggingFace TGI serving container:
resources:
  accelerators: A100:1

run: |
  docker run --gpus all --shm-size 1g -v ~/data:/data \
    ghcr.io/huggingface/text-generation-inference \
    --model-id lmsys/vicuna-13b-v1.5

  # NOTE: Uncommon to have any commands after the above.
  # `docker run` is blocking, so any commands after it
  # will NOT be run inside the container.
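You can then launch this task as usual with sky launch. For example, assuming the YAML above is saved as tgi.yaml (both the file name and the cluster name below are just placeholders):

$ sky launch -c tgi tgi.yaml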
Private Registries#
When using this mode, to access Docker images hosted on private registries, simply add a setup section to your task YAML file to authenticate with the registry:
resources:
  accelerators: A100:1

setup: |
  # Authenticate with private registry
  docker login -u <username> -p <password> <registry>

run: |
  docker run <registry>/<image>:<tag>
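For registries that use short-lived, token-based credentials, such as Amazon ECR, the setup section can fetch the password from the cloud CLI instead of hard-coding it. A sketch, assuming the AWS CLI is installed and credentialed on the cluster (the account ID and region are placeholders):

setup: |
  # Log in to Amazon ECR with a short-lived token
  aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com

run: |
  docker run <account-id>.dkr.ecr.us-east-1.amazonaws.com/<image>:<tag>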
Building containers remotely#
If you are running containerized applications, the container image can also be built remotely on the cluster, in the setup phase of the task.

The echo_app example shows how to do this:
file_mounts:
  /inputs: ./echo_app  # Input to application
  /echo_app: ./echo_app  # Contains the Dockerfile and build context
  /outputs:  # Output to be written directly to S3 bucket
    name: # Set unique bucket name here
    store: s3
    mode: MOUNT

setup: |
  # Build docker image. If pushed to a registry, can also do docker pull here
  docker build -t echo:v0 /echo_app

run: |
  docker run --rm \
    --volume="/inputs:/inputs:ro" \
    --volume="/outputs:/outputs:rw" \
    echo:v0 \
    /inputs/README.md /outputs/output.txt
In this example, the Dockerfile and build context are contained in ./echo_app. The setup phase of the task builds the image, and the run phase runs the container. The inputs to the app are copied to SkyPilot using file_mounts and mounted into the container using Docker volume mounts (--volume flag). The output of the app, produced at the /outputs path in the container, is also volume mounted to /outputs on the VM, which gets written directly to an S3 bucket through bucket mounting.
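For reference, the image built in the setup phase only needs to package the application and define its entrypoint; the input and output paths are supplied at docker run time via the volume mounts above. A minimal Dockerfile along these lines might look as follows (a hypothetical sketch, not the actual echo_app Dockerfile; echo.sh is an assumed script that copies its first argument to its second):

FROM ubuntu:20.04
# Hypothetical app script: reads the input path and writes to the output path.
COPY echo.sh /app/echo.sh
RUN chmod +x /app/echo.sh
# Arguments passed to `docker run` (e.g., /inputs/README.md /outputs/output.txt)
# are forwarded to this entrypoint.
ENTRYPOINT ["/app/echo.sh"]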
Our GitHub repository has more examples, including running Detectron2 in a Docker container via SkyPilot.
Using Containers as Runtime Environments#
When a container is used as the runtime environment, everything happens inside the container:

- The SkyPilot runtime is automatically installed and launched inside the container;
- setup and run commands are executed in the container;
- Any files created by the task will be stored inside the container.
To use a Docker image as your runtime environment, set the image_id field in the resources section of your task YAML file to docker:<image_id>. Only Debian-based images (e.g., Ubuntu) are supported for now.
For example, to use the ubuntu:20.04 image from Docker Hub:

resources:
  image_id: docker:ubuntu:20.04

setup: |
  # Commands to run inside the container

run: |
  # Commands to run inside the container
As another example, here’s how to use NVIDIA’s PyTorch NGC Container:
resources:
  image_id: docker:nvcr.io/nvidia/pytorch:23.10-py3
  accelerators: T4

setup: |
  # Commands to run inside the container

run: |
  # Commands to run inside the container

  # Since SkyPilot tasks are run inside a fresh conda "(base)" environment,
  # deactivate first to access what the Docker image has already installed.
  source deactivate
  nvidia-smi
  python -c 'import torch; print(torch.__version__)'
Any GPUs assigned to the task will be automatically mapped to your Docker container, and all subsequent tasks within the cluster will also run inside the container. In a multi-node scenario, the container will be launched on all nodes, and the corresponding node’s container will be assigned for task execution.
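For example, a multi-node task using a container runtime environment might look like the following (a minimal sketch): the same image is started on every node, and each node's task runs inside its local container.

resources:
  image_id: docker:nvcr.io/nvidia/pytorch:23.10-py3
  accelerators: T4:1

num_nodes: 2

run: |
  # Runs inside the container on each node.
  echo "Hello from node rank ${SKYPILOT_NODE_RANK}"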
Tip
When to use this?
If you have a preconfigured development environment set up within a Docker image, it can be convenient to use the runtime environment mode. This is especially useful for launching development environments that are challenging to configure on a new virtual machine, such as dependencies on specific versions of CUDA or cuDNN.
Note
Since we pip install skypilot inside the user-specified container image as part of a launch, users should ensure dependency conflicts do not occur.

Currently, the following requirements must be met:

- The container image should be based on Debian;
- The container image must grant sudo permissions without requiring password authentication for the user. Having a root user is also acceptable.
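If you are unsure whether an image meets these requirements, a quick local check before launching might look like this (a rough sketch; <your-image> is a placeholder, and the last command only prints OK if sudo is installed and passwordless):

$ # Check that the image is Debian-based:
$ docker run --rm <your-image> cat /etc/os-release
$ # Check the default user of the image:
$ docker run --rm <your-image> whoami
$ # Check for passwordless sudo (can be skipped if the user above is root):
$ docker run --rm <your-image> sudo -n true && echo "passwordless sudo OK"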
Note
Using a container with a customized entrypoint as a runtime environment is supported; the container's entrypoint is simply overridden by /bin/bash. Specific commands can be executed in the setup and run sections of the task YAML file. However, this approach is not compatible with RunPod due to limitations in the RunPod API, so ensure that you choose a container with a default entrypoint (i.e., /bin/bash).
Private Registries#
Note
These instructions do not apply if you use SkyPilot to launch on Kubernetes clusters. Instead, see Using Images from Private Repositories in Kubernetes for more.
When using this mode, to access Docker images hosted on private registries, you can provide the registry authentication details using task environment variables:
Docker Hub:

resources:
  image_id: docker:<user>/<your-docker-hub-repo>:<tag>

envs:
  # Values used in: docker login -u <user> -p <password> <registry server>
  SKYPILOT_DOCKER_USERNAME: <user>
  SKYPILOT_DOCKER_PASSWORD: <password>
  SKYPILOT_DOCKER_SERVER: docker.io

Amazon ECR:

resources:
  image_id: docker:<your-ecr-repo>:<tag>

envs:
  # Values used in: docker login -u <user> -p <password> <registry server>
  SKYPILOT_DOCKER_USERNAME: AWS
  SKYPILOT_DOCKER_PASSWORD: <password>
  SKYPILOT_DOCKER_SERVER: <your-user-id>.dkr.ecr.<region>.amazonaws.com
We suggest setting the SKYPILOT_DOCKER_PASSWORD environment variable through the CLI (see passing secrets):
$ # Docker Hub password:
$ export SKYPILOT_DOCKER_PASSWORD=...
$ # Or cloud registry password:
$ export SKYPILOT_DOCKER_PASSWORD=$(aws ecr get-login-password --region us-east-1)
$ # Pass --env:
$ sky launch task.yaml --env SKYPILOT_DOCKER_PASSWORD
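The same pattern works for other registries that use token-based login. As a sketch, for GCP Artifact Registry the username is oauth2accesstoken and the password is a short-lived access token obtained from the gcloud CLI (the image path and region are placeholders; this assumes gcloud is configured locally):

resources:
  image_id: docker:<region>-docker.pkg.dev/<project>/<repo>/<image>:<tag>

envs:
  SKYPILOT_DOCKER_USERNAME: oauth2accesstoken
  SKYPILOT_DOCKER_PASSWORD: <password>
  SKYPILOT_DOCKER_SERVER: <region>-docker.pkg.dev

$ export SKYPILOT_DOCKER_PASSWORD=$(gcloud auth print-access-token)
$ sky launch task.yaml --env SKYPILOT_DOCKER_PASSWORD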