Using Docker Containers#
SkyPilot can run a container either as a task, or as the runtime environment of a cluster.
If the container image is invocable / has an entrypoint: run it as a task.
If the container image is to be used as a runtime environment (e.g., ubuntu, nvcr.io/nvidia/pytorch:23.10-py3, etc.) and you have extra commands to run inside the container: run it as a runtime environment.
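For a quick side-by-side (both images below are just examples drawn from the sections that follow), the two modes correspond to these YAML patterns:

# Mode 1: run the container as a task, via docker run in the run section.
run: |
  docker run --rm ghcr.io/huggingface/text-generation-inference ...

# Mode 2: use the container image as the cluster's runtime environment.
resources:
  image_id: docker:nvcr.io/nvidia/pytorch:23.10-py3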
Running Containers as Tasks#
SkyPilot can run containerized applications directly as regular tasks. The default VM images provided by SkyPilot already have the Docker runtime pre-configured.
To launch a containerized application, you can directly invoke docker run in the run section of your task.
For example, to run a HuggingFace TGI serving container:
resources:
  accelerators: A100:1

run: |
  docker run --gpus all --shm-size 1g -v ~/data:/data \
    ghcr.io/huggingface/text-generation-inference \
    --model-id lmsys/vicuna-13b-v1.5

  # NOTE: Uncommon to have any commands after the above.
  # `docker run` is blocking, so any commands after it
  # will NOT be run inside the container.
Private Registries#
When using this mode, to access Docker images hosted on private registries, simply add a setup section to your task YAML file to authenticate with the registry:
resources:
  accelerators: A100:1

setup: |
  # Authenticate with private registry
  docker login -u <username> -p <password> <registry>

run: |
  docker run <registry>/<image>:<tag>
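To avoid hard-coding the password in the YAML, one option (a sketch; the DOCKER_PASSWORD variable name is arbitrary, not a SkyPilot built-in) is to pass it as a task environment variable and pipe it to docker login:

envs:
  DOCKER_PASSWORD: ""  # supply at launch time with --env

setup: |
  echo "$DOCKER_PASSWORD" | docker login -u <username> --password-stdin <registry>

run: |
  docker run <registry>/<image>:<tag>

$ sky launch task.yaml --env DOCKER_PASSWORD="$DOCKER_PASSWORD"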
Building containers remotely#
If you are running containerized applications, the container image can also be built remotely on the cluster, in the setup phase of the task. The echo_app example shows how to do this:
file_mounts:
  /inputs: ./echo_app  # Input to application
  /echo_app: ./echo_app  # Contains the Dockerfile and build context
  /outputs:  # Output to be written directly to S3 bucket
    name: # Set unique bucket name here
    store: s3
    mode: MOUNT

setup: |
  # Build docker image. If pushed to a registry, can also do docker pull here
  docker build -t echo:v0 /echo_app

run: |
  docker run --rm \
    --volume="/inputs:/inputs:ro" \
    --volume="/outputs:/outputs:rw" \
    echo:v0 \
    /inputs/README.md /outputs/output.txt
In this example, the Dockerfile and build context are contained in ./echo_app. The setup phase of the task builds the image, and the run phase runs the container. The inputs to the app are copied to SkyPilot using file_mounts and mounted into the container using Docker volume mounts (the --volume flag). The output of the app, produced at the /outputs path in the container, is also volume mounted to /outputs on the VM, which gets directly written to an S3 bucket through bucket mounting.
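To try this example (the file and cluster names below are placeholders), save the YAML to a file and launch it:

$ sky launch -c echo echo_app.yaml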
Our GitHub repository has more examples, including running Detectron2 in a Docker container via SkyPilot.
Using Containers as Runtime Environments#
When a container is used as the runtime environment, everything happens inside the container:
The SkyPilot runtime is automatically installed and launched inside the container;
The setup and run commands are executed in the container;
Any files created by the task will be stored inside the container.
To use a Docker image as your runtime environment, set the image_id field in the resources section of your task YAML file to docker:<image_id>. Only Debian-based images (e.g., Ubuntu) are supported for now.
For example, to use the ubuntu:20.04 image from Docker Hub:
resources:
  image_id: docker:ubuntu:20.04

setup: |
  # Commands to run inside the container

run: |
  # Commands to run inside the container
As another example, here’s how to use NVIDIA’s PyTorch NGC Container:
resources:
  image_id: docker:nvcr.io/nvidia/pytorch:23.10-py3
  accelerators: T4

setup: |
  # Commands to run inside the container

run: |
  # Commands to run inside the container

  # Since SkyPilot tasks are run inside a fresh conda "(base)" environment,
  # deactivate first to access what the Docker image has already installed.
  source deactivate
  nvidia-smi
  python -c 'import torch; print(torch.__version__)'
Any GPUs assigned to the task will be automatically mapped to your Docker container, and all subsequent tasks within the cluster will also run inside the container. In a multi-node setting, the container is launched on every node, and each node's task runs inside that node's container.
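For example (the cluster and file names below are placeholders), jobs submitted to the cluster later with sky exec also execute inside the container:

$ sky launch -c ngc ngc_task.yaml
$ sky exec ngc 'nvidia-smi'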
Tip
When to use this?
If you have a preconfigured development environment set up within a Docker image, it can be convenient to use the runtime environment mode. This is especially useful for launching development environments that are challenging to configure on a new virtual machine, such as dependencies on specific versions of CUDA or cuDNN.
Note
Since we pip install skypilot inside the user-specified container image as part of a launch, users should ensure dependency conflicts do not occur. Currently, the following requirements must be met:
The container image should be based on Debian;
The container image must grant sudo permissions without requiring password authentication for the user. Having a root user is also acceptable.
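As an illustrative sketch (not an official recipe; the myuser name is arbitrary), a Debian-based image can satisfy the passwordless-sudo requirement like this:

FROM ubuntu:22.04
# Install sudo and create a user that can sudo without a password.
RUN apt-get update && apt-get install -y sudo && \
    useradd -m -s /bin/bash myuser && \
    echo 'myuser ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/myuser
USER myuser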
Private Registries#
When using this mode, to access Docker images hosted on private registries, you can provide the registry authentication details using task environment variables:
# ecr_private_docker.yaml
resources:
  image_id: docker:<your-user-id>.dkr.ecr.us-east-1.amazonaws.com/<your-private-image>:<tag>
  # the following shorthand is also supported:
  # image_id: docker:<your-private-image>:<tag>

envs:
  SKYPILOT_DOCKER_USERNAME: AWS
  # SKYPILOT_DOCKER_PASSWORD: <password>
  SKYPILOT_DOCKER_SERVER: <your-user-id>.dkr.ecr.us-east-1.amazonaws.com
We suggest setting the SKYPILOT_DOCKER_PASSWORD environment variable through the CLI (see passing secrets):
$ export SKYPILOT_DOCKER_PASSWORD=$(aws ecr get-login-password --region us-east-1)
$ sky launch ecr_private_docker.yaml --env SKYPILOT_DOCKER_PASSWORD