Using Docker Containers

SkyPilot can run a container either as a task or as the runtime environment of a cluster.

  • If the container image is invocable / has an entrypoint: run it as a task.

  • If the container image is to be used as a runtime environment (e.g., ubuntu, nvcr.io/nvidia/pytorch:23.10-py3) and you have extra commands to run inside the container: run it as a runtime environment.

Note

Running Docker containers is not supported on RunPod. To use RunPod, either use your Docker image as a runtime environment (the user in the image should be root for RunPod), or use setup and run to configure your environment. See this GitHub issue for more.

Running Containers as Tasks

Note

On Kubernetes, running the Docker runtime inside a pod is not recommended. Instead, use your container as a runtime environment.

SkyPilot can run containerized applications directly as regular tasks. The default VM images provided by SkyPilot already have the Docker runtime pre-configured.

To launch a containerized application, you can directly invoke docker run in the run section of your task.

For example, to run a HuggingFace TGI serving container:

resources:
  accelerators: A100:1

run: |
  docker run --gpus all --shm-size 1g -v ~/data:/data \
    ghcr.io/huggingface/text-generation-inference \
    --model-id lmsys/vicuna-13b-v1.5

  # NOTE: Uncommon to have any commands after the above.
  # `docker run` is blocking, so any commands after it
  # will NOT be run inside the container.
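
With the above saved to a task YAML, the task can be launched as usual (tgi.yaml and the cluster name tgi below are placeholder names):

$ sky launch -c tgi tgi.yaml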

Private Registries

When using this mode, to access Docker images hosted on private registries, add a setup section to your task YAML file to authenticate with the registry:

resources:
  accelerators: A100:1

setup: |
  # Authenticate with private registry
  docker login -u <username> -p <password> <registry>

run: |
  docker run <registry>/<image>:<tag>
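
To avoid hardcoding the password in the YAML, one option is to pass it as a task environment variable and pipe it to docker login via --password-stdin (a sketch; REGISTRY_PASSWORD is an arbitrary variable name):

envs:
  REGISTRY_PASSWORD: ""

setup: |
  # Authenticate without the password appearing in the task YAML.
  echo "$REGISTRY_PASSWORD" | docker login -u <username> --password-stdin <registry>

run: |
  docker run <registry>/<image>:<tag>

The password can then be supplied at launch time, e.g. sky launch task.yaml --env REGISTRY_PASSWORD=<password>.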

Building Containers Remotely

If you are running containerized applications, the container image can also be built remotely on the cluster in the setup phase of the task.

The echo_app example shows how to do this:

file_mounts:
  /inputs: ./echo_app  # Input to application
  /echo_app: ./echo_app  # Contains the Dockerfile and build context
  /outputs:  # Output to be written directly to S3 bucket
    name: # Set unique bucket name here
    store: s3
    mode: MOUNT

setup: |
  # Build docker image. If pushed to a registry, can also do docker pull here
  docker build -t echo:v0 /echo_app

run: |
  docker run --rm \
    --volume="/inputs:/inputs:ro" \
    --volume="/outputs:/outputs:rw" \
    echo:v0 \
    /inputs/README.md /outputs/output.txt

In this example, the Dockerfile and build context are contained in ./echo_app. The setup phase of the task builds the image, and the run phase runs the container. The inputs to the app are copied to the remote cluster using file_mounts and mounted into the container with Docker volume mounts (the --volume flag). The output of the app, produced at the /outputs path in the container, is also volume-mounted to /outputs on the VM, which is written directly to an S3 bucket through bucket mounting.
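
To try the example, save the YAML (with a unique bucket name filled in) and launch it; the file name echo_app.yaml below is a placeholder:

$ sky launch -c echo echo_app.yaml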

Our GitHub repository has more examples, including running Detectron2 in a Docker container via SkyPilot.

Using Containers as Runtime Environments

When a container is used as the runtime environment, everything happens inside the container:

  • The SkyPilot runtime is automatically installed and launched inside the container;

  • setup and run commands are executed in the container;

  • Any files created by the task will be stored inside the container.

To use a Docker image as your runtime environment, set the image_id field in the resources section of your task YAML file to docker:<image_id>. Only Debian-based images (e.g., Ubuntu) are supported for now.

For example, to use the ubuntu:20.04 image from Docker Hub:

resources:
  image_id: docker:ubuntu:20.04

setup: |
  # Commands to run inside the container

run: |
  # Commands to run inside the container
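
For instance, a minimal sketch that fills in the placeholders with a quick sanity check:

resources:
  image_id: docker:ubuntu:20.04

run: |
  # Confirm the task is running inside the Ubuntu 20.04 container.
  cat /etc/os-release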

As another example, here’s how to use NVIDIA’s PyTorch NGC Container:

resources:
  image_id: docker:nvcr.io/nvidia/pytorch:23.10-py3
  accelerators: T4

setup: |
  # Commands to run inside the container

run: |
  # Commands to run inside the container

  # Since SkyPilot tasks are run inside a fresh conda "(base)" environment,
  # deactivate first to access what the Docker image has already installed.
  source deactivate
  nvidia-smi
  python -c 'import torch; print(torch.__version__)'

Any GPUs assigned to the task will be automatically mapped to your Docker container, and all subsequent tasks within the cluster will also run inside the container. In a multi-node scenario, the container will be launched on all nodes, and the corresponding node’s container will be assigned for task execution.
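
For instance, a minimal multi-node sketch (num_nodes and the SKYPILOT_NODE_RANK environment variable are standard SkyPilot task features):

resources:
  image_id: docker:nvcr.io/nvidia/pytorch:23.10-py3
  accelerators: T4

num_nodes: 2

run: |
  # Runs inside the corresponding node's container on each of the 2 nodes.
  echo "Hello from node rank $SKYPILOT_NODE_RANK"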

Tip

When to use this?

If you have a preconfigured development environment set up within a Docker image, it can be convenient to use the runtime environment mode. This is especially useful for launching development environments that are challenging to configure on a new virtual machine, such as dependencies on specific versions of CUDA or cuDNN.

Note

Since SkyPilot runs pip install skypilot inside the user-specified container image as part of a launch, ensure that this does not create dependency conflicts with packages already in the image.

Currently, the following requirements must be met:

  1. The container image should be based on Debian;

  2. The container image must grant the user passwordless sudo permissions. Running as the root user is also acceptable.
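
If you build your own image, a minimal Dockerfile sketch that satisfies both requirements might look like the following (the skyuser name is arbitrary):

FROM ubuntu:22.04
# Install sudo and create a user with passwordless sudo.
RUN apt-get update && apt-get install -y sudo && \
    useradd -m -s /bin/bash skyuser && \
    echo 'skyuser ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/skyuser
USER skyuser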

Note

Using a container with a customized entrypoint as a runtime environment is supported, with the container’s entrypoint being overridden by /bin/bash. Specific commands can be executed in the setup and run sections of the task YAML file. However, this approach is not compatible with RunPod due to limitations in the RunPod API, so ensure that you choose a container with a default entrypoint (i.e. /bin/bash).
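
To check an image's entrypoint before launching, you can inspect it locally with the Docker CLI:

$ docker inspect --format '{{.Config.Entrypoint}}' <image>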

Private Registries

Note

These instructions do not apply if you use SkyPilot to launch on Kubernetes clusters. Instead, see Using Images from Private Repositories in Kubernetes for more.

When using this mode, to access Docker images hosted on private registries, you can provide the registry authentication details using task environment variables:

For Docker Hub:

resources:
  image_id: docker:<user>/<your-docker-hub-repo>:<tag>

envs:
  # Values used in: docker login -u <user> -p <password> <registry server>
  SKYPILOT_DOCKER_USERNAME: <user>
  SKYPILOT_DOCKER_PASSWORD: <password>
  SKYPILOT_DOCKER_SERVER: docker.io

For AWS ECR:

resources:
  image_id: docker:<your-ecr-repo>:<tag>

envs:
  # Values used in: docker login -u <user> -p <password> <registry server>
  SKYPILOT_DOCKER_USERNAME: AWS
  SKYPILOT_DOCKER_PASSWORD: <password>
  SKYPILOT_DOCKER_SERVER: <your-user-id>.dkr.ecr.<region>.amazonaws.com

We suggest setting the SKYPILOT_DOCKER_PASSWORD environment variable through the CLI (see passing secrets):

$ # Docker Hub password:
$ export SKYPILOT_DOCKER_PASSWORD=...
$ # Or cloud registry password:
$ export SKYPILOT_DOCKER_PASSWORD=$(aws ecr get-login-password --region us-east-1)
$ # Pass --env:
$ sky launch task.yaml --env SKYPILOT_DOCKER_PASSWORD