Using Docker Containers#

SkyPilot can run a container either as a task, or as the runtime environment of a cluster.

  • If the container image is invocable (i.e., it has an entrypoint): run it as a task.

  • If the container image is to be used as a runtime environment (e.g., ubuntu, nvcr.io/nvidia/pytorch:23.10-py3, etc.) and you have extra commands to run inside the container: run it as a runtime environment.

Running Containers as Tasks#

SkyPilot can run containerized applications directly as regular tasks. The default VM images provided by SkyPilot already have the Docker runtime pre-configured.

To launch a containerized application, you can directly invoke docker run in the run section of your task.

For example, to run a HuggingFace TGI serving container:

resources:
  accelerators: A100:1

run: |
  docker run --gpus all --shm-size 1g -v ~/data:/data \
    ghcr.io/huggingface/text-generation-inference \
    --model-id lmsys/vicuna-13b-v1.5

  # NOTE: Uncommon to have any commands after the above.
  # `docker run` is blocking, so any commands after it
  # will NOT be run inside the container.
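Assuming the YAML above is saved as tgi.yaml (a file name chosen here for illustration), the task can be launched with:

$ sky launch -c tgi tgi.yaml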

Private Registries#

When using this mode, to access Docker images hosted on private registries, simply add a setup section to your task YAML file to authenticate with the registry:

resources:
  accelerators: A100:1

setup: |
  # Authenticate with private registry
  docker login -u <username> -p <password> <registry>

run: |
  docker run <registry>/<image>:<tag>
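To keep the password out of the YAML file, one option is to pass it as a task environment variable and pipe it to docker login via --password-stdin. A minimal sketch, where the DOCKER_USERNAME and DOCKER_PASSWORD variable names are placeholders chosen here:

resources:
  accelerators: A100:1

envs:
  DOCKER_USERNAME: <username>
  DOCKER_PASSWORD:  # Filled in at launch time via `--env`

setup: |
  # Authenticate without hardcoding the password in the YAML
  echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin <registry>

run: |
  docker run <registry>/<image>:<tag>

The password can then be supplied at launch time, e.g., by exporting DOCKER_PASSWORD in your shell and running sky launch task.yaml --env DOCKER_PASSWORD, mirroring the pattern shown for private registries in the runtime environment mode below.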

Building Containers Remotely#

If you are running containerized applications, the container image can also be built remotely on the cluster in the setup phase of the task.

The echo_app example shows how to do this:

file_mounts:
  /inputs: ./echo_app  # Input to application
  /echo_app: ./echo_app  # Contains the Dockerfile and build context
  /outputs:  # Output to be written directly to S3 bucket
    name: # Set unique bucket name here
    store: s3
    mode: MOUNT

setup: |
  # Build docker image. If pushed to a registry, can also do docker pull here
  docker build -t echo:v0 /echo_app

run: |
  docker run --rm \
    --volume="/inputs:/inputs:ro" \
    --volume="/outputs:/outputs:rw" \
    echo:v0 \
    /inputs/README.md /outputs/output.txt

In this example, the Dockerfile and build context are contained in ./echo_app. The setup phase of the task builds the image, and the run phase runs the container. The inputs to the app are copied to the cluster using file_mounts and mounted into the container with Docker volume mounts (the --volume flag). The output produced by the app at the /outputs path inside the container is also volume mounted to /outputs on the VM, which is written directly to an S3 bucket through bucket mounting.
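Assuming the YAML above is saved as echo_app.yaml and a unique bucket name has been filled in, one way to launch it and inspect the result (the cluster name and bucket name below are placeholders):

$ sky launch -c echo echo_app.yaml
$ aws s3 ls s3://<bucket-name>/   # output.txt should appear here after the run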

Our GitHub repository has more examples, including running Detectron2 in a Docker container via SkyPilot.

Using Containers as Runtime Environments#

When a container is used as the runtime environment, everything happens inside the container:

  • The SkyPilot runtime is automatically installed and launched inside the container;

  • setup and run commands are executed in the container;

  • Any files created by the task will be stored inside the container.

To use a Docker image as your runtime environment, set the image_id field in the resources section of your task YAML file to docker:<image_id>. Only Debian-based images (e.g., Ubuntu) are supported for now.

For example, to use the ubuntu:20.04 image from Docker Hub:

resources:
  image_id: docker:ubuntu:20.04

setup: |
  # Commands to run inside the container

run: |
  # Commands to run inside the container

As another example, here’s how to use NVIDIA’s PyTorch NGC Container:

resources:
  image_id: docker:nvcr.io/nvidia/pytorch:23.10-py3
  accelerators: T4

setup: |
  # Commands to run inside the container

run: |
  # Commands to run inside the container

  # Since SkyPilot tasks are run inside a fresh conda "(base)" environment,
  # deactivate first to access what the Docker image has already installed.
  source deactivate
  nvidia-smi
  python -c 'import torch; print(torch.__version__)'

Any GPUs assigned to the task are automatically mapped into your Docker container, and all subsequent tasks on the cluster will also run inside the container. In a multi-node scenario, the container is launched on all nodes, and the task on each node runs inside that node's container.
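For example, a minimal multi-node sketch (the image and run command are illustrative) launches the container on both nodes, and each node's copy of the run command executes inside that node's container:

resources:
  image_id: docker:nvcr.io/nvidia/pytorch:23.10-py3
  accelerators: T4:1

num_nodes: 2

run: |
  # Runs inside this node's container on every node.
  echo "Hello from node rank ${SKYPILOT_NODE_RANK}"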

Tip

When to use this?

If you have a preconfigured development environment set up within a Docker image, it can be convenient to use the runtime environment mode. This is especially useful for launching development environments that are challenging to configure on a new virtual machine, such as dependencies on specific versions of CUDA or cuDNN.

Note

Since SkyPilot is installed via pip install skypilot inside the user-specified container image as part of a launch, users should ensure that this does not introduce dependency conflicts.

Currently, the following requirements must be met (a quick way to check them locally is sketched below):

  1. The container image should be based on Debian;

  2. The container image must grant passwordless sudo to the user (running as the root user is also acceptable).
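A quick way to sanity-check an image against these requirements on your own machine, assuming Docker is installed locally (a sketch, not an exhaustive check):

# 1. Debian-based?
$ docker run --rm <your-image> cat /etc/os-release
# 2. Passwordless sudo, or running as root?
$ docker run --rm <your-image> whoami
$ docker run --rm <your-image> sudo -n true && echo "passwordless sudo OK"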

Private Registries#

When using this mode, to access Docker images hosted on private registries, you can provide the registry authentication details using task environment variables:

# ecr_private_docker.yaml
resources:
  image_id: docker:<your-user-id>.dkr.ecr.us-east-1.amazonaws.com/<your-private-image>:<tag>
  # the following shorthand is also supported:
  # image_id: docker:<your-private-image>:<tag>

envs:
  SKYPILOT_DOCKER_USERNAME: AWS
  # SKYPILOT_DOCKER_PASSWORD: <password>
  SKYPILOT_DOCKER_SERVER: <your-user-id>.dkr.ecr.us-east-1.amazonaws.com

We suggest setting the SKYPILOT_DOCKER_PASSWORD environment variable through the CLI (see passing secrets):

$ export SKYPILOT_DOCKER_PASSWORD=$(aws ecr get-login-password --region us-east-1)
$ sky launch ecr_private_docker.yaml --env SKYPILOT_DOCKER_PASSWORD