Sandboxes

SkyPilot Sandboxes are fast, isolated compute environments that run on your own Kubernetes clusters. Each sandbox is a lightweight pod you can launch on demand, run commands in, and tear down, without provisioning a full cluster.

Sandboxes are built for AI coding agents, RL training rollouts, and parallel evals: workloads that need many short-lived, isolated environments spun up and down quickly. Pre-warmed pools launch sandboxes in under a second, with volumes and secrets injected automatically. A built-in image for Claude Code ships out of the box.

Tip

Sandboxes are part of SkyPilot Platform, which is in limited early access. Sign up here; it takes 20 seconds.

Sandboxes: launching a Claude Code environment on your own cluster with secrets injected from the SkyPilot Secrets Manager.

Why sandboxes with SkyPilot

  • Sub-second launches: pre-warmed pools keep idle environments ready, so a sandbox is live in under a second instead of waiting on image pulls and scheduling.

  • Isolated per pod: every sandbox is its own Kubernetes pod with a dedicated image, CPU, and memory. The pod is a natural boundary for running agent-generated or otherwise untrusted code without it touching your other workloads.

  • Secrets stay out of your code: credentials are injected at launch from the SkyPilot Secrets Manager as environment variables, so tokens are never baked into images or hardcoded into the commands an agent runs.

  • Massively parallel: launch thousands of sandboxes in one call for RL rollouts and parallel evals, then fan out commands concurrently.

  • Runs on your infra: sandboxes live on your own Kubernetes clusters, so your code and data never leave your environment, and capacity is simply your existing cluster.

Use cases

  • AI coding agents: give each agent (e.g. Claude Code) its own disposable environment to read, write, and run code in, isolated from your other work and from other agents.

  • RL training rollouts: spin up thousands of sandboxes to run rollouts in parallel, collect results, and tear them down, all from a single process.

  • Parallel evals: run a large eval suite across many isolated environments at once instead of serializing on one machine.

  • Ephemeral build and CI tasks: execute short-lived builds, tests, or scripts in a clean environment without provisioning a full cluster.

Quickstart

A sandbox is an isolated pod you create, run commands in, and terminate. Once the SkyPilot CLI and the bundled Sandbox SDK are installed, that is the whole loop:

sky sandbox create provisions a sandbox and drops you straight into an interactive shell; exit the shell and the sandbox is destroyed.

# Create a sandbox and drop into a shell (destroyed on exit).
$ sky sandbox create -n dev
✓ Sandbox dev is ready. Connecting via bash...

# Or keep it running with --detach, then manage it by name.
$ sky sandbox create --detach -n dev
$ sky sandbox ls
$ sky sandbox terminate dev

The same loop with the Python SDK:

import sky.sandbox

# Create a sandbox from the built-in `default` pool.
sb = sky.sandbox.create(name='dev')

# Run a command (argv tokens, no implicit shell); get
# stdout / stderr / exit_code back.
result = sb.exec('python', '-c', 'print(2 ** 10)')
print(result['stdout'])  # 1024

# Tear it down.
sb.terminate()

Commands are argv tokens run directly in the pod (no implicit shell). For shell features like pipes, globs, or env-var expansion, invoke a shell explicitly: sb.exec('sh', '-c', 'echo $HOME | wc -c').

Or use the context manager to terminate automatically:

import sky.sandbox

with sky.sandbox.create(name='dev') as sb:
    sb.exec('python', 'train.py')
# Sandbox is terminated on exit.
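The argv-versus-shell distinction described above can be tried locally before touching a sandbox. The sketch below uses Python's standard `subprocess` module as a stand-in for `sb.exec`; it assumes `sb.exec` passes its tokens to the program unchanged, as the quickstart states, and it does not call the SkyPilot SDK at all. The `GREETING` variable and the sample prompt are made up for illustration.

```python
import os
import shlex
import subprocess
import sys

# Local stand-in for sb.exec(...): argv tokens go straight to the
# program, so "$HOME" arrives as a literal string, not an expansion.
echo_arg = [sys.executable, '-c', 'import sys; print(sys.argv[1])']
literal = subprocess.run(
    echo_arg + ['$HOME'], capture_output=True, text=True, check=True)
print(literal.stdout.strip())  # $HOME

# With an explicit shell, the shell expands variables and runs pipes.
expanded = subprocess.run(
    ['sh', '-c', 'printf %s "$GREETING" | wc -c'],
    capture_output=True, text=True, check=True,
    env={**os.environ, 'GREETING': 'hello'})
print(expanded.stdout.strip())  # 5

# When arbitrary text must pass through `sh -c`, quote it first so the
# shell treats it as one word instead of interpreting it.
prompt = "it's a test; echo injected"
command = f'printf %s {shlex.quote(prompt)}'
quoted = subprocess.run(
    ['sh', '-c', command], capture_output=True, text=True, check=True)
print(quoted.stdout)  # it's a test; echo injected
```

The same quoting step applies when building an `sh -c` string for a sandbox from text you do not control, such as an agent prompt.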

Working with the SDK

Beyond create / exec / terminate, the sky.sandbox SDK covers batch and async fan-out, and secret / volume injection:

Pass num_sandboxes to create a batch in one call; names are auto-generated from the prefix (rollout-0001, rollout-0002, …).

import sky.sandbox

sandboxes = sky.sandbox.create(name='rollout', num_sandboxes=1000)
for i, sb in enumerate(sandboxes):
    sb.exec('python', 'rollout.py', str(i))

Every entrypoint has an async sibling on a .aio attribute. A single event loop can drive hundreds of concurrent exec calls.

import asyncio
import sky.sandbox

async def main():
    sandboxes = await sky.sandbox.create.aio(
        name='rollout', num_sandboxes=100)
    try:
        results = await asyncio.gather(
            *(sb.exec.aio('python', 'rollout.py', str(i))
              for i, sb in enumerate(sandboxes)))
    finally:
        # Always tear down, even if an exec raises.
        await asyncio.gather(*(sb.terminate.aio() for sb in sandboxes))
        await sky.sandbox.aclose()  # release the shared session

asyncio.run(main())
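When fanning out to many sandboxes, it can help to cap how many calls are in flight and to keep one failure from discarding the other results. The sketch below shows that pattern with plain `asyncio`. `fake_exec` is a hypothetical stand-in for `await sb.exec.aio(...)`, and the shape of the result dict is assumed from the quickstart comment; nothing here calls the SkyPilot SDK.

```python
import asyncio


async def fake_exec(index: int) -> dict:
    """Hypothetical stand-in for `await sb.exec.aio(...)`."""
    await asyncio.sleep(0.01)
    if index == 3:
        raise RuntimeError(f'rollout {index} failed')
    # Assumed result shape: stdout / stderr / exit_code.
    return {'stdout': str(index), 'stderr': '', 'exit_code': 0}


async def run_bounded(indices, limit: int):
    """Run fake_exec for each index with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def one(index: int):
        async with semaphore:
            return await fake_exec(index)

    # return_exceptions=True returns failures as values, in input order,
    # instead of raising on the first one.
    return await asyncio.gather(
        *(one(i) for i in indices), return_exceptions=True)


results = asyncio.run(run_bounded(range(8), limit=4))
succeeded = [r for r in results if not isinstance(r, Exception)]
failed = [r for r in results if isinstance(r, Exception)]
print(len(succeeded), len(failed))  # 7 1
```

Swapping `fake_exec` for a real async exec call keeps the structure unchanged; the `limit` value is a tuning knob, not a documented SDK setting.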

Inject secrets and mount volumes at create time.

import sky.sandbox

sb = sky.sandbox.create(
    name='job',
    # Inject secrets from the secrets manager as env vars of the
    # same name.
    secrets=['HF_TOKEN'],
    # Plain (non-secret) env vars.
    env={'PROJECT': 'demo'},
    # Mount existing volumes, keyed by mount path (create them with
    # `sky volumes apply`).
    volumes={'/data': 'shared-data'},
)
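Because secrets and plain env vars both arrive as environment variables, code running inside the sandbox can read them with the standard library. The sketch below is one way to fail fast when a variable is missing; `require_env` is a hypothetical helper, and the `setdefault` lines only simulate what a sandbox created with `secrets=['HF_TOKEN']` and `env={'PROJECT': 'demo'}` would already provide.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail loudly."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f'missing required environment variable: {name}')
    return value


# Simulation only: inside a real sandbox these would already be set by
# the secrets= and env= arguments at create time.
os.environ.setdefault('HF_TOKEN', 'hf_example_not_a_real_token')
os.environ.setdefault('PROJECT', 'demo')

token = require_env('HF_TOKEN')
project = require_env('PROJECT')

# Never print the secret itself; log only that it is present.
print(f'project={project} token_present={bool(token)}')
```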

The SDK exposes create, exec, terminate, ls, create_pool, set_pool_size, and delete_pool. Each per-call entrypoint also has an async form on its .aio attribute (sky.sandbox.create.aio(...), sb.exec.aio(...)), so the same names work in event-loop code. To manage pools and running sandboxes in the UI, use the dashboard’s Sandboxes page.

Example: Running AI coding agents

The built-in claude pool ships with Claude Code installed. Give an agent its own isolated sandbox, inject its token from the secrets manager, and drive it non-interactively:

import sky.sandbox

with sky.sandbox.create(
    name='claude',
    pool='claude',
    secrets=['CLAUDE_CODE_OAUTH_TOKEN'],
) as sb:
    # Shell features like pipes need an explicit shell.
    result = sb.exec(
        'sh', '-c',
        "echo 'Create a hello world index.html' | "
        'claude -p --dangerously-skip-permissions')
    print(result['stdout'])

Each agent runs in its own pod, so many agents can work in parallel without sharing a filesystem or stepping on each other.
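When driving an agent non-interactively, it is usually worth checking the exit code before trusting stdout. The helper below is a minimal sketch: `check_result` is hypothetical, and the `stderr` and `exit_code` keys are assumed from the quickstart comment (only `result['stdout']` is shown being used in this page). It operates on plain dicts, so it runs without a sandbox.

```python
def check_result(result: dict) -> str:
    """Return stdout, raising if the command exited non-zero."""
    # Key names are assumptions based on the quickstart comment.
    exit_code = result['exit_code']
    if exit_code != 0:
        stderr = result.get('stderr', '').strip()
        raise RuntimeError(
            f'command failed with exit code {exit_code}: {stderr}')
    return result['stdout']


# Sample dicts standing in for what an exec call might return.
ok = {'stdout': 'done\n', 'stderr': '', 'exit_code': 0}
bad = {'stdout': '', 'stderr': 'claude: command not found\n',
       'exit_code': 127}

print(check_result(ok), end='')
try:
    check_result(bad)
except RuntimeError as err:
    print(err)
```

In the agent example, this would wrap the value returned by `sb.exec(...)` in place of printing `result['stdout']` directly.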

Advanced: Warm pools for fast provisioning

Without a pool, create provisions a fresh pod on demand, which waits on Kubernetes scheduling and the container image pull. A pool keeps a set of warm, pre-provisioned pods ready, so create instead claims an already-running pod, cutting a single sandbox’s launch time by more than 50%. A pool also fixes the shape of its sandboxes: their container image, CPU, and memory.

SkyPilot ships a built-in default pool (a Python image), so the quickstart above needs no setup; create your own pool when you need a different image or size.

Create a pool, and resize it at any time:

import sky.sandbox

# Create a pool with 10 warm pods kept idle and ready.
sky.sandbox.create_pool(
    name='ml-gpu',
    image='nvcr.io/nvidia/pytorch:24.05-py3',
    cpus=8,
    memory_gb=64,
    replicas=10,
)

# Scale the pool up or down at any time.
sky.sandbox.set_pool_size('ml-gpu', replicas=50)

# Launch a sandbox from the pool.
sb = sky.sandbox.create(name='train', pool='ml-gpu')
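Since warm pods sit idle on your own cluster, it can be useful to estimate what a pool holds before resizing it. The arithmetic below is a rough sketch that assumes each warm pod reserves its full CPU and memory shape while idle, which this page does not confirm; `PoolShape` is a hypothetical helper, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PoolShape:
    """Per-pod shape and replica count of a warm pool (illustrative)."""
    cpus: int
    memory_gb: int
    replicas: int

    def idle_footprint(self) -> Tuple[int, int]:
        """Total (CPUs, memory in GB) held by all warm replicas."""
        return self.cpus * self.replicas, self.memory_gb * self.replicas


# Shapes taken from the example above: 10 replicas, then scaled to 50.
initial = PoolShape(cpus=8, memory_gb=64, replicas=10)
scaled = PoolShape(cpus=8, memory_gb=64, replicas=50)

print(initial.idle_footprint())  # (80, 640)
print(scaled.idle_footprint())   # (400, 3200)
```

Under that assumption, scaling the example pool from 10 to 50 replicas raises the reserved capacity from 80 CPUs and 640 GB to 400 CPUs and 3200 GB.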

See also

  • Volumes: persistent storage you can mount into sandboxes.

  • Job Groups for RL: run many parallel jobs and sandboxes together for RL.

  • SkyPilot for Frontier AI: SkyPilot Platform, including the Secrets Manager that injects credentials into sandboxes.
