Source: llm/verl

Verl: State-of-the-art RL Training for LLMs#

Verl is a widely used open-source reinforcement learning framework for LLM post-training, supporting PPO, GRPO, and other algorithms.

Why SkyPilot + Verl?#

SkyPilot makes RL training easy and cost-effective:

  • Get GPUs instantly across clouds and Kubernetes

  • Up to 3x cheaper with managed spot instances

  • Zero setup - handles distributed Ray clusters automatically

Quick Start#

Launch a 2-node RLHF training job on the cheapest available GPUs:

sky launch -c verl llm/verl/multinode.yaml
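
SkyPilot picks the cheapest cloud and region that can satisfy the resource request in the YAML. To pin a specific accelerator instead of taking the cheapest offering, override it at launch time; H100:8 below is only an illustrative choice:

sky launch -c verl llm/verl/multinode.yaml --gpus H100:8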

Monitor training progress:

sky logs verl

[Screenshot] Training logs showing PPO optimization progress with reward metrics

Access Ray dashboard:

sky status --endpoint 8280 verl

[Screenshot] Ray dashboard showing real-time monitoring of distributed training across multiple nodes

Key Features#

The example trains Qwen2.5-0.5B-Instruct on the GSM8K dataset using PPO:

  • Multi-node distributed training with automatic Ray cluster setup

  • Checkpoint persistence to cloud storage for fault tolerance

  • Customizable models and datasets via environment variables (see the sketch below)
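
The environment variables declared in multinode.yaml are passed through to the training script, where they become Hydra-style overrides on the verl trainer. The following is a condensed, hypothetical sketch of that invocation, not the actual file contents: the key names follow verl's PPO config, and the GSM8K data paths are illustrative.

# Hypothetical sketch of the trainer invocation inside multinode.yaml
python3 -m verl.trainer.main_ppo \
  data.train_files=$HOME/data/gsm8k/train.parquet \
  data.val_files=$HOME/data/gsm8k/test.parquet \
  actor_rollout_ref.model.path=${MODEL_NAME} \
  actor_rollout_ref.actor.optim.lr=${ACTOR_LR} \
  critic.optim.lr=${CRITIC_LR} \
  trainer.total_epochs=${TOTAL_EPOCHS} \
  trainer.nnodes=${SKYPILOT_NUM_NODES}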

Optional: Enable W&B for Training Visualization#

To track training curves and metrics in Weights & Biases:

# 1. Set your W&B API key locally
export WANDB_API_KEY=your-api-key

# 2. Launch with the secret flag
sky launch -c verl llm/verl/multinode.yaml --secret WANDB_API_KEY

# 3. Edit multinode.yaml to enable W&B logger (see comments in the file)

Advanced Usage#

💰 Use Spot Instances for 3x Cost Savings#

sky jobs launch -n verl-job llm/verl/multinode.yaml

Training automatically resumes from checkpoints if preempted.
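
Managed jobs run under SkyPilot's jobs controller rather than on a normal cluster, so monitoring uses the sky jobs subcommands (standard SkyPilot CLI):

sky jobs queue        # list managed jobs and their recovery status
sky jobs logs JOB_ID  # stream training logs for a given managed job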

🚀 Continue Experiments on the Same Cluster#

# Run additional training epochs
sky exec verl llm/verl/multinode.yaml --env TOTAL_EPOCHS=10

# The YAML automatically detects and reuses the existing Ray cluster
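
Each sky exec call is queued as a new job on the existing cluster. You can list and inspect those jobs with SkyPilot's cluster-level commands:

sky queue verl   # list jobs on the verl cluster
sky logs verl 2  # tail the logs of a specific job, e.g. job ID 2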

📈 Scale to More Nodes#

sky launch -c verl llm/verl/multinode.yaml --num-nodes 4

🔧 Customize Training Configuration#

Modify parameters directly:

sky launch -c verl llm/verl/multinode.yaml \
  --env MODEL_NAME=meta-llama/Llama-2-7b-hf \
  --env ACTOR_LR=5e-6 \
  --env CRITIC_LR=1e-5

Train a larger model:

sky launch -c verl llm/verl/multinode.yaml \
  --env MODEL_NAME=Qwen/Qwen2.5-7B-Instruct \
  --gpus A100-80GB:8 --num-nodes 4

Understanding the Setup#

  1. Head node: Prepares data, starts Ray head, submits training job

  2. Worker nodes: Join Ray cluster for distributed training

  3. Smart resumption: the Ray cluster is reused if it is already running, avoiding restart overhead (sketched below)
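
A condensed sketch of how this head/worker split is typically written in the YAML's run section, using SkyPilot's built-in SKYPILOT_NODE_RANK, SKYPILOT_NODE_IPS, and SKYPILOT_NUM_NODES variables. This is a simplified illustration; the real multinode.yaml also handles data preparation and checkpointing:

# Simplified sketch of the run section's node-rank branching
HEAD_IP=$(echo "$SKYPILOT_NODE_IPS" | head -n1)  # rank 0 hosts the Ray head
if [ "$SKYPILOT_NODE_RANK" -eq 0 ]; then
  # Head node: reuse a running Ray cluster or start one, then train
  ray status || ray start --head --port=6385 --dashboard-port=8280
  python3 -m verl.trainer.main_ppo trainer.nnodes=${SKYPILOT_NUM_NODES}
else
  # Worker nodes: join the head node's Ray cluster
  ray status || ray start --address=${HEAD_IP}:6385
fi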

Troubleshooting#

  • OOM errors: Reduce batch sizes or gpu_memory_utilization (see the example after this list)

  • Connection issues: Ensure ports 6385 (Ray) and 8280 (dashboard) are not blocked

  • First run is slow: the model download happens only once; subsequent runs are faster
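
As an illustration, gpu_memory_utilization is the fraction of each GPU that verl's vLLM rollout engine reserves; lowering it, together with the PPO micro-batch sizes, in the trainer command inside multinode.yaml is the usual first fix. The override names below come from verl's Hydra config and may vary across verl versions:

# Hypothetical Hydra overrides appended to the verl trainer command
python3 -m verl.trainer.main_ppo \
  actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2 \
  critic.ppo_micro_batch_size_per_gpu=2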
