Source: llm/verl

Verl: State-of-the-art RL Training for LLMs#

Verl is the most popular open-source reinforcement learning framework for LLMs, supporting PPO, GRPO, and other algorithms.

Also see search-tooling/ and this blog for tool-augmented “search” workflows (Search-R1 style), including Google Search–backed inference and a Wikipedia FAISS retrieval service used for inference and training.

Why SkyPilot + Verl?#

SkyPilot makes RL training easy and cost-effective:

  • Get GPUs instantly across clouds and Kubernetes

  • 3x cheaper with managed spot instances

  • Zero setup: SkyPilot brings up the distributed Ray cluster automatically (see the task sketch below)
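For context, a SkyPilot task for a run like this is just a short YAML file. The sketch below is illustrative only: the accelerator type, node count, and entrypoint are placeholder values, not the contents of the actual llm/verl YAMLs in this repo.

# Illustrative SkyPilot task; placeholder values, not the repo's actual YAML.
name: verl-rl-example

resources:
  accelerators: A100:8   # any cloud or Kubernetes cluster with matching GPUs
  use_spot: true         # request spot instances for lower cost

num_nodes: 2             # SkyPilot provisions all nodes for the job

setup: |
  pip install verl       # placeholder; the real YAMLs pin their dependencies

run: |
  # Placeholder entrypoint; the real YAMLs also start the Ray cluster
  # across the provisioned nodes before launching training.
  python3 -m verl.trainer.main_ppo

Launching such a task is a single sky launch command, exactly as in the Quick Start below.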

Quick Start#

Launch single-node PPO or GRPO training (the second command in each pair also passes --secret HF_TOKEN, needed when the model requires Hugging Face authentication):

sky launch -c verl-ppo llm/verl/verl-ppo.yaml --secret WANDB_API_KEY --num-nodes 1 -y
sky launch -c verl-ppo llm/verl/verl-ppo.yaml --secret WANDB_API_KEY --secret HF_TOKEN --num-nodes 1 -y

sky launch -c verl-grpo llm/verl/verl-grpo.yaml --secret WANDB_API_KEY --num-nodes 1 -y
sky launch -c verl-grpo llm/verl/verl-grpo.yaml --secret WANDB_API_KEY --secret HF_TOKEN --num-nodes 1 -y
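If --secret follows the same convention as --env, pulling the value from your local environment when none is given on the command line, a typical session looks like this (the token values are placeholders):

# Export the tokens locally first (placeholder values).
export WANDB_API_KEY=your-wandb-key
export HF_TOKEN=your-huggingface-token

# Then launch as above; the secrets are injected into the job's environment.
sky launch -c verl-ppo llm/verl/verl-ppo.yaml --secret WANDB_API_KEY --secret HF_TOKEN --num-nodes 1 -y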

Launch a 2-node RLHF training job on the cheapest available GPUs:

sky launch -c verl llm/verl/multinode.yaml
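The recipe's defaults can be overridden at launch time with standard sky launch flags; the node count and GPU type below are illustrative, assuming llm/verl/multinode.yaml scales with them:

# Scale out to more nodes, or pin a specific GPU type and count.
sky launch -c verl llm/verl/multinode.yaml --num-nodes 4
sky launch -c verl llm/verl/multinode.yaml --gpus H100:8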

Monitor training progress:

sky logs verl
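sky logs streams the latest job on the cluster by default; you can also target a specific job or list all jobs (the job ID below is illustrative):

sky logs verl 1    # stream logs of job 1 on the cluster
sky queue verl     # list jobs on the cluster and their statuses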

Screenshot: Verl training logs showing PPO optimization progress with reward metrics.

Access the Ray dashboard:

sky status --endpoint 8280 verl
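The command prints the dashboard's public endpoint (host:port). One way to capture it and open it in a browser (the variable name is just for illustration):

RAY_DASHBOARD=$(sky status --endpoint 8280 verl)
echo "Ray dashboard: http://${RAY_DASHBOARD}"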

Screenshot: Ray dashboard showing real-time monitoring of distributed training across multiple nodes.

Learn More#

Included files#