Source: llm/skyrl

SkyRL: Modular Full-Stack RL Training for LLMs#

SkyRL is a modular, performant reinforcement learning library for LLMs, designed for real‑world agentic workloads. Its modular design lets users modify any component: add new environments, implement improvements such as asynchronous training and heterogeneous hardware support, and more.

Why SkyPilot + SkyRL?#

SkyPilot makes RL training with SkyRL easy to run and scale cost-efficiently:

  • Run on any AI infrastructure, including Kubernetes or clouds

  • Zero setup: one command provisions the infrastructure, sets up the environment, and runs the training.

Quick Start#

Launch a multi‑node GRPO training job on GSM8K using the cheapest available GPUs:

export WANDB_API_KEY="xxx"
sky launch -c skyrl skyrl_train/examples/gsm8k/gsm8k-grpo-skypilot.yaml --secret WANDB_API_KEY
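
The launch command above passes `WANDB_API_KEY` as a secret, so it can help to confirm the variable is set before provisioning starts. The following is a minimal sketch, not part of SkyPilot or SkyRL: `require_env` and `launch_skyrl` are hypothetical helper names introduced here for illustration, and the `sky launch` line is copied unchanged from the command above.

```shell
# Hypothetical helper (not part of SkyPilot or SkyRL): succeed only if the
# named environment variable is set and non-empty.
require_env() {
  _name="$1"
  # Indirect lookup; assumes "$_name" is a valid shell variable name.
  eval "_value=\${$_name:-}"
  if [ -z "$_value" ]; then
    echo "error: $_name is not set" >&2
    return 1
  fi
}

# Hypothetical wrapper: check the secret first, then run the Quick Start
# launch command as written.
launch_skyrl() {
  require_env WANDB_API_KEY || return 1
  sky launch -c skyrl skyrl_train/examples/gsm8k/gsm8k-grpo-skypilot.yaml --secret WANDB_API_KEY
}
```

After exporting `WANDB_API_KEY` as shown above, calling `launch_skyrl` behaves like the plain command; if the variable is missing, it stops with an error before any resources are requested.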

Monitor training progress:

sky logs skyrl

[Figure: SkyPilot logs of the training job]

You can also view the job status in the SkyPilot Dashboard:

sky dashboard

[Figure: SkyPilot Dashboard showing the status of the training job]

If Weights & Biases (W&B) is configured, you can monitor the training run:

[Figure: W&B training metrics]

Key Features#

  • Modular design: plug‑and‑play algorithms, environments, and hardware backends

  • Scales from a single GPU to multi‑node clusters via Ray + SkyPilot

  • Minimal boilerplate: add new environments quickly (often <100 LoC)

Learn More#

Included files#