Source: llm/nemorl
# NeMo RL: Scalable Reinforcement Learning for LLMs

NeMo RL is NVIDIA's scalable and efficient post-training library for reinforcement learning with large language models. It supports models from 1B to over 100B parameters, with distributed training capabilities.
## Why SkyPilot + NeMo RL?

SkyPilot makes RL training with NeMo RL effortless:

- **Multi-node with zero setup** - distributed Ray clusters are set up automatically
- **Run on Kubernetes** - works with both managed Kubernetes clusters and your own Kubernetes clusters
- **Out-of-the-box InfiniBand support** - InfiniBand is enabled automatically on supported clouds and Kubernetes clusters
## Quick Start

First, set up your Hugging Face token (see the Preparation section for details):

```bash
export HF_TOKEN=your_hf_token_here
```

Launch a 2-node DPO training job on H100s:

```bash
sky launch -c nemorl llm/nemorl/nemorl.sky.yaml --secret HF_TOKEN
```

Monitor training progress:

```bash
sky logs nemorl
```

The example runs Direct Preference Optimization (DPO) training on 2 nodes with 8x H100 GPUs each.
## Advanced Usage

### 📈 Scale to More Nodes

```bash
sky launch -c nemorl llm/nemorl/nemorl.sky.yaml --num-nodes 4 --secret HF_TOKEN
```

### 🔧 Customize Training Configuration

Modify DPO parameters directly in the YAML:

```bash
# Edit the run command in nemorl.sky.yaml to adjust:
# - dpo.val_global_batch_size: Validation batch size
# - checkpointing.checkpoint_dir: Output directory
# - cluster.gpus_per_node: GPU configuration
```
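For instance, the `uv run python examples/run_dpo.py` line in the `run` section accepts Hydra-style key=value overrides. A sketch of an edited invocation, using only the keys listed above (the values here are illustrative, not tuned recommendations):

```bash
# Illustrative override values; keep the SkyPilot-provided cluster.* values
# so the config matches the actual cluster shape.
uv run python examples/run_dpo.py \
  cluster.gpus_per_node=${SKYPILOT_NUM_GPUS_PER_NODE} \
  cluster.num_nodes=${SKYPILOT_NUM_NODES} \
  dpo.val_global_batch_size=64 \
  checkpointing.checkpoint_dir='results/my_dpo_run'
```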
Train with different examples from the NeMo RL repository:

```bash
# Modify the run command to use different NeMo RL examples:
# - examples/run_grpo.py: Group Relative Policy Optimization
# - examples/run_sft.py: Supervised Fine-Tuning
```
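Swapping the example script is a one-line change in the `run` section. A hedged sketch for GRPO — note that each example script has its own config group, so the `dpo.*` overrides do not apply, and any GRPO-specific keys should be checked against that script's config (the `cluster.*` keys are assumed to be shared across examples):

```bash
# Sketch: run the GRPO example instead of DPO. Check examples/run_grpo.py's
# own config for valid overrides; cluster.* keys are assumed to carry over.
uv run python examples/run_grpo.py \
  cluster.gpus_per_node=${SKYPILOT_NUM_GPUS_PER_NODE} \
  cluster.num_nodes=${SKYPILOT_NUM_NODES}
```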
## Preparation

Before running NeMo RL training, you need to set up model access and authentication:

### 1. Request Model Access

Some models used in NeMo RL examples may require access approval. For example:

- Request access to Llama models on Hugging Face if using Llama-based examples
- Follow the model-specific access requirements for other models
### 2. Get Your Hugging Face Token

- Go to your Hugging Face account settings and open the Access Tokens page
- Create a new token with "Read" permissions
- Copy the token for use in the next step
### 3. Set Environment Variable

Add your Hugging Face token to your environment:

```bash
export HF_TOKEN="your_token_here"
```
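A quick sanity check before launching can catch a missing or empty token early. This is a hypothetical helper for illustration, not part of NeMo RL or SkyPilot:

```shell
# Hypothetical helper: fail fast if HF_TOKEN is missing before launching a job.
check_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "missing"
    return 1
  fi
  echo "present"
}

# Example invocation with a dummy value; in practice, export HF_TOKEN first.
HF_TOKEN="dummy-token" check_hf_token  # prints "present"
```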
### 4. Install SkyPilot

Install SkyPilot with your preferred cloud providers:

```bash
pip install "skypilot-nightly[aws,gcp,kubernetes]"
# See: https://docs.skypilot.co/en/latest/getting-started/installation.html
```

### 5. Verify Setup

Check your infrastructure setup:

```bash
sky check
```
## Included files

### nemorl.sky.yaml
```yaml
# NeMo RL multi-node on Kubernetes with SkyPilot
#
# Runs the DPO example from the NeMo RL repository on 16 H100s
# (with InfiniBand on supported clusters).
#
# Usage:
#   HF_TOKEN=<YOUR_TOKEN> sky launch -c nemorl nemorl.sky.yaml --secret HF_TOKEN

resources:
  accelerators: H100:8
  memory: 64+
  infra: k8s
  cpus: 32+
  # network_tier: best  # Uncomment for InfiniBand support on supported clusters
  image_id: docker:nvidia/cuda:12.8.0-devel-ubuntu24.04

num_nodes: 2

secrets:
  HF_TOKEN: null

setup: |
  sudo apt update
  # Install libibverbs-dev for InfiniBand support
  sudo apt install -y git libibverbs-dev python3-dev
  # Set up NeMo RL
  git clone https://github.com/NVIDIA-NeMo/RL nemo-rl
  cd nemo-rl
  git checkout ee8f5aa75a7c8ab070a460f49b5fcf226c5b3018
  git submodule update --init --recursive
  uv venv
  # Build flash-attn and warm the uv cache before the first run
  bash tools/build-flash-attn-in-uv-cache.sh
  cd ..

run: |
  sudo chmod -R 777 /var/tmp
  cd nemo-rl
  head_ip=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  if [ "$SKYPILOT_NODE_RANK" == "0" ]; then
    # Start the Ray head node if one is not already running
    ps aux | grep ray | grep 6379 &> /dev/null || uv run ray start --head --disable-usage-stats --port 6379
    sleep 5
    uv run ray status
    uv run python examples/run_dpo.py \
      cluster.gpus_per_node=${SKYPILOT_NUM_GPUS_PER_NODE} \
      cluster.num_nodes=${SKYPILOT_NUM_NODES} \
      dpo.val_global_batch_size=32 \
      checkpointing.checkpoint_dir='results/dpo_llama81'
  else
    sleep 5
    # Join the Ray cluster as a worker
    ps aux | grep ray | grep 6379 &> /dev/null || uv run ray start --address $head_ip:6379 --disable-usage-stats
    # Sleep after `ray start` to give Ray enough time to daemonize
    sleep 5
  fi
```
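The head/worker split in the `run` section hinges on one shell derivation: SkyPilot exposes one IP per line in `SKYPILOT_NODE_IPS`, and the first line is taken as the Ray head address. A standalone sketch with example IPs hardcoded for illustration:

```shell
# Standalone sketch of how the run section derives the head node address.
# SKYPILOT_NODE_IPS holds one IP per line; example IPs are hardcoded here.
SKYPILOT_NODE_IPS="10.0.0.1
10.0.0.2"

head_ip=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
num_nodes=$(echo "$SKYPILOT_NODE_IPS" | wc -l | tr -d ' ')

echo "head=$head_ip nodes=$num_nodes"  # prints "head=10.0.0.1 nodes=2"
```

Workers then join the head with `ray start --address $head_ip:6379`, which is why only rank 0 launches the training script.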