Source: llm/nemorl
# NeMo RL: Scalable Reinforcement Learning for LLMs

NeMo RL is NVIDIA's scalable and efficient post-training library for reinforcement learning with large language models. It supports models from 1B to over 100B parameters, with distributed training capabilities.
## Why SkyPilot + NeMo RL?

SkyPilot makes RL training with NeMo RL effortless:

- **Multi-node with zero setup** - distributed Ray clusters are set up automatically
- **Run on Kubernetes** - works with both managed Kubernetes clusters and your own Kubernetes clusters
- **Out-of-the-box InfiniBand support** - InfiniBand is enabled automatically on supported clouds and Kubernetes clusters
## Quick Start

First, set up your Hugging Face token (see the Preparation section for details):

```bash
export HF_TOKEN=your_hf_token_here
```

Launch a 2-node DPO training job on H100s:

```bash
sky launch -c nemorl llm/nemorl/nemorl.sky.yaml --secret HF_TOKEN
```

Monitor training progress:

```bash
sky logs nemorl
```

The example runs Direct Preference Optimization (DPO) training on 2 nodes with 8x H100 GPUs each.
## Advanced Usage

### 📈 Scale to More Nodes

```bash
sky launch -c nemorl llm/nemorl/nemorl.sky.yaml --num-nodes 4 --secret HF_TOKEN
```

### 🔧 Customize Training Configuration

Modify DPO parameters directly in the YAML:

```bash
# Edit the run command in nemorl.sky.yaml to adjust:
# - dpo.val_global_batch_size: Validation batch size
# - checkpointing.checkpoint_dir: Output directory
# - cluster.gpus_per_node: GPU configuration
```
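For instance, the `uv run python examples/run_dpo.py` line in the `run` section accepts Hydra-style key=value overrides. A sketch of an edited invocation, using only the keys listed above (the values here are illustrative, not tuned recommendations):

```bash
# Illustrative override values; keep the SkyPilot-provided cluster.* values
# so the config matches the actual cluster shape.
uv run python examples/run_dpo.py \
  cluster.gpus_per_node=${SKYPILOT_NUM_GPUS_PER_NODE} \
  cluster.num_nodes=${SKYPILOT_NUM_NODES} \
  dpo.val_global_batch_size=64 \
  checkpointing.checkpoint_dir='results/my_dpo_run'
```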
Train with different examples from the NeMo RL repository:

```bash
# Modify the run command to use different NeMo RL examples:
# - examples/run_grpo.py: Group Relative Policy Optimization
# - examples/run_sft.py: Supervised Fine-Tuning
```
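Swapping the example script is a one-line change in the `run` section. A hedged sketch for GRPO — note that each example script has its own config group, so the `dpo.*` overrides do not apply, and any GRPO-specific keys should be checked against that script's config (the `cluster.*` keys are assumed to be shared across examples):

```bash
# Sketch: run the GRPO example instead of DPO. Check examples/run_grpo.py's
# own config for valid overrides; cluster.* keys are assumed to carry over.
uv run python examples/run_grpo.py \
  cluster.gpus_per_node=${SKYPILOT_NUM_GPUS_PER_NODE} \
  cluster.num_nodes=${SKYPILOT_NUM_NODES}
```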
## Preparation

Before running NeMo RL training, you need to set up model access and authentication:

### 1. Request Model Access

Some models used in NeMo RL examples may require access approval. For example:

- Request access to Llama models on Hugging Face if using Llama-based examples
- Follow the model-specific access requirements for other models
### 2. Get Your Hugging Face Token

- Go to your Hugging Face account settings and open the Access Tokens page
- Create a new token with "Read" permissions
- Copy the token for use in the next step
### 3. Set Environment Variable

Add your Hugging Face token to your environment:

```bash
export HF_TOKEN="your_token_here"
```
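A quick sanity check before launching can catch a missing or empty token early. This is a hypothetical helper for illustration, not part of NeMo RL or SkyPilot:

```shell
# Hypothetical helper: fail fast if HF_TOKEN is missing before launching a job.
check_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "missing"
    return 1
  fi
  echo "present"
}

# Example invocation with a dummy value; in practice, export HF_TOKEN first.
HF_TOKEN="dummy-token" check_hf_token  # prints "present"
```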
### 4. Install SkyPilot

Install SkyPilot with your preferred cloud providers:

```bash
pip install "skypilot-nightly[aws,gcp,kubernetes]"
# See: https://docs.skypilot.co/en/latest/getting-started/installation.html
```

### 5. Verify Setup

Check your infrastructure setup:

```bash
sky check
```
## Included files

### nemorl.sky.yaml
```yaml
# NeMo RL multi-node on Kubernetes with SkyPilot
#
# Runs the DPO example from the NeMo RL repository on 16 H100s
# (with InfiniBand on supported clusters).
#
# Usage:
#   HF_TOKEN=<YOUR_TOKEN> sky launch -c nemorl nemorl.sky.yaml --secret HF_TOKEN

resources:
  accelerators: H100:8
  memory: 64+
  infra: k8s
  cpus: 32+
  # network_tier: best  # Uncomment for InfiniBand support on supported clusters
  image_id: docker:nvidia/cuda:12.8.0-devel-ubuntu24.04

num_nodes: 2

secrets:
  HF_TOKEN: null

setup: |
  sudo apt update
  # Install libibverbs-dev for InfiniBand support
  sudo apt install -y git libibverbs-dev python3-dev
  # Set up NeMo RL
  git clone https://github.com/NVIDIA-NeMo/RL nemo-rl
  cd nemo-rl
  git checkout ee8f5aa75a7c8ab070a460f49b5fcf226c5b3018
  git submodule update --init --recursive
  uv venv
  # Build flash-attn and warm the uv cache before the first run
  bash tools/build-flash-attn-in-uv-cache.sh
  cd ..

run: |
  sudo chmod -R 777 /var/tmp
  cd nemo-rl
  head_ip=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  if [ "$SKYPILOT_NODE_RANK" == "0" ]; then
    # Start the Ray head node if one is not already running
    ps aux | grep ray | grep 6379 &> /dev/null || uv run ray start --head --disable-usage-stats --port 6379
    sleep 5
    uv run ray status
    uv run python examples/run_dpo.py \
      cluster.gpus_per_node=${SKYPILOT_NUM_GPUS_PER_NODE} \
      cluster.num_nodes=${SKYPILOT_NUM_NODES} \
      dpo.val_global_batch_size=32 \
      checkpointing.checkpoint_dir='results/dpo_llama81'
  else
    sleep 5
    # Join the Ray cluster as a worker
    ps aux | grep ray | grep 6379 &> /dev/null || uv run ray start --address $head_ip:6379 --disable-usage-stats
    # Sleep after `ray start` to give Ray enough time to daemonize
    sleep 5
  fi
```
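The head/worker split in the `run` section hinges on one shell derivation: SkyPilot exposes one IP per line in `SKYPILOT_NODE_IPS`, and the first line is taken as the Ray head address. A standalone sketch with example IPs hardcoded for illustration:

```shell
# Standalone sketch of how the run section derives the head node address.
# SKYPILOT_NODE_IPS holds one IP per line; example IPs are hardcoded here.
SKYPILOT_NODE_IPS="10.0.0.1
10.0.0.2"

head_ip=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
num_nodes=$(echo "$SKYPILOT_NODE_IPS" | wc -l | tr -d ' ')

echo "head=$head_ip nodes=$num_nodes"  # prints "head=10.0.0.1 nodes=2"
```

Workers then join the head with `ray start --address $head_ip:6379`, which is why only rank 0 launches the training script.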