Finetuning OpenAI gpt-oss Models with SkyPilot#

On August 5, 2025, OpenAI released gpt-oss, including two state-of-the-art open-weight language models: gpt-oss-120b and gpt-oss-20b. These models deliver strong real-world performance at low cost and are available under the flexible Apache 2.0 license.

The gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while the gpt-oss-20b model delivers similar results to OpenAI o3-mini.

This guide walks through how to finetune both models, with either LoRA or full finetuning, using 🤗 Accelerate.

If you're looking to run inference on gpt-oss models, check out the inference example.

Step 0: Set up infrastructure#

SkyPilot is a framework for running AI and batch workloads on any infrastructure, offering unified execution, high cost savings, and high GPU availability.

Install SkyPilot#

pip install 'skypilot[all]'

For more details on how to set up your cloud credentials, see the SkyPilot docs.

Choose your infrastructure#

Run sky check to confirm which clouds and Kubernetes clusters your credentials give you access to:

sky check

Step 1: Launch finetuning#

Full finetuning#

For gpt-oss-20b (smaller model):

  • Requirements: 1 node, 8x H100 GPUs

sky launch -c gpt-oss-20b-sft gpt-oss-20b-sft.yaml
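
For reference, here is a minimal sketch of what gpt-oss-20b-sft.yaml contains, assuming it mirrors the 120b configuration shown below with a single node and the smaller model ID:

# gpt-oss-20b-sft.yaml (sketch; see the example directory for the actual file)
resources:
  accelerators: H100:8

file_mounts:
  /sft: ./sft

num_nodes: 1

# setup/run mirror the 120b YAML below, with train.py invoked as
#   /sft/train.py --model_id openai/gpt-oss-20b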

For gpt-oss-120b (larger model):

  • Requirements: 4 nodes, 8x H200 GPUs each

sky launch -c gpt-oss-120b-sft gpt-oss-120b-sft.yaml
# gpt-oss-120b-sft.yaml
resources:
  accelerators: H200:8
  network_tier: best

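# Sync the local ./sft directory (train.py and the accelerate config) to /sft on every node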
file_mounts:
  /sft: ./sft

num_nodes: 4

setup: |
  conda install cuda -c nvidia
  uv venv ~/training --seed --python 3.10
  source ~/training/bin/activate
  uv pip install torch --index-url https://download.pytorch.org/whl/cu128
  uv pip install "trl>=0.20.0" "peft>=0.17.0" "transformers>=4.55.0"
  uv pip install deepspeed
  uv pip install git+https://github.com/huggingface/accelerate.git@c0a3aefea8aa5008a0fbf55b049bd3f0efa9cbf2

  uv pip install nvitop

run: |
  source ~/training/bin/activate

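  # SkyPilot exposes cluster topology through environment variables: the first
  # entry in SKYPILOT_NODE_IPS is the head node (used as the rendezvous address),
  # and the total process count is GPUs per node times the number of nodes.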
  MASTER_ADDR=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  NP=$(($SKYPILOT_NUM_GPUS_PER_NODE * $SKYPILOT_NUM_NODES))

  accelerate launch \
    --config_file /sft/fsdp2_120b.yaml \
    --num_machines $SKYPILOT_NUM_NODES \
    --num_processes $NP \
    --machine_rank $SKYPILOT_NODE_RANK \
    --main_process_ip $MASTER_ADDR \
    --main_process_port 29500 \
    /sft/train.py --model_id openai/gpt-oss-120b
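
The /sft/fsdp2_120b.yaml referenced above is the Accelerate config that ships with the example's sft directory. For orientation, here is a sketch of what an FSDP2 config for a 4-node, 32-GPU run could look like; field names follow Accelerate's FSDP config schema, and the actual file in the example may differ:

# fsdp2_120b.yaml (illustrative sketch; use the file shipped with the example)
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
mixed_precision: bf16
fsdp_config:
  fsdp_version: 2
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_reshard_after_forward: true
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_cpu_ram_efficient_loading: true
num_machines: 4     # overridden by --num_machines at launch
num_processes: 32   # overridden by --num_processes at launch
machine_rank: 0     # overridden by --machine_rank at launch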

LoRA finetuning#

For gpt-oss-20b with LoRA:

  • Requirements: 1 node, 2x H100 GPUs

sky launch -c gpt-oss-20b-lora gpt-oss-20b-lora.yaml

For gpt-oss-120b with LoRA:

  • Requirements: 1 node, 8x H100 GPUs

sky launch -c gpt-oss-120b-lora gpt-oss-120b-lora.yaml
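
As with full finetuning, the LoRA task files live in the example directory. A rough sketch of gpt-oss-20b-lora.yaml, assuming it follows the same structure as the full-finetuning task above (the exact way train.py enables LoRA/PEFT is not shown here):

# gpt-oss-20b-lora.yaml (sketch; the actual task files are in the example directory)
resources:
  accelerators: H100:2   # the 120b LoRA task requests H100:8 instead

file_mounts:
  /sft: ./sft

num_nodes: 1

# setup mirrors the full-finetuning YAML above; run launches /sft/train.py through
# accelerate with a PEFT/LoRA configuration (peft>=0.17.0 is installed in setup)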

Step 2: Monitor and get results#

Once your finetuning job is running, you can monitor the progress and retrieve results:

# Check job status
sky status

# View logs
sky logs <cluster-name>

# Download outputs from the cluster (SkyPilot sets up SSH access under the cluster name)
rsync -avz <cluster-name>:<remote-output-dir> ./results

# Tear down the cluster when you are done
sky down <cluster-name>
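
Note that sky down deletes the cluster and its disks. If you want checkpoints to outlive the cluster, one option is to mount an object-store bucket into the task and point the training script's output directory at it. A sketch of the extra file_mounts entry, with a placeholder bucket name:

# Addition to the task YAML: mount a bucket at /outputs on every node
file_mounts:
  /outputs:
    name: gpt-oss-checkpoints   # placeholder bucket name
    store: s3                   # or gcs, azure, ...
    mode: MOUNT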

Example full finetuning progress#

Here's what you can expect to see during training: the loss should decrease and the mean token accuracy should improve over time.

gpt-oss-20b training progress#

Training Progress for gpt-oss-20b on Nebius:
  6%|▋         | 1/16 [01:18<19:31, 78.12s/it]
{'loss': 2.2344, 'grad_norm': 17.139, 'learning_rate': 0.0, 'num_tokens': 51486.0, 'mean_token_accuracy': 0.5436, 'epoch': 0.06}

 12%|█▎        | 2/16 [01:23<08:10, 35.06s/it]
{'loss': 2.1689, 'grad_norm': 16.724, 'learning_rate': 0.0002, 'num_tokens': 105023.0, 'mean_token_accuracy': 0.5596, 'epoch': 0.12}

 25%|██▌       | 4/16 [01:34<03:03, 15.26s/it]
{'loss': 2.1548, 'grad_norm': 3.983, 'learning_rate': 0.000192, 'num_tokens': 214557.0, 'mean_token_accuracy': 0.5182, 'epoch': 0.25}

 50%|█████     | 8/16 [01:56<00:59,  7.43s/it]
{'loss': 2.1323, 'grad_norm': 3.460, 'learning_rate': 0.000138, 'num_tokens': 428975.0, 'mean_token_accuracy': 0.5432, 'epoch': 0.5}

 75%|███████▌  | 12/16 [02:15<00:21,  5.50s/it]
{'loss': 1.4624, 'grad_norm': 0.888, 'learning_rate': 6.5e-05, 'num_tokens': 641021.0, 'mean_token_accuracy': 0.6522, 'epoch': 0.75}

100%|██████████| 16/16 [02:34<00:00,  4.88s/it]
{'loss': 1.1294, 'grad_norm': 0.713, 'learning_rate': 2.2e-05, 'num_tokens': 852192.0, 'mean_token_accuracy': 0.7088, 'epoch': 1.0}

Final Training Summary:
{'train_runtime': 298.36, 'train_samples_per_second': 3.352, 'train_steps_per_second': 0.054, 'train_loss': 2.086, 'epoch': 1.0}
✓ Job finished (status: SUCCEEDED).

Memory and GPU utilization during training, as reported by nvitop.

gpt-oss-120b training progress#

Training Progress for gpt-oss-120b on 4 nodes:
  3%|▏         | 1/32 [03:45<116:23, 225.28s/it]
  6%|▋         | 2/32 [06:12<90:21, 181.05s/it]
  9%|▉         | 3/32 [08:45<71:22, 147.67s/it]
 12%|█▎        | 4/32 [11:18<59:44, 128.01s/it]
 25%|██▌       | 8/32 [22:36<67:48, 169.50s/it]
 44%|████▍     | 14/32 [29:03<43:37, 145.41s/it]

Memory and GPU utilization during training, as reported by nvitop.

Configuration files#

You can find the complete configurations in the following directory.