You are viewing the latest developer preview docs. Click here to view docs for the latest stable release.

External Links

External links are URLs associated with managed jobs and clusters, displayed in the SkyPilot dashboard. They are useful for linking to external dashboards, experiment trackers, and other relevant resources.

SkyPilot displays three types of links:

  1. Instance links: For jobs running on AWS, GCP, or Azure, SkyPilot automatically adds links to the cloud console for the underlying instance.

  2. Log-detected links (built-in): The dashboard automatically parses job logs to detect URLs from supported services (currently Weights & Biases) and displays them as external links.

  3. Admin-configured custom URLs: Administrators can register a list of labeled regex patterns in the SkyPilot config. Matching URLs that appear in job logs (job detail page) or cluster provision logs (cluster detail page) are rendered as clickable, labeled links.

Managed jobs external links

Supported services

SkyPilot automatically detects URLs from the following services in your job logs:

  • Weights & Biases (W&B): Run URLs on W&B SaaS (e.g., https://wandb.ai/<entity>/<project>/runs/<run_id>) and W&B Dedicated Cloud tenants (e.g., https://<tenant>.wandb.io/<entity>/<project>/runs/<run_id>)

When your job prints a URL from a supported service to stdout or stderr, the dashboard will automatically extract it and display it in the “External Links” section.
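
As a rough illustration of what log-based detection involves, the sketch below scans log lines for W&B run URLs of the two shapes listed above. The regex and the `find_wandb_run_urls` helper are assumptions made for this example; they are not SkyPilot's actual implementation, and the sample log lines are invented.

```python
import re

# Hypothetical pattern covering the two URL shapes listed above:
#   https://wandb.ai/<entity>/<project>/runs/<run_id>
#   https://<tenant>.wandb.io/<entity>/<project>/runs/<run_id>
WANDB_RUN_URL = re.compile(
    r"https://(?:wandb\.ai|[\w-]+\.wandb\.io)/[^/\s]+/[^/\s]+/runs/[^/\s?#]+"
)


def find_wandb_run_urls(log_lines):
    """Return W&B run URLs found in the log lines, in order, without duplicates."""
    seen = []
    for line in log_lines:
        for url in WANDB_RUN_URL.findall(line):
            if url not in seen:
                seen.append(url)
    return seen


# Invented sample output; real W&B log lines may differ.
sample_logs = [
    "wandb: Syncing run demo-run",
    "wandb: View run at https://wandb.ai/my-team/example/runs/abc123",
    "wandb: View project at https://wandb.ai/my-team/example",
    "see https://acme.wandb.io/my-team/example/runs/xyz789 for details",
]
print(find_wandb_run_urls(sample_logs))
```

Project-level URLs (without a `/runs/<run_id>` segment) are deliberately not matched in this sketch, since only run URLs are listed as supported.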

Example: Using Weights & Biases

When using W&B for experiment tracking, the W&B library automatically prints the run URL to stdout when you initialize a run. SkyPilot detects this and displays it in the dashboard.

Here’s an example training job:

# wandb_training.yaml
name: wandb-training

envs:
  WANDB_API_KEY: null  # Passed at launch time via --env

setup: |
  pip install wandb torch

run: |
  python train.py

# train.py
import wandb

# wandb.init() prints the run URL to the logs, which the dashboard picks up.
run = wandb.init(project='example', name='demo-run')
run.log({'loss': 1.0})
run.finish()

Launch the job:

$ sky jobs launch -n wandb-example-job --env WANDB_API_KEY=$WANDB_API_KEY wandb_training.yaml

Once the job starts and W&B prints the run URL to the logs, you’ll see the link appear in the dashboard:

Job detail page showing W&B external link

Clicking the link takes you directly to the W&B run page, where you can view the run's metrics and artifacts.

W&B run page

Admin-configured custom URLs

Administrators can extend the built-in W&B detection with their own {label, regex} entries in the SkyPilot server config. Any URL printed to the logs that matches a configured regex is rendered as a clickable, labeled link on both the cluster and job detail pages.

Add the entries under a top-level dashboard block in ~/.sky/config.yaml on the SkyPilot API server:

dashboard:
  external_links:
    - label: "Grafana"
      regex: 'https://grafana\.internal\.example\.com/d/[a-z0-9]+.*'
    - label: "Internal tools"
      regex: 'https://tools\.internal\.example\.com/.*'

Each entry takes:

  • label: The text shown to users in the External Links section.

  • regex: A Python-style regex matched against whitespace-delimited tokens in streamed log output. Each pattern resolves to at most one URL per cluster or job (the first match wins).
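
The matching rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `resolve_links` helper is hypothetical, the sample log lines are invented, and the sketch assumes the whole token must match the regex (`fullmatch`), which the description above does not specify.

```python
import re

# Entries mirroring the config example above; in a real deployment they
# come from the dashboard.external_links block.
ENTRIES = [
    {"label": "Grafana",
     "regex": r"https://grafana\.internal\.example\.com/d/[a-z0-9]+.*"},
    {"label": "Internal tools",
     "regex": r"https://tools\.internal\.example\.com/.*"},
]


def resolve_links(log_lines, entries):
    """Map each label to the first whitespace-delimited token its regex matches."""
    compiled = [(e["label"], re.compile(e["regex"])) for e in entries]
    resolved = {}
    for line in log_lines:
        for token in line.split():
            for label, pattern in compiled:
                # Assumption: the entire token must match. Once a label is
                # resolved, later matches are ignored (first match wins).
                if label not in resolved and pattern.fullmatch(token):
                    resolved[label] = token
    return resolved


logs = [
    "Starting training...",
    "Metrics: https://grafana.internal.example.com/d/abc123/gpu-util",
    "Metrics: https://grafana.internal.example.com/d/zzz999/other",
    "Ticket: https://tools.internal.example.com/tickets/42",
    "Docs at https://example.com/not-matched",
]
print(resolve_links(logs, ENTRIES))
```

In this sketch the second Grafana URL is ignored because the "Grafana" pattern has already resolved to the first one, which mirrors the "at most one URL per cluster or job" behavior described above.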

After updating the config, restart the API server (sky api stop && sky api start) so the new entries are loaded.

Custom URL matches appear on:

  • The job detail page under External Links, alongside W&B and instance console links. Scanning happens as job logs stream into the browser.

  • The cluster detail page under External Links. The dashboard automatically streams the tail of the most recent job's logs to scan for matches; if you also expand the Provision Logs section, those lines are scanned as well.
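
For a custom link to show up, the job only needs to print a matching URL to its logs. A minimal sketch, assuming the "Grafana" entry from the config example above; the URL value and the `announce_link` helper are hypothetical:

```python
import sys

# Hypothetical dashboard URL chosen to match the "Grafana" regex from the
# config example above.
GRAFANA_URL = "https://grafana.internal.example.com/d/abc123/training"


def announce_link(url, stream=sys.stdout):
    """Print the URL as its own whitespace-delimited token and flush right away."""
    print(f"Dashboard: {url}", file=stream, flush=True)


announce_link(GRAFANA_URL)
```

Keeping the URL separated from surrounding text by whitespace (rather than wrapping it in parentheses or quotes) is a cautious choice here, since matching operates on whitespace-delimited tokens. Flushing immediately helps the line reach the streamed logs early.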

Regexes that fail to compile are rejected at config load time with a clear error, so a malformed entry does not silently disable other entries.
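
Before restarting the server, it can be handy to check patterns locally. The sketch below shows one way such validation could work using Python's `re` module; the `validate_external_links` helper and its error messages are assumptions for illustration, not SkyPilot's actual validation code.

```python
import re


def validate_external_links(entries):
    """Compile every entry's regex; raise ValueError naming the first bad one."""
    compiled = []
    for index, entry in enumerate(entries):
        label, regex = entry.get("label"), entry.get("regex")
        if not label or not regex:
            raise ValueError(
                f"external_links[{index}]: 'label' and 'regex' are required"
            )
        try:
            compiled.append((label, re.compile(regex)))
        except re.error as exc:
            # Surface the label and pattern so the bad entry is easy to find.
            raise ValueError(
                f"external_links[{index}] ({label!r}): invalid regex {regex!r}: {exc}"
            ) from exc
    return compiled


entries = [
    {"label": "Grafana",
     "regex": r"https://grafana\.internal\.example\.com/d/[a-z0-9]+.*"},
    {"label": "Broken", "regex": r"https://tools\.internal\.example\.com/("},
]

try:
    validate_external_links(entries)
except ValueError as err:
    print(f"Config rejected: {err}")
```

Here the second entry has an unclosed group, so the whole list is rejected with a message that names the offending entry.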


© Copyright 2026, SkyPilot Team.