Priority and Preemption

SkyPilot supports priority-based scheduling, preemption, and re-queuing of jobs running on Kubernetes. You can achieve this by leveraging Kubernetes’ native priority classes.

Tip

Jobs with priorities and preemption are only supported on Kubernetes.

To set job priorities:

  1. Create priority classes in your Kubernetes cluster.

  2. Assign a priority class to each SkyPilot job by setting experimental.config_overrides.kubernetes.pod_config.spec.priorityClassName.

  3. Launch your jobs as managed jobs with sky jobs launch.

With this setup, you can run high priority jobs that preempt low priority jobs when resources are constrained.

Working example

Below we show an example run with two priority classes: high-priority and low-priority.

Step 1: Create priority classes

Create two priority classes in your Kubernetes cluster:

# priorities.yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 200
globalDefault: false
description: "High priority class for critical jobs"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 100
globalDefault: true
description: "Low priority class for background jobs"

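A higher value indicates higher priority. Because low-priority sets globalDefault: true, pods created without a priorityClassName receive its value (100); at most one PriorityClass in a cluster can set globalDefault: true. You can create as many priority classes as you need.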
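
If you need more than two tiers, writing the manifests by hand gets repetitive. The following is a minimal Python sketch that builds them programmatically; the function name priority_classes and the medium-priority tier are hypothetical examples, not part of SkyPilot or Kubernetes. It wraps the objects in a v1 List and prints JSON, which kubectl apply -f accepts as well as YAML, and it lets you mark at most one class as globalDefault.

```python
import json


def priority_classes(tiers, default=None):
    """Build a v1 List of PriorityClass objects from (name, value) pairs.

    `default` names the single class marked globalDefault; Kubernetes
    allows at most one such class per cluster.
    """
    names = [name for name, _ in tiers]
    if len(set(names)) != len(names):
        raise ValueError("priority class names must be unique")
    if default is not None and default not in names:
        raise ValueError(f"unknown default class: {default}")
    items = [
        {
            "apiVersion": "scheduling.k8s.io/v1",
            "kind": "PriorityClass",
            "metadata": {"name": name},
            "value": value,  # higher value means higher priority
            "globalDefault": name == default,
            "description": f"{name} (value {value})",
        }
        for name, value in tiers
    ]
    # A v1 List lets one file carry several objects for kubectl apply -f.
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Hypothetical three-tier setup; adjust names and values to your needs.
    tiers = [("high-priority", 200), ("medium-priority", 150), ("low-priority", 100)]
    print(json.dumps(priority_classes(tiers, default="low-priority"), indent=2))
```

Redirect the output to a file (for example priorities.json) and apply it with kubectl apply -f priorities.json, just like the YAML version.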

Apply these priority classes to your cluster:

$ kubectl apply -f priorities.yaml

Step 2: Set priorities in SkyPilot jobs

To assign priorities to your SkyPilot jobs, use the experimental.config_overrides.kubernetes.pod_config field in your SkyPilot YAML.

We use two simple counter jobs in this example:

# high-priority-job.yaml
resources:
  cloud: kubernetes
  cpus: 4

run: |
  python -c '
  import time
  for i in range(1000):
      print(f"High priority counter: {i}")
      time.sleep(1)
  '

experimental:
  config_overrides:
    kubernetes:
      pod_config:
        spec:
          priorityClassName: high-priority

# low-priority-job.yaml
resources:
  cloud: kubernetes
  cpus: 4

run: |
  python -c '
  import time
  for i in range(1000):
      print(f"Low priority counter: {i}")
      time.sleep(1)
  '

experimental:
  config_overrides:
    kubernetes:
      pod_config:
        spec:
          priorityClassName: low-priority
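
If you generate job YAMLs from a script, or your jobs already carry other pod_config settings, the priority class has to be merged into the existing nested keys rather than overwriting them. Below is a minimal Python sketch of that merge on a task loaded as a plain dict; with_priority is a hypothetical helper, not a SkyPilot API.

```python
import copy


def with_priority(task: dict, priority_class: str) -> dict:
    """Return a copy of a SkyPilot task dict with priorityClassName set.

    Keys that already exist under experimental.config_overrides are kept;
    missing levels are created as empty dicts.
    """
    result = copy.deepcopy(task)  # leave the caller's dict untouched
    node = result
    for key in ("experimental", "config_overrides", "kubernetes", "pod_config", "spec"):
        node = node.setdefault(key, {})
    node["priorityClassName"] = priority_class
    return result


# Example: derive a high priority variant of a base task.
base = {"resources": {"cloud": "kubernetes", "cpus": 4}, "run": "echo hello"}
high = with_priority(base, "high-priority")
```

Serialize the result with a YAML library such as PyYAML (yaml.safe_dump) and pass the file to sky jobs launch as usual.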

Tip

To see the preemption behavior, set the resources.cpus field so that once one job is running, the cluster has no CPUs left for the other job.

You can inspect the allocatable CPUs on each node with kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu (or kubectl describe nodes).
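
The sizing rule in the tip comes down to a little arithmetic. Below is a minimal Python sketch, assuming a single-node cluster whose allocatable CPU value you read from kubectl, and assuming no other pods are already requesting CPU on that node. parse_cpu and shows_preemption are hypothetical helpers, and parse_cpu only understands plain core counts and millicore (m) quantities.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "4", "0.5" or "3920m" to cores.

    Only plain numbers and the millicore suffix "m" are handled.
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


def shows_preemption(allocatable: str, cpus_per_job: float) -> bool:
    """True if one job fits on the node but two jobs cannot run side by side."""
    cores = parse_cpu(allocatable)
    # One job must be schedulable on its own; a second must not fit next to it.
    return cpus_per_job <= cores < 2 * cpus_per_job


# With cpus: 4 in both job YAMLs:
print(shows_preemption("7900m", 4))  # node with ~7.9 allocatable cores
print(shows_preemption("16", 4))     # node large enough to run both jobs at once
```

On the 7.9-core node the check passes, so the second job can only start by preempting the first; on the 16-core node both jobs fit and no preemption occurs.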

Step 3: Launch your jobs

Use sky jobs launch to launch your jobs as managed jobs. First, we launch the low priority job:

$ sky jobs launch low-priority-job.yaml

Then launch the high priority job:

$ sky jobs launch high-priority-job.yaml

Use sky jobs queue to check the status of your jobs. The high priority job preempts the low priority job and starts running as soon as the low priority job's pod is evicted.

The preempted low priority job enters the RECOVERING state, and SkyPilot automatically restarts it when resources become available.

$ sky jobs queue
Fetching managed job statuses...
Managed jobs
In progress tasks: 1 RECOVERING, 1 RUNNING
ID  NAME             RESOURCES  SUBMITTED   TOT. DURATION  #RECOVERIES  STATUS
2   sky-0232-romilb  1x[CPU:4]  5 mins ago  5m 35s         0            RUNNING
1   sky-0d6f-romilb  1x[CPU:4]  7 mins ago  7m 13s         1            RECOVERING

Once the high priority job finishes, the low priority job will start running again.

$ sky jobs queue
Fetching managed job statuses...
Managed jobs
No in-progress managed jobs.
ID  NAME             RESOURCES  SUBMITTED    TOT. DURATION  #RECOVERIES  STATUS
2   sky-0232-romilb  1x[CPU:4]  23 mins ago  17m 22s        0            SUCCEEDED
1   sky-0d6f-romilb  1x[CPU:4]  25 mins ago  23m 47s        1            RUNNING

How priorities and preemptions work

When the cluster does not have enough resources to run all jobs, the Kubernetes scheduler preempts lower priority jobs: it terminates their pods to free resources for the pending pods of higher priority jobs.
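
To make the rule concrete, here is a greatly simplified Python model of a single node, assuming CPU is the only resource and that victims are chosen lowest priority first. It is not the real kube-scheduler algorithm, which also weighs multiple nodes, PodDisruptionBudgets and graceful termination; Pod and schedule are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    priority: int  # PriorityClass value; higher means more important
    cpus: float    # CPU request


def schedule(running, pending, capacity):
    """Try to place `pending` on one node with `capacity` CPUs.

    Returns (running_after, evicted, admitted).
    """
    free = capacity - sum(p.cpus for p in running)
    if pending.cpus <= free:
        return running + [pending], [], True
    # Only strictly lower-priority pods are candidates, lowest priority first.
    # Pods with equal priority are never preempted.
    candidates = sorted(
        (p for p in running if p.priority < pending.priority),
        key=lambda p: p.priority,
    )
    evicted = []
    for victim in candidates:
        if pending.cpus <= free:
            break
        evicted.append(victim)
        free += victim.cpus
    if pending.cpus > free:
        # Even evicting every candidate would not make room: preempt nothing.
        return running, [], False
    evicted_ids = {id(p) for p in evicted}
    survivors = [p for p in running if id(p) not in evicted_ids]
    return survivors + [pending], evicted, True
```

For example, schedule([Pod("low", 100, 4)], Pod("high", 200, 4), capacity=6) evicts the low priority pod and admits the high priority one, while a second pod with priority 100 would simply stay pending. In SkyPilot's case, the evicted pod belongs to a managed job, which is what moves that job into RECOVERING.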

SkyPilot automatically reschedules preempted managed jobs when resources become available again. A restarted job runs its commands from the beginning unless your code resumes from saved state, so set up checkpointing and recovery in your code to reduce wasted work.
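
As an illustration, the counter from the example can be made resumable. This is a minimal Python sketch, assuming CHECKPOINT_PATH (a hypothetical environment variable) points to storage that outlives the pod, for example a bucket mounted through SkyPilot's file_mounts; a pod-local path would be lost on preemption. The helper names are illustrative, not a SkyPilot API.

```python
import json
import os
import time

# Hypothetical: must point at storage that survives pod restarts.
CHECKPOINT_PATH = os.environ.get("CHECKPOINT_PATH", "counter_ckpt.json")


def load_start(path):
    """Return the next counter value to process, or 0 if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)["next"]
    except FileNotFoundError:
        return 0


def save(path, next_value):
    """Record progress; write-then-rename avoids leaving a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next": next_value}, f)
    os.replace(tmp, path)


def run(path, total=1000, delay=1.0):
    # Resume from the last completed step instead of starting at 0.
    for i in range(load_start(path), total):
        print(f"Low priority counter: {i}")
        time.sleep(delay)
        save(path, i + 1)
```

Call run(CHECKPOINT_PATH) from the job's run command. Note that the atomicity of os.replace holds on local POSIX filesystems but may not hold on every mounted bucket, so check the guarantees of your storage backend.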

Jobs with the same priority do not preempt each other; they follow SkyPilot's default scheduling behavior.

Tip

You can also apply priority classes to unmanaged SkyPilot clusters launched with sky launch. However, SkyPilot does not automatically restart unmanaged clusters that are preempted.

Limitations

  1. Priority settings only apply within a single Kubernetes cluster; they have no effect across clusters.

  2. Preemption behavior depends on your cluster's configuration. High priority SkyPilot pods may also preempt other, non-SkyPilot pods in the cluster that have a lower priority.

For more information, refer to the Kubernetes documentation on Pod Priority and Preemption.