Using Spot Instances for Serving
SkyServe supports serving models on a mixture of spot and on-demand replicas through two options: base_ondemand_fallback_replicas and dynamic_ondemand_fallback. Currently, SkyServe relies on the client side to retry requests that fail because of spot instance preemptions.
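Because retries are left to the client, callers may want to wrap each request in a retry loop with backoff. The following is a minimal sketch using only the Python standard library. The helper names call_with_retries and get, the attempt count, and the backoff delays are hypothetical choices for illustration, not part of SkyServe.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 5,
                      base_delay: float = 0.5) -> T:
    """Call fn(), retrying with exponential backoff on transient failures.

    Hypothetical helper: a request that hits a replica being preempted
    usually fails with a connection error or a 5xx response, and a later
    attempt may be routed to a healthy replica.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except urllib.error.HTTPError as err:
            # HTTPError subclasses URLError; only retry server-side errors.
            if err.code < 500 or attempt == attempts - 1:
                raise
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            # Connection refused/reset or timeout: retry unless out of attempts.
            if attempt == attempts - 1:
                raise
        # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
        time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


def get(url: str, timeout: float = 10.0) -> bytes:
    """Fetch url once (hypothetical helper for illustration)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

For example, call_with_retries(lambda: get("http://<ENDPOINT>/")) would retry a request against the ENDPOINT reported by sky serve status. Here 4xx responses are not retried, on the assumption that they indicate a problem with the request rather than a preempted replica.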
Base on-demand Fallback
base_ondemand_fallback_replicas sets the number of on-demand replicas to keep running at all times. This ensures service availability: some capacity is always available, even when spot replicas cannot be provisioned. Set use_spot to true to enable spot replicas.
service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 2
    max_replicas: 3
    target_qps_per_replica: 1
    # Ensures that one of the replicas is run on on-demand instances
    base_ondemand_fallback_replicas: 1

resources:
  ports: 8081
  cpus: 2+
  use_spot: true

workdir: examples/serve/http_server

run: python3 server.py
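The split this option produces can be modeled as simple arithmetic. The sketch below is a simplified illustration, assuming the target replica count is divided so that base_ondemand_fallback_replicas replicas run on demand and the rest are requested on spot instances. base_fallback_plan and ReplicaPlan are hypothetical names, not SkyServe's autoscaler code.

```python
from typing import NamedTuple


class ReplicaPlan(NamedTuple):
    spot: int       # replicas requested on spot instances
    on_demand: int  # replicas kept on on-demand instances


def base_fallback_plan(target_replicas: int, base_ondemand: int) -> ReplicaPlan:
    """Split a target replica count between on-demand and spot instances.

    Assumption: `base_ondemand` replicas always run on demand and the
    remaining replicas are requested as spot instances.
    """
    spot = max(0, target_replicas - base_ondemand)
    return ReplicaPlan(spot=spot, on_demand=base_ondemand)


# min_replicas: 2 with base_ondemand_fallback_replicas: 1 (the config above).
print(base_fallback_plan(2, 1))  # ReplicaPlan(spot=1, on_demand=1)
# After scaling up to max_replicas: 3.
print(base_fallback_plan(3, 1))  # ReplicaPlan(spot=2, on_demand=1)
```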
Tip
Kubernetes instances are considered on-demand instances. You can use the base_ondemand_fallback_replicas option to have some replicas run on Kubernetes, while others run on cloud spot instances.
Dynamic on-demand Fallback
SkyServe supports dynamically falling back to on-demand replicas when spot replicas are not available. This is enabled by setting dynamic_ondemand_fallback to true. It is useful for maintaining the required number of replicas during spot instance interruptions. When spot replicas become available again, SkyServe automatically switches back to them to maximize cost savings.
service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 2
    max_replicas: 3
    target_qps_per_replica: 1
    # Allows replicas to be run on on-demand instances if spot instances are not available
    dynamic_ondemand_fallback: true

resources:
  ports: 8081
  cpus: 2+
  use_spot: true

workdir: examples/serve/http_server

run: python3 server.py
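The fallback rule can be approximated as: run one on-demand replica for every spot replica in the target that is not ready. The following is a simplified sketch under that assumption, with base_ondemand_fallback_replicas left at 0; dynamic_fallback_ondemand is a hypothetical name, not a SkyServe function. With a target of 2 and no spot replicas ready, it calls for 2 on-demand replicas, which is consistent with the two spot and two on-demand replicas provisioning at startup in the example that follows.

```python
def dynamic_fallback_ondemand(target_replicas: int, ready_spot: int) -> int:
    """On-demand replicas needed to cover spot replicas that are not ready.

    Assumes base_ondemand_fallback_replicas is 0, so every replica in the
    target is first requested on spot instances.
    """
    return max(0, target_replicas - ready_spot)


# Target of 2 replicas (min_replicas: 2 in the config above).
for ready in range(3):
    print(ready, dynamic_fallback_ondemand(2, ready))
# 0 2
# 1 1
# 2 0
```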
Tip
SkyServe supports specifying both base_ondemand_fallback_replicas and dynamic_ondemand_fallback. Specifying both keeps a base number of on-demand replicas running at all times and dynamically falls back to additional on-demand replicas when spot replicas are not available.
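When both options are set, one simplified way to model the result is: the base on-demand replicas always run, the rest of the target is requested on spot, and dynamic fallback adds one on-demand replica for each of those spot replicas that is not ready. fallback_plan is a hypothetical helper sketching this assumption (base_ondemand stands for base_ondemand_fallback_replicas, dynamic for dynamic_ondemand_fallback); SkyServe's actual scheduling may differ in details.

```python
from typing import Tuple


def fallback_plan(target_replicas: int, base_ondemand: int,
                  dynamic: bool, ready_spot: int) -> Tuple[int, int]:
    """Return (spot, on_demand) replica counts under a simplified model.

    - `base_ondemand` replicas always run on on-demand instances.
    - The rest of the target is requested on spot instances.
    - With dynamic fallback, each of those spot replicas that is not ready
      is temporarily covered by one extra on-demand replica.
    """
    spot = max(0, target_replicas - base_ondemand)
    on_demand = base_ondemand
    if dynamic:
        on_demand += max(0, spot - ready_spot)
    return spot, on_demand


# Target 3, base 1, dynamic fallback on, only 1 of 2 spot replicas ready.
print(fallback_plan(3, 1, True, 1))   # (2, 2)
# Same state without dynamic fallback: only the base on-demand replica.
print(fallback_plan(3, 1, False, 1))  # (2, 1)
```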
Example
The following example demonstrates how to use spot replicas with dynamic on-demand fallback in SkyServe. The example is a simple HTTP server that listens on port 8081, configured with dynamic_ondemand_fallback: true. To run it:
$ sky serve up examples/serve/spot_policy/dynamic_on_demand_fallback.yaml -n http-server
Once the service is launched, we can check the status of the service and its replicas with the following command. Initially, we will see:
$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS      REPLICAS  ENDPOINT
http-server  1        1m 17s  NO_REPLICA  0/4       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT  LAUNCHED    RESOURCES             STATUS        REGION
http-server   1   1        -         1 min ago   1x GCP([Spot]vCPU=2)  PROVISIONING  us-east1
http-server   2   1        -         1 min ago   1x GCP([Spot]vCPU=2)  PROVISIONING  us-central1
http-server   3   1        -         1 mins ago  1x GCP(vCPU=2)        PROVISIONING  us-east1
http-server   4   1        -         1 min ago   1x GCP(vCPU=2)        PROVISIONING  us-central1
When the required number of spot replicas is not available, SkyServe will provision as many on-demand replicas as needed to meet the target number of replicas. For example, when the target number is 2 and only 1 spot replica is ready, SkyServe will provision 1 on-demand replica to meet the target.
$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS      REPLICAS  ENDPOINT
http-server  1        1m 17s  READY       2/4       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED    RESOURCES             STATUS         REGION
http-server   1   1        http://34.23.22.160:8081   3 min ago   1x GCP([Spot]vCPU=2)  READY          us-east1
http-server   2   1        http://34.68.226.193:8081  3 min ago   1x GCP([Spot]vCPU=2)  READY          us-central1
http-server   3   1        -                          3 mins ago  1x GCP(vCPU=2)        SHUTTING_DOWN  us-east1
http-server   4   1        -                          3 min ago   1x GCP(vCPU=2)        SHUTTING_DOWN  us-central1
When the spot replicas are ready, SkyServe will automatically scale down on-demand replicas to maximize cost savings.
$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS      REPLICAS  ENDPOINT
http-server  1        3m 59s  READY       2/2       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED    RESOURCES             STATUS  REGION
http-server   1   1        http://34.23.22.160:8081   4 mins ago  1x GCP([Spot]vCPU=2)  READY   us-east1
http-server   2   1        http://34.68.226.193:8081  4 mins ago  1x GCP([Spot]vCPU=2)  READY   us-central1
In the event of a spot instance interruption (e.g., replica 1), SkyServe will automatically fall back to on-demand replicas (e.g., launch one on-demand replica) to maintain the required capacity. At the same time, SkyServe will keep trying to provision a replacement spot replica in case spot availability returns. Note that SkyServe will try different regions and clouds to maximize the chance of successfully provisioning spot instances.
$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS      REPLICAS  ENDPOINT
http-server  1        7m 2s   READY       1/3       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED     RESOURCES             STATUS        REGION
http-server   2   1        http://34.68.226.193:8081  7 mins ago   1x GCP([Spot]vCPU=2)  READY         us-central1
http-server   5   1        -                          13 secs ago  1x GCP([Spot]vCPU=2)  PROVISIONING  us-central1
http-server   6   1        -                          13 secs ago  1x GCP(vCPU=2)        PROVISIONING  us-central1
Eventually, when spot availability returns, SkyServe will automatically scale down the on-demand replicas.
$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS      REPLICAS  ENDPOINT
http-server  1        10m 5s  READY       2/3       54.227.229.217:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                   LAUNCHED     RESOURCES             STATUS  REGION
http-server   2   1        http://34.68.226.193:8081  10 mins ago  1x GCP([Spot]vCPU=2)  READY   us-central1
http-server   5   1        http://34.121.49.94:8081   1 min ago    1x GCP([Spot]vCPU=2)  READY   us-central1
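The preemption and recovery steps in this walkthrough can be replayed with a small reconciliation function. The following is a hypothetical sketch assuming base_ondemand_fallback_replicas is 0 and a target of 2 replicas: it keeps the target number of spot replicas alive, runs one on-demand replica per missing ready spot replica, and retires surplus on-demand replicas. Replica, reconcile, and the rule of terminating not-yet-ready on-demand replicas first are illustrative assumptions, not SkyServe internals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Replica:
    replica_id: int
    spot: bool
    ready: bool  # True once the replica passes the readiness probe


def reconcile(replicas: List[Replica], target: int) -> Tuple[int, int, List[int]]:
    """Return (spot_to_launch, ondemand_to_launch, ondemand_ids_to_terminate).

    Simplified dynamic-fallback model with no base on-demand replicas.
    """
    spot = [r for r in replicas if r.spot]
    on_demand = [r for r in replicas if not r.spot]
    ready_spot = sum(r.ready for r in spot)

    # Keep trying to run `target` spot replicas (replaces preempted ones).
    spot_to_launch = max(0, target - len(spot))
    # One on-demand replica for each spot replica that is not ready.
    wanted_ondemand = max(0, target - ready_spot)
    ondemand_to_launch = max(0, wanted_ondemand - len(on_demand))
    # Retire surplus on-demand replicas, not-yet-ready ones first
    # (False sorts before True; sorted() is stable otherwise).
    surplus = max(0, len(on_demand) - wanted_ondemand)
    by_priority = sorted(on_demand, key=lambda r: r.ready)
    to_terminate = [r.replica_id for r in by_priority[:surplus]]
    return spot_to_launch, ondemand_to_launch, to_terminate


# Spot replica 1 was preempted; only spot replica 2 is still serving.
print(reconcile([Replica(2, True, True)], target=2))  # (1, 1, [])
# New spot replica 5 is ready, so on-demand replica 6 is surplus.
print(reconcile([Replica(2, True, True), Replica(5, True, True),
                 Replica(6, False, False)], target=2))  # (0, 0, [6])
```

The first call corresponds to the state where replicas 5 (spot) and 6 (on-demand) are launched; the second corresponds to the final state, where only spot replicas 2 and 5 remain.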