Updating a Service
SkyServe supports updating a deployed service, which can be used to change:
- Replica code (e.g., run/setup; useful for debugging)
- Replica resource spec in resources (e.g., accelerator or instance type)
- Service spec in service (e.g., number of replicas or autoscaling spec)
During an update, the service will remain accessible with no downtime and its endpoint will remain the same.
To update an existing service, use sky serve update:
$ sky serve update service-name new_service.yaml
SkyServe will launch new replicas described by new_service.yaml with the following behavior:
1. An update is initiated, and traffic continues to be routed to the existing (old) replicas.
2. New replicas (with the new settings) are brought up in the background.
3. Once min_replicas new replicas are ready, new traffic is routed to the new replicas, while the old replicas stop receiving traffic and are scaled down.
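The cutover rule in the steps above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not SkyServe's implementation: the Replica record and the replicas_receiving_traffic function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical replica record (not a SkyServe type)."""
    replica_id: int
    version: int
    ready: bool


def replicas_receiving_traffic(replicas, latest_version, min_replicas):
    """Return the sorted IDs of replicas that should receive traffic.

    Assumed rule, taken from the steps above: traffic stays on ready old
    replicas until at least `min_replicas` new-version replicas are ready,
    then it moves entirely to the ready new-version replicas.
    """
    new_ready = [r for r in replicas
                 if r.version == latest_version and r.ready]
    if len(new_ready) >= min_replicas:
        return sorted(r.replica_id for r in new_ready)
    old_ready = [r for r in replicas
                 if r.version < latest_version and r.ready]
    return sorted(r.replica_id for r in old_ready)


# Mid-update: new replicas 3 and 4 are still provisioning.
fleet = [Replica(1, 1, True), Replica(2, 1, True),
         Replica(3, 2, False), Replica(4, 2, False)]
print(replicas_receiving_traffic(fleet, latest_version=2, min_replicas=2))
# prints [1, 2]
```

Under this rule, traffic is never split across versions: it is served either by the old replicas or by the new ones.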
For example, suppose we have a running service hosting an AI model with the following resource configuration:
resources:
  memory: 32+
  accelerators: T4
It is possible to update it to use a new resource configuration for all replicas, such as:
resources:
  memory: 128+
  accelerators: A100
To support updates, a service and its replicas are versioned (starting from 1). During an update, traffic is entirely serviced by either old-versioned or new-versioned replicas. The sky serve status command shows the latest service version and each replica’s version.
Example
We first launch a simple HTTP service:
$ sky serve up examples/serve/http_server/task.yaml -n http-server
We can use sky serve status http-server to check the status of the service:
$ sky serve status http-server
Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        1m 41s  READY   2/2       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED    RESOURCES       STATUS  REGION
http-server   1   1        54.173.203.169  2 mins ago  1x AWS(vCPU=2)  READY   us-east-1
http-server   2   1        52.87.241.103   2 mins ago  1x AWS(vCPU=2)  READY   us-east-1
Service http-server has an initial version of 1.
Suppose we want to update the service to use 4 vCPUs instead of 2. We can edit the task YAML examples/serve/http_server/task.yaml and change the cpus field:
# examples/serve/http_server/task.yaml
service:
  readiness_probe:
    path: /health
    initial_delay_seconds: 20
  replicas: 2

resources:
  ports: 8081
  cpus: 4+

workdir: examples/serve/http_server

run: python3 server.py
We can then use sky serve update to update the service:
$ sky serve update http-server examples/serve/http_server/task.yaml
SkyServe will trigger launching two new replicas with 4 vCPUs. Until min_replicas (set to service.replicas when unspecified; i.e., 2) new replicas are ready, SkyServe will only send traffic to the old replicas. When the number of ready new replicas reaches min_replicas, SkyServe will scale down the old replicas to save cost. The service’s version is updated from 1 to 2. Replicas 3 and 4 are the new replicas with 4 vCPUs. Replicas 1 and 2 are the old replicas with 2 vCPUs.
$ sky serve status http-server
Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  2        6m 15s  READY   2/4       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS        REGION
http-server   1   1        54.173.203.169  6 mins ago   1x AWS(vCPU=2)  READY         us-east-1
http-server   2   1        52.87.241.103   6 mins ago   1x AWS(vCPU=2)  READY         us-east-1
http-server   3   2        -               21 secs ago  1x AWS(vCPU=4)  PROVISIONING  us-east-1
http-server   4   2        -               21 secs ago  1x AWS(vCPU=4)  PROVISIONING  us-east-1
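The min_replicas threshold used during the update falls back to service.replicas when it is not set. A minimal sketch of that fallback is shown here, assuming the service section has been loaded into a plain dict. The effective_min_replicas helper is hypothetical, and the replica_policy key used to hold min_replicas is an assumption about the spec layout rather than something stated in this example.

```python
def effective_min_replicas(service_spec: dict) -> int:
    """Return the replica threshold an update would wait for.

    Assumed layout: an optional `replica_policy.min_replicas` entry,
    otherwise the fixed `replicas` count (as in the task YAML above).
    """
    policy = service_spec.get("replica_policy") or {}
    if "min_replicas" in policy:
        return int(policy["min_replicas"])
    if "replicas" in service_spec:
        return int(service_spec["replicas"])
    raise ValueError("service spec sets neither min_replicas nor replicas")


# The http-server example only sets `replicas: 2`.
http_server_spec = {
    "readiness_probe": {"path": "/health", "initial_delay_seconds": 20},
    "replicas": 2,
}
print(effective_min_replicas(http_server_spec))  # prints 2
```

With the example spec, the update therefore waits for both new replicas before moving traffic.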
The old replicas will be scaled down when the new replicas are ready. At this point, SkyServe will start sending traffic to the new replicas.
$ sky serve status http-server
Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  2        10m 4s  READY   2/4       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS         REGION
http-server   1   1        54.173.203.169  10 mins ago  1x AWS(vCPU=2)  SHUTTING_DOWN  us-east-1
http-server   2   1        52.87.241.103   10 mins ago  1x AWS(vCPU=2)  SHUTTING_DOWN  us-east-1
http-server   3   2        3.93.241.163    1 min ago    1x AWS(vCPU=4)  READY          us-east-1
http-server   4   2        18.206.226.82   1 min ago    1x AWS(vCPU=4)  READY          us-east-1
Eventually, we will only have new replicas ready to serve user requests.
$ sky serve status http-server
Services
NAME         VERSION  UPTIME   STATUS  REPLICAS  ENDPOINT
http-server  2        11m 42s  READY   2/2       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP             LAUNCHED    RESOURCES       STATUS  REGION
http-server   3   2        3.93.241.163   3 mins ago  1x AWS(vCPU=4)  READY   us-east-1
http-server   4   2        18.206.226.82  3 mins ago  1x AWS(vCPU=4)  READY   us-east-1
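For scripting, one way to tell that an update has finished is to check that every listed replica is on the latest version and READY. The sketch below parses replica rows in the layout shown in the status output above. The regular expression and both helper functions are hypothetical, and the column layout is an assumption that may not hold for other SkyServe versions.

```python
import re

# Matches replica rows only; header lines and the "Services" row do not
# match because the ID and VERSION columns must be plain integers.
ROW = re.compile(
    r"^(?P<service>\S+)\s+(?P<id>\d+)\s+(?P<version>\d+)\s+(?P<ip>\S+)\s+"
    r"(?P<launched>.+?ago)\s+(?P<resources>.+?)\s+"
    r"(?P<status>[A-Z_]+)\s+(?P<region>\S+)$"
)


def parse_replica_rows(status_text):
    """Extract replica rows from status text as a list of dicts."""
    rows = []
    for line in status_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            row["id"] = int(row["id"])
            row["version"] = int(row["version"])
            rows.append(row)
    return rows


def update_finished(rows, latest_version):
    """True when every listed replica is READY on the latest version."""
    return bool(rows) and all(
        r["version"] == latest_version and r["status"] == "READY"
        for r in rows
    )


SAMPLE = """\
Service Replicas
SERVICE_NAME ID VERSION IP LAUNCHED RESOURCES STATUS REGION
http-server 3 2 3.93.241.163 3 mins ago 1x AWS(vCPU=4) READY us-east-1
http-server 4 2 18.206.226.82 3 mins ago 1x AWS(vCPU=4) READY us-east-1
"""
print(update_finished(parse_replica_rows(SAMPLE), latest_version=2))
# prints True
```

Applied to the intermediate output, where replicas 1 and 2 are still SHUTTING_DOWN, the same check returns False.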