Authorization
SkyServe supports authorization at the replica level, so you can control access to service endpoints with API keys.
Set up API keys
SkyServe relies on the authorization mechanism of the service running on the underlying replicas, e.g., the inference engine. We take the vLLM inference engine as an example, which supports static API key authorization through the --api-key argument.
We define a SkyServe service spec that serves a chatbot (Qwen/Qwen3-0.6B in this example) with vLLM and an API key. In the example YAML below, the authorization token is declared as a secret, AUTH_TOKEN, and passed to two places: the service field, so that readiness_probe can access the replicas, and the vllm entrypoint, so that the service on each replica starts with the API key.
# auth.yaml
envs:
  MODEL_NAME: Qwen/Qwen3-0.6B
secrets:
  HF_TOKEN: null
  AUTH_TOKEN: null

service:
  readiness_probe:
    path: /v1/models
    headers:
      Authorization: Bearer $AUTH_TOKEN
    initial_delay_seconds: 1800
  replicas: 1

resources:
  accelerators: {L4, A10g, A10, L40, A40, A100, A100-80GB}
  cpus: 7+
  memory: 20+
  ports: 8087

setup: |
  uv venv --python 3.10 --seed
  source .venv/bin/activate
  uv pip install vllm==0.10.0 --torch-backend=auto
  # Have to use triton==3.2.0 to avoid https://github.com/triton-lang/triton/issues/6698
  uv pip install triton==3.2.0
  uv pip install openai

run: |
  source .venv/bin/activate
  export PATH=$PATH:/sbin
  vllm serve $MODEL_NAME --trust-remote-code \
    --host 0.0.0.0 --port 8087 \
    --api-key $AUTH_TOKEN
To deploy the service, run the following command:
HF_TOKEN=xxx AUTH_TOKEN=yyy sky serve up auth.yaml -n auth --secret HF_TOKEN --secret AUTH_TOKEN
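The values xxx and yyy in the command above are placeholders. As a minimal sketch of one way to produce a hard-to-guess value for AUTH_TOKEN, the snippet below uses Python's standard secrets module. The function name generate_auth_token and the 32-byte default are assumptions made for illustration, not something SkyServe or vLLM requires.

```python
import secrets


def generate_auth_token(num_bytes: int = 32) -> str:
    """Return a random URL-safe string usable as a static API key.

    Hypothetical helper: the name and the default length are illustrative.
    """
    # token_urlsafe draws num_bytes of randomness from the OS and
    # encodes them with URL-safe Base64 (letters, digits, '-' and '_').
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print the token so it can be exported as AUTH_TOKEN before deploying.
    print(generate_auth_token())
```

The printed value can then be supplied as AUTH_TOKEN in place of yyy, and the same value must be given to every client that calls the service.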
To send a request to the service endpoint, a service client needs to include the static API key in the request header:
$ ENDPOINT=$(sky serve status --endpoint auth)
$ AUTH_TOKEN=yyy
$ curl $ENDPOINT/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -d '{
      "model": "Qwen/Qwen3-0.6B",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "Who are you?"
        }
      ]
    }' | jq
Example output
{
  "id": "chatcmpl-f5f1bffa4b504a8b8e842436f3701b3f",
  "object": "chat.completion",
  "created": 1753994285,
  "model": "Qwen/Qwen3-0.6B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\nOkay, the user is asking, \"Who are you?\" I need to respond appropriately. First, I should acknowledge their question and explain that I'm an AI assistant. I should mention that I'm designed to help with various tasks and provide information. I should keep it friendly and open-ended to encourage further interaction. Let me make sure the response is clear and concise.\n</think>\n\nI'm an AI assistant designed to help with a wide range of questions and tasks. How can I assist you today? 😊",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning_content": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 23,
    "total_tokens": 128,
    "completion_tokens": 105,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "kv_transfer_params": null
}
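The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl call: it targets /v1/chat/completions and sends the API key in the Authorization header. The helper names build_chat_request and send_chat_request are hypothetical, and prepending http:// when the endpoint has no scheme is an assumption about the form of the value returned by sky serve status --endpoint.

```python
import json
import urllib.request


def build_chat_request(endpoint: str, auth_token: str, model: str,
                       user_message: str) -> urllib.request.Request:
    """Build (but do not send) an authorized chat completion request."""
    # Assumption: the endpoint may be a bare host:port, so add a scheme.
    if endpoint.startswith(("http://", "https://")):
        base = endpoint
    else:
        base = f"http://{endpoint}"
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }
    return urllib.request.Request(
        url=f"{base.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The static API key travels as a bearer token.
            "Authorization": f"Bearer {auth_token}",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling send_chat_request(build_chat_request(endpoint, token, "Qwen/Qwen3-0.6B", "Who are you?")) against a running service should return a dictionary with the same fields as the example output above. If the token is missing or wrong, urlopen raises urllib.error.HTTPError with status code 401.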
A service client without a valid API key cannot access the service and receives a 401 Unauthorized error:
$ curl $ENDPOINT/v1/models
{"error": "Unauthorized"}
$ curl $ENDPOINT/v1/models -H "Authorization: Bearer random-string"
{"error": "Unauthorized"}
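Both requests above fail for the same reason: the server accepts only a bearer token that matches the configured key exactly. The following is a minimal sketch of that kind of static key check, written for illustration only. It is not vLLM's actual implementation, and the function name is_authorized is hypothetical.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str],
                  expected_token: str) -> bool:
    """Return True only for a header of the form 'Bearer <expected_token>'.

    Illustrative sketch of a static API key check, not vLLM's code.
    """
    # No header at all: the first curl example above.
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme != "Bearer":
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(token.encode("utf-8"),
                               expected_token.encode("utf-8"))
```

With the key yyy, is_authorized(None, "yyy") and is_authorized("Bearer random-string", "yyy") both return False, which corresponds to the two 401 responses, while is_authorized("Bearer yyy", "yyy") returns True.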