Ray Serve API#
Python API#
Writing Applications#
serve.Deployment: Class (or function) decorated with the @serve.deployment decorator.
serve.Application: One or more deployments bound with arguments that can be deployed together.
Deployment Decorators#
serve.deployment: Decorator that converts a Python class to a Deployment.
serve.ingress: Wrap a deployment class with a FastAPI application for HTTP request parsing.
serve.batch: Converts a function to asynchronously handle batches.
serve.multiplexed: Wrap a callable or method used to load multiplexed models in a replica.
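For orientation, the sketch below combines these decorators in a single deployment. The BatchedTranslator class, its route, and its string-reversing "model" are illustrative only:

from typing import List

from fastapi import FastAPI
from ray import serve

fastapi_app = FastAPI()


@serve.deployment(num_replicas=2)
@serve.ingress(fastapi_app)
class BatchedTranslator:
    # HTTP parsing is delegated to FastAPI; individual calls are grouped into batches.

    @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.1)
    async def translate_batch(self, texts: List[str]) -> List[str]:
        # Placeholder "model": reverse each string. A real model would run here.
        return [text[::-1] for text in texts]

    @fastapi_app.post("/translate")
    async def translate(self, text: str) -> str:
        # Each call contributes one element to a batch and receives one result back.
        return await self.translate_batch(text)


app = BatchedTranslator.bind()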
Deployment Handles#
Note
Ray 2.7 introduces a new DeploymentHandle API that will replace the existing RayServeHandle and RayServeSyncHandle APIs.
Existing code continues to work, but you are encouraged to opt in to the new API to avoid future breakages.
To opt in, either call handle.options(use_new_handle_api=True) on each handle or set it globally via the environment variable: export RAY_SERVE_ENABLE_NEW_HANDLE_API=1.
serve.handle.DeploymentHandle: A handle used to make requests to a deployment at runtime.
serve.handle.DeploymentResponse: A future-like object wrapping the result of a unary deployment handle call.
serve.handle.DeploymentResponseGenerator: A future-like object wrapping the result of a streaming deployment handle call.
serve.handle.RayServeHandle: A handle used to make requests from one deployment to another.
serve.handle.RayServeSyncHandle: A handle used to make requests to the ingress deployment of an application.
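A minimal sketch of the handle workflow, assuming the Ray 2.7 opt-in described in the note above; the Echo deployment is illustrative:

from ray import serve


@serve.deployment
class Echo:
    def __call__(self, message: str) -> str:
        return message


# serve.run returns a handle to the application's ingress deployment.
handle = serve.run(Echo.bind()).options(use_new_handle_api=True)

# DeploymentHandle calls return a DeploymentResponse, a future-like object.
response = handle.remote("hello")
assert response.result() == "hello"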
Running Applications#
serve.start: Start Serve on the cluster.
serve.run: Run an application and return a handle to its ingress deployment.
serve.delete: Delete an application by its name.
serve.status: Get the status of Serve on the cluster.
serve.shutdown: Completely shut down Serve on the cluster.
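A typical lifecycle using these calls might look like the sketch below; the hello deployment, application name, and route prefix are illustrative:

from ray import serve


@serve.deployment
def hello(name: str) -> str:
    return f"Hello, {name}!"


# Deploy the application under a name and route prefix.
serve.run(hello.bind(), name="hello_app", route_prefix="/hello")

# Inspect cluster-wide Serve status (includes "hello_app"), then clean up.
print(serve.status())
serve.delete("hello_app")  # Remove only this application.
serve.shutdown()           # Shut down Serve on the cluster entirely.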
Configurations#
serve.config.ProxyLocation: Config for where to run proxies to receive ingress traffic to the cluster.
serve.config.gRPCOptions: gRPC options for the proxies.
serve.config.HTTPOptions: HTTP options for the proxies.
serve.config.AutoscalingConfig: Config for a deployment's autoscaling behavior (PublicAPI: This API is stable across Ray releases).
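For example, these options can be passed when starting Serve. A sketch assuming serve.start accepts http_options and grpc_options keyword arguments; the host and port values shown are the Serve defaults:

from ray import serve
from ray.serve.config import HTTPOptions, gRPCOptions

# Start Serve with explicit proxy options instead of the defaults.
serve.start(
    http_options=HTTPOptions(host="0.0.0.0", port=8000),
    grpc_options=gRPCOptions(port=9000, grpc_servicer_functions=[]),
)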
Advanced APIs#
serve.get_replica_context: Returns the deployment and replica tag from within a replica at runtime.
serve.context.ReplicaContext: Stores runtime context info for replicas.
serve.get_multiplexed_model_id: Get the multiplexed model ID for the current request.
serve.get_app_handle: Get a handle to the application's ingress deployment by name.
serve.get_deployment_handle: Get a handle to a deployment by name.
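The sketch below exercises these runtime APIs together; the ModelServer deployment, its dictionary "model", and the application name are illustrative:

from ray import serve


@serve.deployment
class ModelServer:
    @serve.multiplexed(max_num_models_per_replica=3)
    async def load_model(self, model_id: str):
        # Illustrative loader; a real implementation would fetch model weights here.
        return {"model_id": model_id}

    async def __call__(self, request):
        # The model to use is taken from the request's multiplexed model ID.
        model = await self.load_model(serve.get_multiplexed_model_id())
        # Runtime context identifies which replica served the request.
        replica = serve.get_replica_context()
        return {"model": model, "replica": replica.replica_tag}


serve.run(ModelServer.bind(), name="multiplexed_app")

# Handles can also be fetched by name from elsewhere in the cluster.
ingress_handle = serve.get_app_handle("multiplexed_app")
deployment_handle = serve.get_deployment_handle("ModelServer", app_name="multiplexed_app")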
Command Line Interface (CLI)#
Serve REST API#
V1 REST API (Single-application)#
PUT "/api/serve/deployments/"#
Declaratively deploys the Serve application. Starts Serve on the Ray cluster if it’s not already running. See single-app config schema for the request’s JSON schema.
Example Request:
PUT /api/serve/deployments/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Content-Type: application/json
{
"import_path": "text_ml:app",
"runtime_env": {
"working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
},
"deployments": [
{"name": "Translator", "user_config": {"language": "french"}},
{"name": "Summarizer"},
]
}
Example Response
HTTP/1.1 200 OK
Content-Type: application/json
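The same request issued from Python with the third-party requests package (a sketch; requests is not part of Ray and must be installed separately):

import requests

config = {
    "import_path": "text_ml:app",
    "runtime_env": {
        "working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
    },
    "deployments": [
        {"name": "Translator", "user_config": {"language": "french"}},
        {"name": "Summarizer"},
    ],
}

# Deploy (or update) the single Serve application described by the config.
resp = requests.put("http://localhost:52365/api/serve/deployments/", json=config)
resp.raise_for_status()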
GET "/api/serve/deployments/"#
Gets the config for the application currently deployed on the Ray cluster. This config represents the current goal state for the Serve application. See single-app config schema for the response’s JSON schema.
Example Request:
GET /api/serve/deployments/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Example Response:
HTTP/1.1 200 OK
Content-Type: application/json
{
"import_path": "text_ml:app",
"runtime_env": {
"working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
},
"deployments": [
{"name": "Translator", "user_config": {"language": "french"}},
{"name": "Summarizer"},
]
}
GET "/api/serve/deployments/status"#
Gets the Serve application’s current status, including all the deployment statuses. See status schema for the response’s JSON schema.
Example Request:
GET /api/serve/deployments/status HTTP/1.1
Host: localhost:52365
Accept: application/json
Example Response
HTTP/1.1 200 OK
Content-Type: application/json
{
"name": "default",
"app_status": {
"status": "RUNNING",
"message": "",
"deployment_timestamp": 1694043082.0397763
},
"deployment_statuses": [
{
"name": "Translator",
"status": "HEALTHY",
"message": ""
},
{
"name": "Summarizer",
"status": "HEALTHY",
"message": ""
}
]
}
DELETE "/api/serve/deployments/"#
Shuts down Serve and the Serve application running on the Ray cluster. Has no effect if Serve is not running on the Ray cluster.
Example Request:
DELETE /api/serve/deployments/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Example Response
HTTP/1.1 200 OK
Content-Type: application/json
V2 REST API (Multi-application)#
PUT "/api/serve/applications/"#
Declaratively deploys a list of Serve applications. If Serve is already running on the Ray cluster, removes all applications not listed in the new config. If Serve is not running on the Ray cluster, starts Serve. See multi-app config schema for the request’s JSON schema.
Example Request:
PUT /api/serve/applications/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Content-Type: application/json
{
"applications": [
{
"name": "text_app",
"route_prefix": "/",
"import_path": "text_ml:app",
"runtime_env": {
"working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
},
"deployments": [
{"name": "Translator", "user_config": {"language": "french"}},
{"name": "Summarizer"},
]
}
]
}
Example Response
HTTP/1.1 200 OK
Content-Type: application/json
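The equivalent request from Python with the requests package (a sketch):

import requests

config = {
    "applications": [
        {
            "name": "text_app",
            "route_prefix": "/",
            "import_path": "text_ml:app",
            "runtime_env": {
                "working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
            },
            "deployments": [
                {"name": "Translator", "user_config": {"language": "french"}},
                {"name": "Summarizer"},
            ],
        }
    ]
}

# Deploy the listed applications; applications not in the list are removed.
resp = requests.put("http://localhost:52365/api/serve/applications/", json=config)
resp.raise_for_status()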
GET "/api/serve/applications/"#
Gets cluster-level info and comprehensive details on all Serve applications deployed on the Ray cluster. See metadata schema for the response’s JSON schema.
Example Request:
GET /api/serve/applications/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Example Response (abridged JSON):
HTTP/1.1 200 OK
Content-Type: application/json
{
"controller_info": {
"node_id": "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
"node_ip": "10.0.29.214",
"actor_id": "1d214b7bdf07446ea0ed9d7001000000",
"actor_name": "SERVE_CONTROLLER_ACTOR",
"worker_id": "adf416ae436a806ca302d4712e0df163245aba7ab835b0e0f4d85819",
"log_file_path": "/serve/controller_29778.log"
},
"proxy_location": "EveryNode",
"http_options": {
"host": "0.0.0.0",
"port": 8000,
"root_path": "",
"request_timeout_s": null,
"keep_alive_timeout_s": 5
},
"grpc_options": {
"port": 9000,
"grpc_servicer_functions": []
},
"proxies": {
"cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec": {
"node_id": "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
"node_ip": "10.0.29.214",
"actor_id": "b7a16b8342e1ced620ae638901000000",
"actor_name": "SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
"worker_id": "206b7fe05b65fac7fdceec3c9af1da5bee82b0e1dbb97f8bf732d530",
"log_file_path": "/serve/http_proxy_10.0.29.214.log",
"status": "HEALTHY"
}
},
"deploy_mode": "MULTI_APP",
"applications": {
"app1": {
"name": "app1",
"route_prefix": "/",
"docs_path": null,
"status": "RUNNING",
"message": "",
"last_deployed_time_s": 1694042836.1912267,
"deployed_app_config": {
"name": "app1",
"route_prefix": "/",
"import_path": "src.text-test:app",
"deployments": [
{
"name": "Translator",
"num_replicas": 1,
"user_config": {
"language": "german"
}
}
]
},
"deployments": {
"Translator": {
"name": "Translator",
"status": "HEALTHY",
"message": "",
"deployment_config": {
"name": "Translator",
"num_replicas": 1,
"max_concurrent_queries": 100,
"user_config": {
"language": "german"
},
"graceful_shutdown_wait_loop_s": 2.0,
"graceful_shutdown_timeout_s": 20.0,
"health_check_period_s": 10.0,
"health_check_timeout_s": 30.0,
"ray_actor_options": {
"runtime_env": {
"env_vars": {}
},
"num_cpus": 1.0
},
"is_driver_deployment": false
},
"replicas": [
{
"node_id": "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
"node_ip": "10.0.29.214",
"actor_id": "4bb8479ad0c9e9087fee651901000000",
"actor_name": "SERVE_REPLICA::app1#Translator#oMhRlb",
"worker_id": "1624afa1822b62108ead72443ce72ef3c0f280f3075b89dd5c5d5e5f",
"log_file_path": "/serve/deployment_Translator_app1#Translator#oMhRlb.log",
"replica_id": "app1#Translator#oMhRlb",
"state": "RUNNING",
"pid": 29892,
"start_time_s": 1694042840.577496
}
]
},
"Summarizer": {
"name": "Summarizer",
"status": "HEALTHY",
"message": "",
"deployment_config": {
"name": "Summarizer",
"num_replicas": 1,
"max_concurrent_queries": 100,
"user_config": null,
"graceful_shutdown_wait_loop_s": 2.0,
"graceful_shutdown_timeout_s": 20.0,
"health_check_period_s": 10.0,
"health_check_timeout_s": 30.0,
"ray_actor_options": {
"runtime_env": {},
"num_cpus": 1.0
},
"is_driver_deployment": false
},
"replicas": [
{
"node_id": "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
"node_ip": "10.0.29.214",
"actor_id": "7118ae807cffc1c99ad5ad2701000000",
"actor_name": "SERVE_REPLICA::app1#Summarizer#cwiPXg",
"worker_id": "12de2ac83c18ce4a61a443a1f3308294caf5a586f9aa320b29deed92",
"log_file_path": "/serve/deployment_Summarizer_app1#Summarizer#cwiPXg.log",
"replica_id": "app1#Summarizer#cwiPXg",
"state": "RUNNING",
"pid": 29893,
"start_time_s": 1694042840.5789504
}
]
}
}
}
}
}
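A sketch of reading this response programmatically with the requests package, printing each application's status:

import requests

details = requests.get("http://localhost:52365/api/serve/applications/").json()
for name, app in details["applications"].items():
    print(name, app["status"])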
DELETE "/api/serve/applications/"#
Shuts down Serve and all applications running on the Ray cluster. Has no effect if Serve is not running on the Ray cluster.
Example Request:
DELETE /api/serve/applications/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Example Response
HTTP/1.1 200 OK
Content-Type: application/json
Config Schemas#
serve.schema.ServeDeploySchema: Multi-application config for deploying a list of Serve applications to the Ray cluster.
serve.schema.gRPCOptionsSchema: Options to start the gRPC Proxy with.
serve.schema.HTTPOptionsSchema: Options to start the HTTP Proxy with.
serve.schema.ServeApplicationSchema: Describes one Serve application, and currently can also be used as a standalone config to deploy a single application to a Ray cluster.
serve.schema.DeploymentSchema: Specifies options for one deployment within a Serve application.
serve.schema.RayActorOptionsSchema: Options with which to start a replica actor.
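These schemas are Pydantic models and can be used to validate a config before submitting it. A sketch assuming the Pydantic v1 parse_obj interface used by Ray 2.7; the config values are illustrative:

from ray.serve.schema import ServeDeploySchema

config = ServeDeploySchema.parse_obj(
    {
        "applications": [
            {
                "name": "text_app",
                "route_prefix": "/",
                "import_path": "text_ml:app",
            }
        ]
    }
)
# Fields are now validated, typed attributes.
print(config.applications[0].import_path)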
Response Schemas#
V1 REST API#
serve.schema.ServeStatusSchema: Describes the status of an application and all its deployments.
V2 REST API#
serve.schema.ServeInstanceDetails: Serve metadata with system-level info and details on all applications deployed to the Ray cluster.
serve.schema.ApplicationDetails: Detailed info about a Serve application.
serve.schema.DeploymentDetails: Detailed info about a deployment within a Serve application.
serve.schema.ReplicaDetails: Detailed info about a single deployment replica.