Ray Serve API#

Python API#

Writing Applications#

serve.Deployment

Class (or function) decorated with the @serve.deployment decorator.

serve.Application

One or more deployments bound with arguments that can be deployed together.

Deployment Decorators#

serve.deployment

Decorator that converts a Python class to a Deployment.

serve.ingress

Wrap a deployment class with a FastAPI application for HTTP request parsing.

serve.batch

Converts a function to asynchronously handle batches.

serve.multiplexed

Wrap a callable or method used to load multiplexed models in a replica.

Deployment Handles#

Note

Ray 2.7 introduces a new DeploymentHandle API that will replace the existing RayServeHandle and RayServeSyncHandle APIs. Existing code continues to work, but opting in to the new API now avoids breakage in future releases. To opt in, either call handle.options(use_new_handle_api=True) on each handle or enable it globally with an environment variable: export RAY_SERVE_ENABLE_NEW_HANDLE_API=1.
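As a minimal sketch of the global opt-in, a Python driver script could set the variable itself instead of relying on a shell export. This assumes the flag is read when Ray Serve is first imported, so the assignment comes before any Ray import; that ordering is an assumption and is not stated in the note above. It only affects this process and the processes it starts, so a cluster that is already running would need the variable in its own environment.

```python
import os

# Equivalent of `export RAY_SERVE_ENABLE_NEW_HANDLE_API=1` for this process
# and any child processes it starts.
# Assumption: the flag must be in place before Ray Serve is imported.
os.environ["RAY_SERVE_ENABLE_NEW_HANDLE_API"] = "1"

# Ray imports would follow here, for example `from ray import serve`.
```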

serve.handle.DeploymentHandle

A handle used to make requests to a deployment at runtime.

serve.handle.DeploymentResponse

A future-like object wrapping the result of a unary deployment handle call.

serve.handle.DeploymentResponseGenerator

A future-like object wrapping the result of a streaming deployment handle call.

serve.handle.RayServeHandle

A handle used to make requests from one deployment to another.

serve.handle.RayServeSyncHandle

A handle used to make requests to the ingress deployment of an application.

Running Applications#

serve.start

Start Serve on the cluster.

serve.run

Run an application and return a handle to its ingress deployment.

serve.delete

Delete an application by its name.

serve.status

Get the status of Serve on the cluster.

serve.shutdown

Completely shut down Serve on the cluster.

Configurations#

serve.config.ProxyLocation

Config for where to run proxies to receive ingress traffic to the cluster.

serve.config.gRPCOptions

gRPC options for the proxies.

serve.config.HTTPOptions

HTTP options for the proxies.

serve.config.AutoscalingConfig

Autoscaling options for a deployment. PublicAPI: This API is stable across Ray releases.

Advanced APIs#

serve.get_replica_context

Returns the deployment and replica tag from within a replica at runtime.

serve.context.ReplicaContext

Stores runtime context info for replicas.

serve.get_multiplexed_model_id

Get the multiplexed model ID for the current request.

serve.get_app_handle

Get a handle to the application's ingress deployment by name.

serve.get_deployment_handle

Get a handle to a deployment by name.

Command Line Interface (CLI)#

Serve REST API#

V1 REST API (Single-application)#

PUT "/api/serve/deployments/"#

Declaratively deploys the Serve application. Starts Serve on the Ray cluster if it’s not already running. See single-app config schema for the request’s JSON schema.

Example Request:

PUT /api/serve/deployments/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Content-Type: application/json

{
    "import_path": "text_ml:app",
    "runtime_env": {
        "working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
    },
    "deployments": [
        {"name": "Translator", "user_config": {"language": "french"}},
        {"name": "Summarizer"}
    ]
}

Example Response:

HTTP/1.1 200 OK
Content-Type: application/json
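The request above can be assembled from Python with only the standard library. This is a sketch, not an official client: the helper name `build_deploy_request` and the `SERVE_API` constant are hypothetical, and the base URL is assumed from the Host header in the example. The request is built but not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request

# Assumed address of the REST endpoint, taken from the Host header in the
# example request above; adjust it for your cluster.
SERVE_API = "http://localhost:52365"

# Same single-app config as the example request body.
config = {
    "import_path": "text_ml:app",
    "runtime_env": {
        "working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
    },
    "deployments": [
        {"name": "Translator", "user_config": {"language": "french"}},
        {"name": "Summarizer"},
    ],
}


def build_deploy_request(config: dict, base_url: str = SERVE_API) -> urllib.request.Request:
    """Build (but do not send) the PUT request that deploys `config`."""
    return urllib.request.Request(
        url=f"{base_url}/api/serve/deployments/",
        data=json.dumps(config).encode("utf-8"),  # json.dumps emits strict JSON
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="PUT",
    )


request = build_deploy_request(config)

# To actually deploy, send the request against a running cluster:
#     with urllib.request.urlopen(request) as response:
#         print(response.status)
```

Serializing a Python dict with `json.dumps` avoids hand-written JSON mistakes such as trailing commas.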

GET "/api/serve/deployments/"#

Gets the config for the application currently deployed on the Ray cluster. This config represents the current goal state for the Serve application. See single-app config schema for the response’s JSON schema.

Example Request:

GET /api/serve/deployments/ HTTP/1.1
Host: localhost:52365
Accept: application/json

Example Response:

HTTP/1.1 200 OK
Content-Type: application/json

{
    "import_path": "text_ml:app",
    "runtime_env": {
        "working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
    },
    "deployments": [
        {"name": "Translator", "user_config": {"language": "french"}},
        {"name": "Summarizer"}
    ]
}
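Because the GET response and the PUT request both follow the single-app config schema, one plausible workflow is to fetch the goal state, change one field, and send the result back with PUT. The sketch below covers only the editing step; the helper name `with_user_config` is hypothetical, and the sample dict mirrors the example response.

```python
import copy


def with_user_config(config: dict, deployment_name: str, user_config: dict) -> dict:
    """Return a copy of `config` with one deployment's user_config replaced."""
    updated = copy.deepcopy(config)  # leave the fetched config untouched
    for deployment in updated.get("deployments", []):
        if deployment["name"] == deployment_name:
            deployment["user_config"] = user_config
            return updated
    raise KeyError(f"no deployment named {deployment_name!r} in config")


# Goal state as returned by the GET example.
current = {
    "import_path": "text_ml:app",
    "runtime_env": {
        "working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
    },
    "deployments": [
        {"name": "Translator", "user_config": {"language": "french"}},
        {"name": "Summarizer"},
    ],
}

# New goal state: same application, Translator switched to German.
desired = with_user_config(current, "Translator", {"language": "german"})
```

Returning a copy keeps the fetched config available for comparison or rollback.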

GET "/api/serve/deployments/status"#

Gets the Serve application’s current status, including all the deployment statuses. See status schema for the response’s JSON schema.

Example Request:

GET /api/serve/deployments/status HTTP/1.1
Host: localhost:52365
Accept: application/json

Example Response:

HTTP/1.1 200 OK
Content-Type: application/json

{
    "name": "default",
    "app_status": {
        "status": "RUNNING",
        "message": "",
        "deployment_timestamp": 1694043082.0397763
    },
    "deployment_statuses": [
        {
            "name": "Translator",
            "status": "HEALTHY",
            "message": ""
        },
        {
            "name": "Summarizer",
            "status": "HEALTHY",
            "message": ""
        }
    ]
}
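A status payload like the one above is often reduced to a single readiness check. The following sketch assumes that an application counts as ready only when its status is RUNNING and every deployment is HEALTHY, the two values shown in the example; any other value is treated as not ready. The function names are hypothetical, and the UNHEALTHY value in the second sample is used for illustration.

```python
def unhealthy_deployments(status: dict) -> list:
    """Names of deployments whose status is anything other than HEALTHY."""
    return [
        deployment["name"]
        for deployment in status.get("deployment_statuses", [])
        if deployment["status"] != "HEALTHY"
    ]


def is_ready(status: dict) -> bool:
    """True when the app is RUNNING and no deployment is unhealthy."""
    return status["app_status"]["status"] == "RUNNING" and not unhealthy_deployments(status)


# Payload shaped like the example response.
healthy = {
    "name": "default",
    "app_status": {"status": "RUNNING", "message": "", "deployment_timestamp": 1694043082.0397763},
    "deployment_statuses": [
        {"name": "Translator", "status": "HEALTHY", "message": ""},
        {"name": "Summarizer", "status": "HEALTHY", "message": ""},
    ],
}

# Same payload with one deployment marked as not healthy (illustrative value).
degraded = {
    **healthy,
    "deployment_statuses": [
        {"name": "Translator", "status": "UNHEALTHY", "message": ""},
        {"name": "Summarizer", "status": "HEALTHY", "message": ""},
    ],
}

ready_now = is_ready(healthy)
problems = unhealthy_deployments(degraded)
```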

DELETE "/api/serve/deployments/"#

Shuts down Serve and the Serve application running on the Ray cluster. Has no effect if Serve is not running on the Ray cluster.

Example Request:

DELETE /api/serve/deployments/ HTTP/1.1
Host: localhost:52365
Accept: application/json

Example Response:

HTTP/1.1 200 OK
Content-Type: application/json

V2 REST API (Multi-application)#

PUT "/api/serve/applications/"#

Declaratively deploys a list of Serve applications. If Serve is already running on the Ray cluster, removes all applications not listed in the new config. If Serve is not running on the Ray cluster, starts Serve. See multi-app config schema for the request’s JSON schema.

Example Request:

PUT /api/serve/applications/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Content-Type: application/json

{
    "applications": [
        {
            "name": "text_app",
            "route_prefix": "/",
            "import_path": "text_ml:app",
            "runtime_env": {
                "working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
            },
            "deployments": [
                {"name": "Translator", "user_config": {"language": "french"}},
                {"name": "Summarizer"}
            ]
        }
    ]
}

Example Response:

HTTP/1.1 200 OK
Content-Type: application/json
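Since this PUT removes every running application that the new config does not list, it can help to compute that set before sending the request. A minimal sketch, assuming the names of the running applications are already known (for example from the `applications` keys of the GET response); the function name and the `old_app` application are hypothetical.

```python
def apps_removed_by(new_config: dict, running_app_names: set) -> set:
    """Running applications that a PUT of `new_config` would remove."""
    listed = {app["name"] for app in new_config.get("applications", [])}
    return set(running_app_names) - listed


# Config shaped like the example request (deployment details omitted).
new_config = {
    "applications": [
        {"name": "text_app", "route_prefix": "/", "import_path": "text_ml:app"},
    ]
}

# Hypothetical cluster state: `old_app` is running but absent from the config.
removed = apps_removed_by(new_config, {"text_app", "old_app"})
```

An empty result means the PUT only adds or updates applications.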

GET "/api/serve/applications/"#

Gets cluster-level info and comprehensive details on all Serve applications deployed on the Ray cluster. See metadata schema for the response’s JSON schema.

Example Request:

GET /api/serve/applications/ HTTP/1.1
Host: localhost:52365
Accept: application/json

Example Response (abridged JSON):

HTTP/1.1 200 OK
Content-Type: application/json

{
    "controller_info": {
        "node_id": "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
        "node_ip": "10.0.29.214",
        "actor_id": "1d214b7bdf07446ea0ed9d7001000000",
        "actor_name": "SERVE_CONTROLLER_ACTOR",
        "worker_id": "adf416ae436a806ca302d4712e0df163245aba7ab835b0e0f4d85819",
        "log_file_path": "/serve/controller_29778.log"
    },
    "proxy_location": "EveryNode",
    "http_options": {
        "host": "0.0.0.0",
        "port": 8000,
        "root_path": "",
        "request_timeout_s": null,
        "keep_alive_timeout_s": 5
    },
    "grpc_options": {
        "port": 9000,
        "grpc_servicer_functions": []
    },
    "proxies": {
        "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec": {
            "node_id": "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
            "node_ip": "10.0.29.214",
            "actor_id": "b7a16b8342e1ced620ae638901000000",
            "actor_name": "SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
            "worker_id": "206b7fe05b65fac7fdceec3c9af1da5bee82b0e1dbb97f8bf732d530",
            "log_file_path": "/serve/http_proxy_10.0.29.214.log",
            "status": "HEALTHY"
        }
    },
    "deploy_mode": "MULTI_APP",
    "applications": {
        "app1": {
            "name": "app1",
            "route_prefix": "/",
            "docs_path": null,
            "status": "RUNNING",
            "message": "",
            "last_deployed_time_s": 1694042836.1912267,
            "deployed_app_config": {
                "name": "app1",
                "route_prefix": "/",
                "import_path": "src.text-test:app",
                "deployments": [
                    {
                        "name": "Translator",
                        "num_replicas": 1,
                        "user_config": {
                            "language": "german"
                        }
                    }
                ]
            },
            "deployments": {
                "Translator": {
                    "name": "Translator",
                    "status": "HEALTHY",
                    "message": "",
                    "deployment_config": {
                        "name": "Translator",
                        "num_replicas": 1,
                        "max_concurrent_queries": 100,
                        "user_config": {
                            "language": "german"
                        },
                        "graceful_shutdown_wait_loop_s": 2.0,
                        "graceful_shutdown_timeout_s": 20.0,
                        "health_check_period_s": 10.0,
                        "health_check_timeout_s": 30.0,
                        "ray_actor_options": {
                            "runtime_env": {
                                "env_vars": {}
                            },
                            "num_cpus": 1.0
                        },
                        "is_driver_deployment": false
                    },
                    "replicas": [
                        {
                            "node_id": "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
                            "node_ip": "10.0.29.214",
                            "actor_id": "4bb8479ad0c9e9087fee651901000000",
                            "actor_name": "SERVE_REPLICA::app1#Translator#oMhRlb",
                            "worker_id": "1624afa1822b62108ead72443ce72ef3c0f280f3075b89dd5c5d5e5f",
                            "log_file_path": "/serve/deployment_Translator_app1#Translator#oMhRlb.log",
                            "replica_id": "app1#Translator#oMhRlb",
                            "state": "RUNNING",
                            "pid": 29892,
                            "start_time_s": 1694042840.577496
                        }
                    ]
                },
                "Summarizer": {
                    "name": "Summarizer",
                    "status": "HEALTHY",
                    "message": "",
                    "deployment_config": {
                        "name": "Summarizer",
                        "num_replicas": 1,
                        "max_concurrent_queries": 100,
                        "user_config": null,
                        "graceful_shutdown_wait_loop_s": 2.0,
                        "graceful_shutdown_timeout_s": 20.0,
                        "health_check_period_s": 10.0,
                        "health_check_timeout_s": 30.0,
                        "ray_actor_options": {
                            "runtime_env": {},
                            "num_cpus": 1.0
                        },
                        "is_driver_deployment": false
                    },
                    "replicas": [
                        {
                            "node_id": "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
                            "node_ip": "10.0.29.214",
                            "actor_id": "7118ae807cffc1c99ad5ad2701000000",
                            "actor_name": "SERVE_REPLICA::app1#Summarizer#cwiPXg",
                            "worker_id": "12de2ac83c18ce4a61a443a1f3308294caf5a586f9aa320b29deed92",
                            "log_file_path": "/serve/deployment_Summarizer_app1#Summarizer#cwiPXg.log",
                            "replica_id": "app1#Summarizer#cwiPXg",
                            "state": "RUNNING",
                            "pid": 29893,
                            "start_time_s": 1694042840.5789504
                        }
                    ]
                }
            }
        }
    }
}
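The nesting in this response (applications, then deployments, then replicas) can be flattened into a per-deployment summary. The sketch below counts replica states for each deployment; the function name `replica_states` is hypothetical, and the sample keeps only the fields the function reads from the abridged response.

```python
from collections import Counter


def replica_states(details: dict) -> dict:
    """Map "app/deployment" to a Counter of its replica states."""
    summary = {}
    for app_name, app in details.get("applications", {}).items():
        for deployment_name, deployment in app.get("deployments", {}).items():
            states = Counter(replica["state"] for replica in deployment.get("replicas", []))
            summary[f"{app_name}/{deployment_name}"] = states
    return summary


# Trimmed copy of the example response.
details = {
    "applications": {
        "app1": {
            "name": "app1",
            "status": "RUNNING",
            "deployments": {
                "Translator": {
                    "status": "HEALTHY",
                    "replicas": [{"replica_id": "app1#Translator#oMhRlb", "state": "RUNNING"}],
                },
                "Summarizer": {
                    "status": "HEALTHY",
                    "replicas": [{"replica_id": "app1#Summarizer#cwiPXg", "state": "RUNNING"}],
                },
            },
        }
    }
}

summary = replica_states(details)
```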

DELETE "/api/serve/applications/"#

Shuts down Serve and all applications running on the Ray cluster. Has no effect if Serve is not running on the Ray cluster.

Example Request:

DELETE /api/serve/applications/ HTTP/1.1
Host: localhost:52365
Accept: application/json

Example Response:

HTTP/1.1 200 OK
Content-Type: application/json

Config Schemas#

schema.ServeDeploySchema

Multi-application config for deploying a list of Serve applications to the Ray cluster.

schema.gRPCOptionsSchema

Options to start the gRPC Proxy with.

schema.HTTPOptionsSchema

Options to start the HTTP Proxy with.

schema.ServeApplicationSchema

Describes one Serve application, and currently can also be used as a standalone config to deploy a single application to a Ray cluster.

schema.DeploymentSchema

Specifies options for one deployment within a Serve application.

schema.RayActorOptionsSchema

Options with which to start a replica actor.

Response Schemas#

V1 REST API#

schema.ServeStatusSchema

Describes the status of an application and all its deployments.

V2 REST API#

schema.ServeInstanceDetails

Serve metadata with system-level info and details on all applications deployed to the Ray cluster.

schema.ApplicationDetails

Detailed info about a Serve application.

schema.DeploymentDetails

Detailed info about a deployment within a Serve application.

schema.ReplicaDetails

Detailed info about a single deployment replica.