(kuberay-dev-serve)=

# Developing Ray Serve Python scripts on a RayCluster

In this tutorial, you will learn how to debug Ray Serve scripts effectively against a RayCluster, which gives you better observability and faster iteration than developing the script directly with a RayService. Many RayService issues stem from the Ray Serve Python script itself, so it is important to verify that the script is correct before deploying it to a RayService. This tutorial shows how to develop a Ray Serve Python script for a MobileNet image classifier on a RayCluster. You can deploy and serve the classifier on a local Kind cluster without a GPU. For more details, see [ray-service.mobilenet.yaml](https://github.com/ray-project/kuberay/blob/v1.0.0-rc.0/ray-operator/config/samples/ray-service.mobilenet.yaml) and [mobilenet-rayservice.md](kuberay-mobilenet-rayservice-example).

# Step 1: Install a KubeRay cluster

Follow [this document](kuberay-operator-deploy) to install the latest stable KubeRay operator from the Helm repository.

# Step 2: Create a RayCluster CR

```sh
helm install raycluster kuberay/ray-cluster --version 1.0.0-rc.0
```

# Step 3: Log in to the head Pod

```sh
export HEAD_POD=$(kubectl get pods --selector=ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers)
kubectl exec -it $HEAD_POD -- bash
```

# Step 4: Prepare the Ray Serve Python script and run the Ray Serve application

```sh
# Execute the following command in the head Pod
git clone https://github.com/ray-project/serve_config_examples.git
cd serve_config_examples

# Try to launch the Ray Serve application
serve run mobilenet.mobilenet:app
# [Error message]
#     from tensorflow.keras.preprocessing import image
# ModuleNotFoundError: No module named 'tensorflow'
```

* `serve run mobilenet.mobilenet:app`: The first `mobilenet` is the name of the directory inside `serve_config_examples/`, the second `mobilenet` is the name of the Python file inside the `mobilenet/` directory, and `app` is the name of the variable in that Python file that represents the Ray Serve application. For more details, see the "import_path" section of [rayservice-troubleshooting.md](kuberay-raysvc-troubleshoot).

# Step 5: Change the Ray image from `rayproject/ray:${RAY_VERSION}` to `rayproject/ray-ml:${RAY_VERSION}`

```sh
# Uninstall RayCluster
helm uninstall raycluster

# Install the RayCluster CR with the Ray image `rayproject/ray-ml:${RAY_VERSION}`
helm install raycluster kuberay/ray-cluster --version 1.0.0-rc.0 --set image.repository=rayproject/ray-ml
```

The error message in Step 4 indicates that the Ray image `rayproject/ray:${RAY_VERSION}` does not include the TensorFlow package. Because TensorFlow is large, we use an image that already ships with TensorFlow instead of installing it through {ref}`Runtime Environments `. In this step, we change the Ray image from `rayproject/ray:${RAY_VERSION}` to `rayproject/ray-ml:${RAY_VERSION}`.

# Step 6: Repeat Step 3 and Step 4

```sh
# Repeat Step 3 and Step 4 to log in to the new head Pod and run the Ray Serve application.
# You should successfully launch the Ray Serve application this time.
serve run mobilenet.mobilenet:app

# [Example output]
# (ServeReplica:default_ImageClassifier pid=139, ip=10.244.0.8) Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5
#     8192/14536120 [..............................] - ETA: 0s)
#  4202496/14536120 [=======>......................] - ETA: 0s)
# 12902400/14536120 [=========================>....] - ETA: 0s)
# 14536120/14536120 [==============================] - 0s 0us/step
# 2023-07-17 14:04:43,737 SUCC scripts.py:424 -- Deployed Serve app successfully.
```

# Step 7: Submit a request to the Ray Serve application

```sh
# (On your local machine) Forward the serve port of the head Pod
kubectl port-forward --address 0.0.0.0 $HEAD_POD 8000

# (In a separate terminal, because port-forward keeps running in the foreground)
# Clone the repository on your local machine
git clone https://github.com/ray-project/serve_config_examples.git
cd serve_config_examples/mobilenet

# Prepare a sample image file. `stable_diffusion_example.png` is a cat image generated by the Stable Diffusion model.
curl -O https://raw.githubusercontent.com/ray-project/kuberay/master/docs/images/stable_diffusion_example.png

# Update `image_path` in `mobilenet_req.py` to the path of `stable_diffusion_example.png`
# Send a request to the Ray Serve application.
python3 mobilenet_req.py

# [Error message]
# Unexpected error, traceback: ray::ServeReplica:default_ImageClassifier.handle_request() (pid=139, ip=10.244.0.8)
#   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/_private/utils.py", line 254, in wrap_to_ray_error
#     raise exception
#   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/_private/replica.py", line 550, in invoke_single
#     result = await method_to_call(*args, **kwargs)
#   File "./mobilenet/mobilenet.py", line 24, in __call__
#   File "/home/ray/anaconda3/lib/python3.7/site-packages/starlette/requests.py", line 256, in _get_form
#     ), "The `python-multipart` library must be installed to use form parsing."
# AssertionError: The `python-multipart` library must be installed to use form parsing..
```

The form-parsing function `starlette.requests.form()` requires `python-multipart`. The package is not installed in the image, so the request to the Ray Serve application fails with the error above.

# Step 8: Restart the Ray Serve application with a runtime environment

```sh
# In the head Pod, stop the Ray Serve application
serve shutdown

# Check the Ray Serve application status
serve status
# [Example output]
# There are no applications running on this cluster.

# Launch the Ray Serve application with runtime environment.
serve run mobilenet.mobilenet:app --runtime-env-json='{"pip": ["python-multipart==0.0.6"]}'

# (On your local machine) Submit a request to the Ray Serve application again, and you should get the correct prediction.
python3 mobilenet_req.py
# [Example output]
# {"prediction": ["n02123159", "tiger_cat", 0.2994779646396637]}
```

# Step 9: Create a RayService YAML file

In the previous steps, we found that the Ray Serve application launches successfully with the Ray image `rayproject/ray-ml:${RAY_VERSION}` and the {ref}`runtime environments ` package `python-multipart==0.0.6`. We can therefore create a RayService YAML file that uses the same Ray image and runtime environment. For more details, see [ray-service.mobilenet.yaml](https://github.com/ray-project/kuberay/blob/v1.0.0-rc.0/ray-operator/config/samples/ray-service.mobilenet.yaml) and [mobilenet-rayservice.md](kuberay-mobilenet-rayservice-example).
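
As a rough illustration of how the two findings (the `rayproject/ray-ml` image and the `python-multipart==0.0.6` pip dependency) could carry over into a RayService manifest, the fragment below is a minimal sketch and not a copy of the linked sample. The resource name, the `<COMMIT_SHA>` and `<RAY_VERSION>` placeholders, and the exact set of fields shown are assumptions; treat `ray-service.mobilenet.yaml` as the authoritative version.

```yaml
# Sketch only: values marked as placeholders or hypothetical are assumptions.
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: rayservice-mobilenet          # hypothetical name
spec:
  serveConfigV2: |
    applications:
      - name: mobilenet
        # Same import path that was passed to `serve run` in Step 4
        import_path: mobilenet.mobilenet:app
        runtime_env:
          # Placeholder: archive of the repository that contains mobilenet/mobilenet.py
          working_dir: "https://github.com/ray-project/serve_config_examples/archive/<COMMIT_SHA>.zip"
          # Same pip dependency that fixed the form-parsing error in Step 8
          pip: ["python-multipart==0.0.6"]
  rayClusterConfig:
    rayVersion: "<RAY_VERSION>"       # placeholder: replace with a concrete Ray version
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
            - name: ray-head
              # Same image family chosen in Step 5 so that TensorFlow is available
              image: "rayproject/ray-ml:<RAY_VERSION>"
```

Keeping the `import_path`, image, and `pip` list identical to what worked on the RayCluster means that any failure seen after deploying the RayService is more likely to come from the Kubernetes configuration than from the Serve script.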