(kuberay-prometheus-grafana)=

# Using Prometheus and Grafana

This section describes how to monitor Ray Clusters in Kubernetes with Prometheus and Grafana.

If you do not have any experience with Prometheus and Grafana on Kubernetes, watch this [YouTube playlist](https://youtube.com/playlist?list=PLy7NrYWoggjxCF3av5JKwyG7FFF9eLeL4).

## Preparation

Clone the [KubeRay repository](https://github.com/ray-project/kuberay) and check out the `master` branch. This tutorial requires several files in the repository.

## Step 1: Create a Kubernetes cluster with Kind

```sh
kind create cluster
```

## Step 2: Install the Kubernetes Prometheus Stack via Helm chart

```sh
# Path: kuberay/
./install/prometheus/install.sh

# Check the installation
kubectl get all -n prometheus-system

# (part of the output)
# NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
# deployment.apps/prometheus-grafana                    1/1     1            1           46s
# deployment.apps/prometheus-kube-prometheus-operator   1/1     1            1           46s
# deployment.apps/prometheus-kube-state-metrics         1/1     1            1           46s
```

* KubeRay provides the [install.sh script](https://github.com/ray-project/kuberay/blob/master/install/prometheus/install.sh) to automatically install the [kube-prometheus-stack v48.2.1](https://github.com/prometheus-community/helm-charts/tree/kube-prometheus-stack-48.2.1/charts/kube-prometheus-stack) chart and the related custom resources, including **ServiceMonitor**, **PodMonitor**, and **PrometheusRule**, in the namespace `prometheus-system`.
* We made some modifications to the original `values.yaml` in the kube-prometheus-stack chart to allow embedding Grafana panels in the Ray Dashboard. See [overrides.yaml](https://github.com/ray-project/kuberay/tree/master/install/prometheus/overrides.yaml) for more details.

```yaml
grafana:
  grafana.ini:
    security:
      allow_embedding: true
    auth.anonymous:
      enabled: true
      org_role: Viewer
```

## Step 3: Install a KubeRay operator

* Follow [this document](kuberay-operator-deploy) to install the latest stable KubeRay operator via the Helm repository.

## Step 4: Install a RayCluster

```sh
# path: ray-operator/config/samples/
kubectl apply -f ray-cluster.embed-grafana.yaml

# Check ${RAYCLUSTER_HEAD_POD}
kubectl get pod -l ray.io/node-type=head

# Example output:
# NAME                            READY   STATUS    RESTARTS   AGE
# raycluster-kuberay-head-btwc2   1/1     Running   0          63s

# Wait until all Ray Pods are running and forward the port of the Prometheus metrics endpoint in a new terminal.
kubectl port-forward --address 0.0.0.0 ${RAYCLUSTER_HEAD_POD} 8080:8080
curl localhost:8080

# Example output (Prometheus metrics format):
# # HELP ray_spill_manager_request_total Number of {spill, restore} requests.
# # TYPE ray_spill_manager_request_total gauge
# ray_spill_manager_request_total{Component="raylet",NodeAddress="10.244.0.13",Type="Restored",Version="2.0.0"} 0.0

# Ensure that the port (8080) for the metrics endpoint is also defined in the head's Kubernetes service.
kubectl get service

# NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                         AGE
# raycluster-kuberay-head-svc   ClusterIP   10.96.201.142   <none>        6379/TCP,8265/TCP,8080/TCP,8000/TCP,10001/TCP   106m
```
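The commands above assume that `${RAYCLUSTER_HEAD_POD}` is already set. As a minimal sketch (not part of the sample YAML), you can capture the head Pod name using the default `ray.io/node-type=head` label:

```sh
# Sketch: store the head Pod name in RAYCLUSTER_HEAD_POD.
# Assumes a single RayCluster head Pod in the default namespace.
export RAYCLUSTER_HEAD_POD=$(kubectl get pods -l ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers)
echo ${RAYCLUSTER_HEAD_POD}
```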
* KubeRay exposes a Prometheus metrics endpoint on port **8080** via a built-in exporter by default. Therefore, you do not need to install any external exporter.
* If you want to configure the metrics endpoint to a different port, see [kuberay/#954](https://github.com/ray-project/kuberay/pull/954) for more details.
* Prometheus metrics format:
  * `# HELP`: Describes the meaning of this metric.
  * `# TYPE`: See [this document](https://prometheus.io/docs/concepts/metric_types/) for more details.
* Three required environment variables are defined in [ray-cluster.embed-grafana.yaml](https://github.com/ray-project/kuberay/blob/v1.0.0-rc.0/ray-operator/config/samples/ray-cluster.embed-grafana.yaml). See [Configuring and Managing Ray Dashboard](https://docs.ray.io/en/latest/cluster/configure-manage-dashboard.html) for more details about these environment variables. A quick way to verify them on the head Pod is sketched after this list.

```yaml
env:
  - name: RAY_GRAFANA_IFRAME_HOST
    value: http://127.0.0.1:3000
  - name: RAY_GRAFANA_HOST
    value: http://prometheus-grafana.prometheus-system.svc:80
  - name: RAY_PROMETHEUS_HOST
    value: http://prometheus-kube-prometheus-prometheus.prometheus-system.svc:9090
```

* Note that we do not deploy Grafana in the head Pod, so we need to set both `RAY_GRAFANA_IFRAME_HOST` and `RAY_GRAFANA_HOST`.
  `RAY_GRAFANA_HOST` is used by the head Pod to send health-check requests to Grafana in the backend.
  `RAY_GRAFANA_IFRAME_HOST` is used by your browser to fetch the Grafana panels from the Grafana server rather than from the head Pod.
  Because we forward the port of Grafana to `127.0.0.1:3000` in this example, we set `RAY_GRAFANA_IFRAME_HOST` to `http://127.0.0.1:3000`.
* `http://` is required.
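To double-check that the three variables are actually set on the head container, here is a minimal sketch (not part of the tutorial files); it assumes `${RAYCLUSTER_HEAD_POD}` holds the head Pod name and that `printenv` is available in the Ray container image:

```sh
# Sketch: print the Grafana/Prometheus-related environment variables inside the head Pod.
kubectl exec ${RAYCLUSTER_HEAD_POD} -- printenv | grep -E 'RAY_GRAFANA|RAY_PROMETHEUS'
```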
## Step 5: Collect head node metrics with a ServiceMonitor

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ray-head-monitor
  namespace: prometheus-system
  labels:
    # `release: $HELM_RELEASE`: Prometheus can only detect ServiceMonitor with this label.
    release: prometheus
spec:
  jobLabel: ray-head
  # Only select Kubernetes Services in the "default" namespace.
  namespaceSelector:
    matchNames:
      - default
  # Only select Kubernetes Services with "matchLabels".
  selector:
    matchLabels:
      ray.io/node-type: head
  # A list of endpoints allowed as part of this ServiceMonitor.
  endpoints:
    - port: metrics
  targetLabels:
  - ray.io/cluster
```

* The YAML example above is [serviceMonitor.yaml](https://github.com/ray-project/kuberay/blob/master/config/prometheus/serviceMonitor.yaml), and it is created by **install.sh**. Hence, there is no need to create anything here.
* See the [official ServiceMonitor document](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#servicemonitor) for more details about the configurations.
* `release: $HELM_RELEASE`: Prometheus can only detect ServiceMonitors with this label.

```sh
helm ls -n prometheus-system
# ($HELM_RELEASE is "prometheus".)
# NAME         NAMESPACE           REVISION   UPDATED                                   STATUS     CHART                          APP VERSION
# prometheus   prometheus-system   1          2023-02-06 06:27:05.530950815 +0000 UTC   deployed   kube-prometheus-stack-44.3.1   v0.62.0

kubectl get prometheuses.monitoring.coreos.com -n prometheus-system -oyaml
# serviceMonitorSelector:
#   matchLabels:
#     release: prometheus
# podMonitorSelector:
#   matchLabels:
#     release: prometheus
# ruleSelector:
#   matchLabels:
#     release: prometheus
```

* `namespaceSelector` and `selector` are used to select the exporter's Kubernetes service. Because Ray uses a built-in exporter, the **ServiceMonitor** selects Ray's head service, which exposes the metrics endpoint (that is, port 8080 here).

```sh
kubectl get service -n default -l ray.io/node-type=head
# NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                         AGE
# raycluster-kuberay-head-svc   ClusterIP   10.96.201.142   <none>        6379/TCP,8265/TCP,8080/TCP,8000/TCP,10001/TCP   153m
```

* `targetLabels`: We added `spec.targetLabels[0].ray.io/cluster` because we want to include the name of the RayCluster in the metrics generated by this ServiceMonitor. The `ray.io/cluster` label is part of the Ray head node service, and it is transformed into a `ray_io_cluster` metric label. That is, any metric that is imported also contains the label `ray_io_cluster=<ray-cluster-name>`. This may seem optional, but it becomes mandatory if you deploy multiple RayClusters.

## Step 6: Collect worker node metrics with PodMonitors

The KubeRay operator does not create a Kubernetes service for the Ray worker Pods, so we cannot use a Prometheus ServiceMonitor to scrape metrics from the worker Pods. To collect worker metrics, we use the `Prometheus PodMonitors CRD` instead.

**Note**: We could create a Kubernetes service whose selectors are a common label subset of our worker Pods. However, this is not ideal because the workers are independent from each other; that is, they are not a collection of replicas generated by a replica-set controller. Hence, we should avoid using a Kubernetes service to group them together.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: ray-workers-monitor
  namespace: prometheus-system
  labels:
    # `release: $HELM_RELEASE`: Prometheus can only detect PodMonitor with this label.
    release: prometheus
    ray.io/cluster: raycluster-kuberay # $RAY_CLUSTER_NAME: "kubectl get rayclusters.ray.io"
spec:
  jobLabel: ray-workers
  # Only select Kubernetes Pods in the "default" namespace.
  namespaceSelector:
    matchNames:
      - default
  # Only select Kubernetes Pods with "matchLabels".
  selector:
    matchLabels:
      ray.io/node-type: worker
  # A list of endpoints allowed as part of this PodMonitor.
  podMetricsEndpoints:
  - port: metrics
```

* `release: $HELM_RELEASE`: Prometheus can only detect PodMonitors with this label. See [here](#prometheus-can-only-detect-this-label) for more details.
* `namespaceSelector` and `selector` in the **PodMonitor** are used to select Kubernetes Pods.

```sh
kubectl get pod -n default -l ray.io/node-type=worker
# NAME                                          READY   STATUS    RESTARTS   AGE
# raycluster-kuberay-worker-workergroup-5stpm   1/1     Running   0          3h16m
```

* `ray.io/cluster: $RAY_CLUSTER_NAME`: We also define `metadata.labels` by manually adding `ray.io/cluster: <ray-cluster-name>` and then instructing the PodMonitors resource to add that label to the scraped metrics via `spec.podTargetLabels[0].ray.io/cluster`.
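Before moving on, it can be useful to confirm that both monitors from Steps 5 and 6 exist in the cluster. A minimal sketch (the resource and namespace names below follow the manifests above):

```sh
# Sketch: list the ServiceMonitor and PodMonitor created by install.sh.
kubectl get servicemonitors.monitoring.coreos.com -n prometheus-system
kubectl get podmonitors.monitoring.coreos.com -n prometheus-system
```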
## Step 7: Collect custom metrics with Recording Rules

[Recording Rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) allow us to precompute frequently needed or computationally expensive [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) expressions and save their results as custom metrics. Note that this is different from [application-level metrics](application-level-metrics), which are for the visibility of Ray applications.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ray-cluster-gcs-rules
  namespace: prometheus-system
  labels:
    # `release: $HELM_RELEASE`: Prometheus can only detect Recording Rules with this label.
    release: prometheus
spec:
  groups:
  - # Rules within a group are run periodically with the same evaluation interval (30s in this example).
    name: ray-cluster-main-staging-gcs.rules
    # How often rules in the group are evaluated.
    interval: 30s
    rules:
    - # The name of the custom metric.
      # Also see best practices for naming metrics created by recording rules:
      # https://prometheus.io/docs/practices/rules/#recording-rules
      record: ray_gcs_availability_30d
      # PromQL expression.
      expr: |
        (
          100 * (
            sum(rate(ray_gcs_update_resource_usage_time_bucket{container="ray-head", le="20.0"}[30d]))
            /
            sum(rate(ray_gcs_update_resource_usage_time_count{container="ray-head"}[30d]))
          )
        )
```

* The PromQL expression above is:

$$\frac{number\ of\ update\ resource\ usage\ RPCs\ that\ have\ RTT\ smaller\ than\ 20ms\ in\ last\ 30\ days}{total\ number\ of\ update\ resource\ usage\ RPCs\ in\ last\ 30\ days} \times 100$$

* The recording rule above is one of the rules defined in [prometheusRules.yaml](https://github.com/ray-project/kuberay/blob/master/config/prometheus/rules/prometheusRules.yaml), and it is created by **install.sh**. Hence, there is no need to create anything here.
* See the [official PrometheusRule document](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#prometheusrule) for more details about the configurations.
* `release: $HELM_RELEASE`: Prometheus can only detect PrometheusRules with this label. See [here](#prometheus-can-only-detect-this-label) for more details.
* PrometheusRules can be reloaded at runtime. Use `kubectl apply {modified prometheusRules.yaml}` to reconfigure the rules if needed.

## Step 8: Define alert conditions with Alerting Rules

[Alerting rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) allow us to define alert conditions based on [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) expressions and to send notifications about firing alerts to [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager), which adds summarization, notification rate limiting, silencing, and alert dependencies on top of the simple alert definitions.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ray-cluster-gcs-rules
  namespace: prometheus-system
  labels:
    # `release: $HELM_RELEASE`: Prometheus can only detect Alerting Rules with this label.
    release: prometheus
spec:
  groups:
  - name: ray-cluster-main-staging-gcs.rules
    # How often rules in the group are evaluated.
    interval: 30s
    rules:
    - alert: MissingMetricRayGlobalControlStore
      # A set of informational labels. Annotations can be used to store longer additional information compared to rules.0.labels.
      annotations:
        description: Ray GCS is not emitting any metrics for Resource Update requests
        summary: Ray GCS is not emitting metrics anymore
      # PromQL expression.
      expr: |
        (
          absent(ray_gcs_update_resource_usage_time_bucket) == 1
        )
      # Time that Prometheus will wait and check if the alert continues to be active during each evaluation before firing the alert.
      # Firing alerts may be due to false positives or noise if the setting value is too small.
      # On the other hand, if the value is too big, the alerts may not be handled in time.
      for: 5m
      # A set of additional labels to be attached to the alert.
      # It is possible to overwrite the labels in metadata.labels, so make sure one of the labels matches the label in ruleSelector.matchLabels.
      labels:
        severity: critical
```

* The PromQL expression above checks whether there is no time series for the `ray_gcs_update_resource_usage_time_bucket` metric. See [absent()](https://prometheus.io/docs/prometheus/latest/querying/functions/#absent) for more details.
* The alerting rule above is one of the rules defined in [prometheusRules.yaml](https://github.com/ray-project/kuberay/blob/master/config/prometheus/rules/prometheusRules.yaml), and it is created by **install.sh**. Hence, there is no need to create anything here.
* Alerting rules are configured in the same way as recording rules.
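To confirm that the recording and alerting rules from Steps 7 and 8 reached the Prometheus operator, a minimal sketch (resource and namespace names follow the manifests above):

```sh
# Sketch: list the PrometheusRule resources created by install.sh.
kubectl get prometheusrules.monitoring.coreos.com -n prometheus-system
```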
## Step 9: Access Prometheus Web UI

```sh
# Forward the port of the Prometheus Web UI in the Prometheus server Pod.
kubectl port-forward --address 0.0.0.0 prometheus-prometheus-kube-prometheus-prometheus-0 -n prometheus-system 9090:9090
```

- Go to `${YOUR_IP}:9090/targets` (e.g. `127.0.0.1:9090/targets`). You should be able to see:
  - `podMonitor/prometheus-system/ray-workers-monitor/0 (1/1 up)`
  - `serviceMonitor/prometheus-system/ray-head-monitor/0 (1/1 up)`

![Prometheus Web UI](../images/prometheus_web_ui.png)

- Go to `${YOUR_IP}:9090/graph`. You should be able to query:
  - [System Metrics](https://docs.ray.io/en/latest/ray-observability/ray-metrics.html#system-metrics)
  - [Application Level Metrics](https://docs.ray.io/en/latest/ray-observability/ray-metrics.html#application-level-metrics)
  - Custom metrics defined in Recording Rules (e.g. `ray_gcs_availability_30d`)
- Go to `${YOUR_IP}:9090/alerts`. You should be able to see:
  - Alerting Rules (e.g. `MissingMetricRayGlobalControlStore`).

## Step 10: Access Grafana

```sh
# Forward the port of Grafana
kubectl port-forward --address 0.0.0.0 deployment/prometheus-grafana -n prometheus-system 3000:3000
# Note: You need to update `RAY_GRAFANA_IFRAME_HOST` if you expose Grafana to a different port.

# Check ${YOUR_IP}:3000/login for the Grafana login page (e.g. 127.0.0.1:3000/login).
# The default username is "admin" and the password is "prom-operator".
```

> Note: `kubectl port-forward` is not recommended for production use. See [this Grafana document](https://grafana.com/tutorials/run-grafana-behind-a-proxy/) for exposing Grafana behind a reverse proxy.

* The default password is specified by `grafana.adminPassword` in the [values.yaml](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml) of the kube-prometheus-stack chart.
* After logging in to Grafana successfully, we can import the Ray Dashboard into Grafana via **dashboard_default.json**.
  * Click the "Dashboards" icon in the left panel.
  * Click "New".
  * Click "Import".
  * Click "Upload JSON file".
  * Choose a JSON file.
    * Case 1: If you are using Ray 2.5.0, you can use [config/grafana/default_grafana_dashboard.json](https://github.com/ray-project/kuberay/blob/master/config/grafana/default_grafana_dashboard.json).
    * Case 2: Otherwise, you should import the `default_grafana_dashboard.json` file from `/tmp/ray/session_latest/metrics/grafana/dashboards/` in the head Pod. You can use `kubectl cp` to copy the file from the head Pod to your local machine (see the sketch at the end of this page).
  * Click "Import".
  * TODO: Note that importing the dashboard manually is not ideal. We should find a way to import the dashboard automatically.

![Grafana Ray Dashboard](../images/grafana_ray_dashboard.png)

## Step 11: Embed Grafana panels in Ray Dashboard

```sh
kubectl port-forward --address 0.0.0.0 svc/raycluster-embed-grafana-head-svc 8265:8265
# Visit http://127.0.0.1:8265/#/metrics in your browser.
```

![Ray Dashboard with Grafana panels](../images/ray_dashboard_embed_grafana.png)
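As mentioned in Case 2 of Step 10, you can copy the dashboard JSON out of the head Pod with `kubectl cp`. A minimal sketch, assuming `${RAYCLUSTER_HEAD_POD}` holds the head Pod name and the default session directory layout:

```sh
# Sketch: copy the generated Grafana dashboard JSON from the head Pod to the local machine.
kubectl cp ${RAYCLUSTER_HEAD_POD}:/tmp/ray/session_latest/metrics/grafana/dashboards/default_grafana_dashboard.json ./default_grafana_dashboard.json
```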