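AB Tests and Progressive Rollouts

Simple AB Tests

Seldon Core makes it easy to create AB tests and shadow deployments by using Istio or Ambassador to split traffic as required.

With the Seldon Analytics dashboard, the metrics collected through Prometheus can be analyzed for each predictor in an AB test.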

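Advanced AB Test Experiments and Progressive Rollouts

For more advanced use cases we recommend the Iter8 integration, which provides explicit experiment objectives and a clear winner among the candidate models. Iter8 also provides progressive rollouts: it automatically tests candidate models and promotes a candidate to production if it outperforms the baseline model.

Seldon currently provides two examples of how to run Iter8 experiments.

  1. Seldon/Iter8 experiment over a single Seldon Deployment.

  2. Seldon/Iter8 experiment over separate Seldon Deployments.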

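Seldon/Iter8 Experiment over a Single Seldon Deployment

The first option is to create an AB test by adding the candidate model to an updated Seldon Deployment, run an Iter8 experiment, and progressively roll out the candidate based on a set of metric criteria. The architecture is shown below:

[Figure: seldonIter8Single]

We begin by updating our default model to start an AB test, as shown below: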

apiVersion: v1
kind: Namespace
metadata:
    name: ns-production
---
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris
  namespace: ns-production
spec:
  predictors:
  - name: baseline
    traffic: 100
    graph:
      name: classifier
      modelUri: gs://seldon-models/v1.10.0-dev/sklearn/iris
      implementation: SKLEARN_SERVER
  - name: candidate
    traffic: 0
    graph:
      name: classifier
      modelUri: gs://seldon-models/xgboost/iris
      implementation: XGBOOST_SERVER

Here we have the baseline SKLearn model and a candidate XGBoost model that may replace it. The candidate currently receives 0% of the traffic.
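The traffic split in the manifest above can be sanity-checked before applying it. The following is a minimal sketch, assuming the `traffic` values are percentages that should add up to 100; `validate_traffic` is a hypothetical helper and not part of Seldon Core.

```python
# Hypothetical helper (not a Seldon Core API): checks the traffic split
# of the predictors declared in the SeldonDeployment above.
predictors = [
    {"name": "baseline", "traffic": 100},
    {"name": "candidate", "traffic": 0},
]


def validate_traffic(predictors):
    """Return a name -> weight mapping, or raise if the split is invalid."""
    weights = {p["name"]: p["traffic"] for p in predictors}
    if any(w < 0 or w > 100 for w in weights.values()):
        raise ValueError(f"weights must be within 0..100: {weights}")
    total = sum(weights.values())
    if total != 100:
        # Assumption: weights are percentages that must add up to 100.
        raise ValueError(f"traffic must sum to 100, got {total}")
    return weights


split = validate_traffic(predictors)
print(split)
```

A check like this is cheap to run in CI before `kubectl apply`, so that a typo in a weight is caught early.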

Next, we tell Iter8 which metrics it can use by creating a set of Iter8 Metric custom resources:

apiVersion: v1
kind: Namespace
metadata:
  name: iter8-seldon
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: 95th-percentile-tail-latency
  namespace: iter8-seldon
spec:
  description: 95th percentile tail latency
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      histogram_quantile(0.95, sum(rate(seldon_api_executor_client_requests_seconds_bucket{seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) by (le))
  provider: prometheus
  sampleSize: iter8-seldon/request-count
  type: Gauge
  units: milliseconds
  urlTemplate: http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: error-count
  namespace: iter8-seldon
spec:
  description: Number of error responses
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      sum(increase(seldon_api_executor_server_requests_seconds_count{code!='200',seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0)
  provider: prometheus
  type: Counter
  urlTemplate: http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: error-rate
  namespace: iter8-seldon
spec:
  description: Fraction of requests with error responses
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      (sum(increase(seldon_api_executor_server_requests_seconds_count{code!='200',seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0)) / (sum(increase(seldon_api_executor_server_requests_seconds_count{seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0))
  provider: prometheus
  sampleSize: iter8-seldon/request-count
  type: Gauge
  urlTemplate: http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: mean-latency
  namespace: iter8-seldon
spec:
  description: Mean latency
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      (sum(increase(seldon_api_executor_client_requests_seconds_sum{seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0)) / (sum(increase(seldon_api_executor_client_requests_seconds_count{seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0))
  provider: prometheus
  sampleSize: iter8-seldon/request-count
  type: Gauge
  units: milliseconds
  urlTemplate: http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: request-count
  namespace: iter8-seldon
spec:
  description: Number of requests
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      sum(increase(seldon_api_executor_client_requests_seconds_count{seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0)
  provider: prometheus
  type: Counter
  urlTemplate: http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: user-engagement
  namespace: iter8-seldon
spec:
  description: Number of feedback requests
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      sum(increase(seldon_api_executor_server_requests_seconds_count{service='feedback',seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0)
  provider: prometheus
  type: Gauge
  urlTemplate: http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/query

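This creates a set of metrics for use in experiments, each with its corresponding Prometheus Query Language (PromQL) expression. The metrics are parameterized, so they can be reused across different experiments.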

NAME                           TYPE      DESCRIPTION
95th-percentile-tail-latency   Gauge     95th percentile tail latency
error-count                    Counter   Number of error responses
error-rate                     Gauge     Fraction of requests with error responses
mean-latency                   Gauge     Mean latency
request-count                  Counter   Number of requests
user-engagement                Gauge     Number of feedback requests
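To illustrate what "parameterized" means for the metrics listed above, the sketch below fills the `$sid`, `$predictor`, `$ns` and `${elapsedTime}` placeholders of the error-count query and then mimics the `jqExpression` on a response. It is only an approximation of the behaviour, not Iter8's implementation: `render_query` and `extract_value` are hypothetical names, and the sample response is hand-written in the shape of a Prometheus instant-query result.

```python
from string import Template

# Query template copied from the error-count Metric above.
QUERY = (
    "sum(increase(seldon_api_executor_server_requests_seconds_count{"
    "code!='200',seldon_deployment_id='$sid',predictor_name='$predictor',"
    "kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0)"
)


def render_query(template, variables, elapsed_seconds):
    """Substitute the per-version variables and the elapsed time."""
    values = dict(variables, elapsedTime=elapsed_seconds)
    return Template(template).substitute(values)


def extract_value(response):
    """Python equivalent of `.data.result[0].value[1] | tonumber`."""
    return float(response["data"]["result"][0]["value"][1])


# Variables as declared for the candidate version in the experiment.
candidate_vars = {"ns": "ns-production", "sid": "iris", "predictor": "candidate"}
query = render_query(QUERY, candidate_vars, 60)

# Hand-written sample, not real output from a cluster.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {}, "value": [1620000000, "3"]}],
    },
}
value = extract_value(sample_response)
print(query)
print(value)
```

Because the variables are supplied per version, the same Metric resource can serve both the baseline and the candidate.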

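These metrics can then be used in an experiment definition, both as rewards for comparing models and as service-level objectives that a model must meet to be considered to be running successfully.

Once the metrics are defined, an experiment can be started by creating an Iter8 Experiment custom resource: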

apiVersion: iter8.tools/v2alpha2
kind: Experiment
metadata:
  name: quickstart-exp
spec:
  target: iris
  strategy:
    testingPattern: A/B
    deploymentPattern: Progressive
    actions:
      # when the experiment completes, promote the winning version using kubectl apply
      finish:
      - task: common/exec
        with:
          cmd: /bin/bash
          args: [ "-c", "kubectl apply -f {{ .promote }}" ]
  criteria:
    requestCount: iter8-seldon/request-count
    rewards: # Business rewards
    - metric: iter8-seldon/user-engagement
      preferredDirection: High # maximize user engagement
    objectives:
    - metric: iter8-seldon/mean-latency
      upperLimit: 2000
    - metric: iter8-seldon/95th-percentile-tail-latency
      upperLimit: 5000
    - metric: iter8-seldon/error-rate
      upperLimit: "0.01"
  duration:
    intervalSeconds: 10
    iterationsPerLoop: 15
  versionInfo:
    # information about model versions used in this experiment
    baseline:
      name: iris-v1
      weightObjRef:
        apiVersion: machinelearning.seldon.io/v1
        kind: SeldonDeployment
        name: iris
        namespace: ns-production
        fieldPath: .spec.predictors[0].traffic
      variables:
      - name: ns
        value: ns-production
      - name: sid
        value: iris
      - name: predictor
        value: baseline
      - name: promote
        value: https://raw.githubusercontent.com/SeldonIO/seldon-core/master/examples/iter8/progressive_rollout/single_sdep/promote-v1.yaml
    candidates:
    - name: iris-v2
      weightObjRef:
        apiVersion: machinelearning.seldon.io/v1
        kind: SeldonDeployment
        name: iris
        namespace: ns-production
        fieldPath: .spec.predictors[1].traffic
      variables:
      - name: ns
        value: ns-production
      - name: sid
        value: iris
      - name: predictor
        value: candidate
      - name: promote
        value: https://raw.githubusercontent.com/SeldonIO/seldon-core/master/examples/iter8/progressive_rollout/single_sdep/promote-v2.yaml

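There are several key sections:

  • Strategy: the type of experiment to run and the actions to take on completion.

  • Criteria: the key metrics used for rewards and service-level objectives.

  • Duration: how long to run the experiment.

  • VersionInfo: details of the candidate models to compare.

Once the experiment is launched, traffic is shifted between the versions according to the defined rewards and objectives.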
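As a rough guide to the Duration section, the running time of the experiment can be estimated from its two fields. This is a minimal sketch, assuming a single loop of iterations; the `loops` parameter is a hypothetical generalization and does not appear in the manifest above.

```python
def experiment_seconds(interval_seconds, iterations_per_loop, loops=1):
    """Estimate the experiment length in seconds.

    `loops` is an assumption for illustration; the manifest above only
    sets intervalSeconds and iterationsPerLoop.
    """
    return interval_seconds * iterations_per_loop * loops


# Values from the Duration section: intervalSeconds=10, iterationsPerLoop=15.
total = experiment_seconds(10, 15)
print(total)  # 150
```

With these settings the metrics are evaluated every 10 seconds, so the experiment takes roughly two and a half minutes.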

As the experiment progresses, its status can be tracked with the Iter8 command-line tool iter8ctl:

****** Overview ******
Experiment name: quickstart-exp
Experiment namespace: seldon
Target: iris
Testing pattern: A/B
Deployment pattern: Progressive

****** Progress Summary ******
Experiment stage: Running
Number of completed iterations: 6

****** Winner Assessment ******
App versions in this experiment: [iris-v1 iris-v2]
Winning version: iris-v2
Version recommended for promotion: iris-v2

****** Objective Assessment ******
> Identifies whether or not the experiment objectives are satisfied by the most recently observed metrics values for each version.
+-------------------------------------------+---------+---------+
|                 OBJECTIVE                 | IRIS-V1 | IRIS-V2 |
+-------------------------------------------+---------+---------+
| iter8-seldon/mean-latency <=              | true    | true    |
|                                  2000.000 |         |         |
+-------------------------------------------+---------+---------+
| iter8-seldon/95th-percentile-tail-latency | true    | true    |
| <= 5000.000                               |         |         |
+-------------------------------------------+---------+---------+
| iter8-seldon/error-rate <=                | true    | true    |
|                                     0.010 |         |         |
+-------------------------------------------+---------+---------+

****** Metrics Assessment ******
> Most recently read values of experiment metrics for each version.
+-------------------------------------------+---------+---------+
|                  METRIC                   | IRIS-V1 | IRIS-V2 |
+-------------------------------------------+---------+---------+
| iter8-seldon/request-count                |   5.256 |   1.655 |
+-------------------------------------------+---------+---------+
| iter8-seldon/user-engagement              |  49.867 |  68.240 |
+-------------------------------------------+---------+---------+
| iter8-seldon/mean-latency                 |   0.016 |   0.016 |
| (milliseconds)                            |         |         |
+-------------------------------------------+---------+---------+
| iter8-seldon/95th-percentile-tail-latency |   0.025 |   0.045 |
| (milliseconds)                            |         |         |
+-------------------------------------------+---------+---------+
| iter8-seldon/error-rate                   |   0.000 |   0.000 |
+-------------------------------------------+---------+---------+
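The assessment in the output above can be approximated with a simple rule: keep the versions that satisfy every objective, then pick the one with the highest reward. The sketch below applies that rule to the values shown in the tables. It is a simplification for illustration and not the statistical assessment Iter8 actually performs; `satisfies` and `pick_winner` are hypothetical names.

```python
# Upper limits from the `objectives` section of the experiment.
objectives = {
    "mean-latency": 2000.0,
    "95th-percentile-tail-latency": 5000.0,
    "error-rate": 0.01,
}
reward = "user-engagement"  # preferredDirection: High

# Metric values copied from the iter8ctl output above.
observed = {
    "iris-v1": {
        "mean-latency": 0.016,
        "95th-percentile-tail-latency": 0.025,
        "error-rate": 0.0,
        "user-engagement": 49.867,
    },
    "iris-v2": {
        "mean-latency": 0.016,
        "95th-percentile-tail-latency": 0.045,
        "error-rate": 0.0,
        "user-engagement": 68.240,
    },
}


def satisfies(metrics):
    """True if every objective metric is within its upper limit."""
    return all(metrics[name] <= limit for name, limit in objectives.items())


def pick_winner(observed):
    """Among versions meeting all objectives, return the highest reward."""
    valid = {v: m for v, m in observed.items() if satisfies(m)}
    if not valid:
        return None
    return max(valid, key=lambda v: valid[v][reward])


winner = pick_winner(observed)
print(winner)  # iris-v2
```

Both versions meet the objectives here, so the higher user engagement of iris-v2 decides the outcome, which agrees with the reported winning version.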

We can also check the state of the experiment with kubectl:

kubectl get experiment
NAME             TYPE   TARGET   STAGE       COMPLETED ITERATIONS   MESSAGE
quickstart-exp   A/B    iris     Completed   15                     ExperimentCompleted: Experiment Completed

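In the example above, a finish action is defined that promotes the winning version by applying its manifest, making it the new default Seldon Deployment.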
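The finish action's command contains the placeholder `{{ .promote }}`, which is filled from the `promote` variable of the version being promoted. The sketch below mimics that substitution for simple field lookups only. Iter8 uses Go templates for this, so the regular expression here is an illustrative stand-in, and `render` is a hypothetical name.

```python
import re

# Command from the `finish` action of the experiment above.
CMD_TEMPLATE = "kubectl apply -f {{ .promote }}"

BASE = (
    "https://raw.githubusercontent.com/SeldonIO/seldon-core/master/"
    "examples/iter8/progressive_rollout/single_sdep/"
)

# `promote` variables as declared under versionInfo.
version_vars = {
    "iris-v1": {"promote": BASE + "promote-v1.yaml"},
    "iris-v2": {"promote": BASE + "promote-v2.yaml"},
}


def render(template, variables):
    """Replace each `{{ .name }}` placeholder with variables[name]."""
    pattern = r"\{\{\s*\.(\w+)\s*\}\}"
    return re.sub(pattern, lambda m: variables[m.group(1)], template)


# Assume iris-v2 was recommended for promotion, as in the output above.
command = render(CMD_TEMPLATE, version_vars["iris-v2"])
print(command)
```

Keeping one manifest per version means that promotion is a plain `kubectl apply`, whichever version wins.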

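Next step: run the example notebook.

Seldon/Iter8 Experiment over Separate Seldon Deployments

We can also run experiments over separate Seldon Deployments. This requires creating a routing rule in your service mesh of choice that Iter8 can modify to shift traffic to each Seldon Deployment.

The architecture for this type of experiment is shown below:

[Figure: seldonIter8Separate]

The difference is that we now have two Seldon Deployments. A baseline: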

apiVersion: v1
kind: Namespace
metadata:
    name: ns-baseline
---
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris
  namespace: ns-baseline
spec:
  predictors:
  - name: default
    graph:
      name: classifier
      modelUri: gs://seldon-models/v1.10.0-dev/sklearn/iris
      implementation: SKLEARN_SERVER

And a candidate:

apiVersion: v1
kind: Namespace
metadata:
    name: ns-candidate
---
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris
  namespace: ns-candidate
spec:
  predictors:
  - name: default
    graph:
      name: classifier
      modelUri: gs://seldon-models/xgboost/iris
      implementation: XGBOOST_SERVER

Then, for Istio, we need a routing rule that splits traffic between the two:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: routing-rule
  namespace: default
spec:
  gateways:
  - istio-system/seldon-gateway
  hosts:
  - iris.example.com
  http:
  - route:
    - destination:
        host: iris-default.ns-baseline.svc.cluster.local
        port:
          number: 8000
      headers:
        response:
          set:
            version: iris-v1
      weight: 100
    - destination:
        host: iris-default.ns-candidate.svc.cluster.local
        port:
          number: 8000
      headers:
        response:
          set:
            version: iris-v2
      weight: 0

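The metric definitions are the same as in the previous section. The experiment is very similar to the one above, but its VersionInfo section points to the Istio VirtualService, which Iter8 modifies to shift traffic: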

apiVersion: iter8.tools/v2alpha2
kind: Experiment
metadata:
  name: quickstart-exp
spec:
  target: iris
  strategy:
    testingPattern: A/B
    deploymentPattern: Progressive
    actions:
      # when the experiment completes, promote the winning version using kubectl apply
      finish:
      - task: common/exec
        with:
          cmd: /bin/bash
          args: [ "-c", "kubectl apply -f {{ .promote }}" ]
  criteria:
    requestCount: iter8-seldon/request-count
    rewards: # Business rewards
    - metric: iter8-seldon/user-engagement
      preferredDirection: High # maximize user engagement
    objectives:
    - metric: iter8-seldon/mean-latency
      upperLimit: 2000
    - metric: iter8-seldon/95th-percentile-tail-latency
      upperLimit: 5000
    - metric: iter8-seldon/error-rate
      upperLimit: "0.01"
  duration:
    intervalSeconds: 10
    iterationsPerLoop: 10
  versionInfo:
    # information about model versions used in this experiment
    baseline:
      name: iris-v1
      weightObjRef:
        apiVersion: networking.istio.io/v1alpha3
        kind: VirtualService
        name: routing-rule
        namespace: default
        fieldPath: .spec.http[0].route[0].weight
      variables:
      - name: ns
        value: ns-baseline
      - name: sid
        value: iris
      - name: predictor
        value: default
      - name: promote
        value: https://raw.githubusercontent.com/SeldonIO/seldon-core/master/examples/iter8/progressive_rollout/separate_sdeps/promote-v1.yaml
    candidates:
    - name: iris-v2
      weightObjRef:
        apiVersion: networking.istio.io/v1alpha3
        kind: VirtualService
        name: routing-rule
        namespace: default
        fieldPath: .spec.http[0].route[1].weight
      variables:
      - name: ns
        value: ns-candidate
      - name: sid
        value: iris
      - name: predictor
        value: default
      - name: promote
        value: https://raw.githubusercontent.com/SeldonIO/seldon-core/master/examples/iter8/progressive_rollout/separate_sdeps/promote-v2.yaml

The experiment progresses in a similar way, with the best model being promoted; in this case it becomes the new default baseline.
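Each `fieldPath` in the experiment above names the field that holds a version's traffic weight. The sketch below shows how such a path could be resolved against the VirtualService and used to shift traffic. It assumes paths consist only of `.name` and `[index]` segments; `parse_path` and `set_field` are hypothetical helpers and not Iter8 code.

```python
import re

# Trimmed copy of the VirtualService routes defined earlier.
virtual_service = {
    "spec": {
        "http": [
            {
                "route": [
                    {"destination": {"host": "iris-default.ns-baseline.svc.cluster.local"}, "weight": 100},
                    {"destination": {"host": "iris-default.ns-candidate.svc.cluster.local"}, "weight": 0},
                ]
            }
        ]
    }
}

# A path segment is either `.name` or `[index]`.
TOKEN = re.compile(r"\.(\w+)|\[(\d+)\]")


def parse_path(path):
    """Split '.spec.http[0].route[1].weight' into keys and list indices."""
    keys, pos = [], 0
    for match in TOKEN.finditer(path):
        if match.start() != pos:
            raise ValueError(f"unsupported path syntax: {path}")
        name, index = match.groups()
        keys.append(name if name is not None else int(index))
        pos = match.end()
    if pos != len(path):
        raise ValueError(f"unsupported path syntax: {path}")
    return keys


def set_field(obj, path, value):
    """Walk to the parent of the last segment and assign the value."""
    *parents, last = parse_path(path)
    for key in parents:
        obj = obj[key]
    obj[last] = value


# Shift a quarter of the traffic to the candidate route.
set_field(virtual_service, ".spec.http[0].route[0].weight", 75)
set_field(virtual_service, ".spec.http[0].route[1].weight", 25)
weights = [r["weight"] for r in virtual_service["spec"]["http"][0]["route"]]
print(weights)  # [75, 25]
```

The same lookup applies to the single-deployment experiment, where the paths are `.spec.predictors[0].traffic` and `.spec.predictors[1].traffic`.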

Next step: run the example notebook.