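A/B Tests and Progressive Rollouts
Simple A/B Tests
Seldon Core makes it easy to create A/B tests and shadow deployments by using Istio or Ambassador to split traffic as required.
With the Seldon Analytics dashboard, you can use Prometheus to analyze metrics for the different predictors in an A/B test.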
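Advanced A/B Test Experiments and Progressive Rollouts
For more advanced use cases we recommend the integration with Iter8, which provides explicit experiment objectives and a clearly defined winning candidate. Iter8 also provides progressive rollout capabilities: it automatically tests candidate models and promotes a candidate to production if it outperforms the current production (baseline) model.
Seldon currently provides two examples of how to run Iter8 experiments:
A Seldon/Iter8 experiment over a single Seldon Deployment.
A Seldon/Iter8 experiment over separate Seldon Deployments.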
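Seldon/Iter8 Experiment over a Single Seldon Deployment
The first option is to create an A/B test by updating a Seldon Deployment to include the candidate model, run an Iter8 experiment, and progressively roll out the candidate according to a set of metric criteria. The architecture is as follows:
We begin by updating the default model to start the A/B test, as shown below: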
apiVersion: v1
kind: Namespace
metadata:
  name: ns-production
---
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris
  namespace: ns-production
spec:
  predictors:
  - name: baseline
    traffic: 100
    graph:
      name: classifier
      modelUri: gs://seldon-models/v1.10.0-dev/sklearn/iris
      implementation: SKLEARN_SERVER
  - name: candidate
    traffic: 0
    graph:
      name: classifier
      modelUri: gs://seldon-models/xgboost/iris
      implementation: XGBOOST_SERVER
Here we have the baseline SKLearn model and a candidate XGBoost model that may replace it. The candidate currently receives 0% of the traffic.
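To make the `traffic` fields concrete, here is a minimal Python sketch of weighted routing. It is only an illustration: in a real cluster the split is carried out by Istio or Ambassador, and the function name `route_requests` and the 90/10 split are assumptions made for this example.

```python
import random
from collections import Counter


def route_requests(weights, n_requests, seed=0):
    """Simulate routing n_requests to predictors in proportion to their weights.

    `weights` maps predictor name -> traffic percentage, mirroring the
    `traffic` fields of the SeldonDeployment. Illustration only.
    """
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=n_requests)
    return Counter(picks)


# Starting point from the manifest: all traffic goes to the baseline.
initial = route_requests({"baseline": 100, "candidate": 0}, 1000)

# A hypothetical later stage of a rollout: 10% of traffic on the candidate.
shifted = route_requests({"baseline": 90, "candidate": 10}, 10000)

print(initial)
print(shifted)
```

With a weight of 0 the candidate receives no simulated requests, which matches the starting state of the experiment.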
Next, we tell Iter8 which metrics it can use by creating Iter8 Metric custom resources.
apiVersion: v1
kind: Namespace
metadata:
  name: iter8-seldon
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: 95th-percentile-tail-latency
  namespace: iter8-seldon
spec:
  description: 95th percentile tail latency
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      histogram_quantile(0.95, sum(rate(seldon_api_executor_client_requests_seconds_bucket{seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) by (le))
  provider: prometheus
  sampleSize: iter8-seldon/request-count
  type: Gauge
  units: milliseconds
  urlTemplate: http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: error-count
  namespace: iter8-seldon
spec:
  description: Number of error responses
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      sum(increase(seldon_api_executor_server_requests_seconds_count{code!='200',seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0)
  provider: prometheus
  type: Counter
  urlTemplate: http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: error-rate
  namespace: iter8-seldon
spec:
  description: Fraction of requests with error responses
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      (sum(increase(seldon_api_executor_server_requests_seconds_count{code!='200',seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0)) / (sum(increase(seldon_api_executor_server_requests_seconds_count{seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0))
  provider: prometheus
  sampleSize: iter8-seldon/request-count
  type: Gauge
  urlTemplate: http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: mean-latency
  namespace: iter8-seldon
spec:
  description: Mean latency
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      (sum(increase(seldon_api_executor_client_requests_seconds_sum{seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0)) / (sum(increase(seldon_api_executor_client_requests_seconds_count{seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0))
  provider: prometheus
  sampleSize: iter8-seldon/request-count
  type: Gauge
  units: milliseconds
  urlTemplate: http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: request-count
  namespace: iter8-seldon
spec:
  description: Number of requests
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      sum(increase(seldon_api_executor_client_requests_seconds_count{seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0)
  provider: prometheus
  type: Counter
  urlTemplate: http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: user-engagement
  namespace: iter8-seldon
spec:
  description: Number of feedback requests
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      sum(increase(seldon_api_executor_server_requests_seconds_count{service='feedback',seldon_deployment_id='$sid',predictor_name='$predictor',kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0)
  provider: prometheus
  type: Gauge
  urlTemplate: http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/query
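This creates a set of metrics for use in experiments, each with its corresponding Prometheus Query Language expression. The metrics are parameterized, so they can be reused across different experiments.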
NAME                           TYPE      DESCRIPTION
95th-percentile-tail-latency   Gauge     95th percentile tail latency
error-count                    Counter   Number of error responses
error-rate                     Gauge     Fraction of requests with error responses
mean-latency                   Gauge     Mean latency
request-count                  Counter   Number of requests
user-engagement                Gauge     Number of feedback requests
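The parameterization can be illustrated with a short Python sketch. It fills the placeholders (`$sid`, `$predictor`, `$ns`, `${elapsedTime}`) in the error-count query and then mirrors the `jqExpression` on a made-up Prometheus response. Iter8 performs these steps itself; the helper names `render_query` and `extract_value`, the 60-second window, and the sample response are assumptions for illustration only.

```python
from string import Template

# Query template copied from the error-count Metric.
ERROR_COUNT_QUERY = (
    "sum(increase(seldon_api_executor_server_requests_seconds_count{"
    "code!='200',seldon_deployment_id='$sid',predictor_name='$predictor',"
    "kubernetes_namespace='$ns'}[${elapsedTime}s])) or on() vector(0)"
)


def render_query(template, variables, elapsed_time):
    """Fill in the version variables and the elapsed time (hypothetical helper)."""
    return Template(template).substitute(variables, elapsedTime=elapsed_time)


def extract_value(response):
    """Python equivalent of the jqExpression `.data.result[0].value[1] | tonumber`."""
    return float(response["data"]["result"][0]["value"][1])


# Variables as they would be supplied per version by an experiment.
baseline_vars = {"sid": "iris", "predictor": "baseline", "ns": "ns-production"}
query = render_query(ERROR_COUNT_QUERY, baseline_vars, elapsed_time=60)

# Made-up response in the shape returned by the Prometheus instant-query API.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {}, "value": [1630000000.0, "3"]}],
    },
}

print(query)
print(extract_value(sample_response))
```

Because only the variables differ between versions, the same Metric resource can serve both the baseline and the candidate.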
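An experiment can then use these metrics to define rewards for comparing models and service-level objectives that decide whether a run succeeded.
Once the metrics are defined, an experiment can be started by creating an Iter8 Experiment custom resource: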
apiVersion: iter8.tools/v2alpha2
kind: Experiment
metadata:
  name: quickstart-exp
spec:
  target: iris
  strategy:
    testingPattern: A/B
    deploymentPattern: Progressive
    actions:
      # when the experiment completes, promote the winning version using kubectl apply
      finish:
      - task: common/exec
        with:
          cmd: /bin/bash
          args: [ "-c", "kubectl apply -f {{ .promote }}" ]
  criteria:
    requestCount: iter8-seldon/request-count
    rewards: # Business rewards
    - metric: iter8-seldon/user-engagement
      preferredDirection: High # maximize user engagement
    objectives:
    - metric: iter8-seldon/mean-latency
      upperLimit: 2000
    - metric: iter8-seldon/95th-percentile-tail-latency
      upperLimit: 5000
    - metric: iter8-seldon/error-rate
      upperLimit: "0.01"
  duration:
    intervalSeconds: 10
    iterationsPerLoop: 15
  versionInfo:
    # information about model versions used in this experiment
    baseline:
      name: iris-v1
      weightObjRef:
        apiVersion: machinelearning.seldon.io/v1
        kind: SeldonDeployment
        name: iris
        namespace: ns-production
        fieldPath: .spec.predictors[0].traffic
      variables:
      - name: ns
        value: ns-production
      - name: sid
        value: iris
      - name: predictor
        value: baseline
      - name: promote
        value: https://raw.githubusercontent.com/SeldonIO/seldon-core/master/examples/iter8/progressive_rollout/single_sdep/promote-v1.yaml
    candidates:
    - name: iris-v2
      weightObjRef:
        apiVersion: machinelearning.seldon.io/v1
        kind: SeldonDeployment
        name: iris
        namespace: ns-production
        fieldPath: .spec.predictors[1].traffic
      variables:
      - name: ns
        value: ns-production
      - name: sid
        value: iris
      - name: predictor
        value: candidate
      - name: promote
        value: https://raw.githubusercontent.com/SeldonIO/seldon-core/master/examples/iter8/progressive_rollout/single_sdep/promote-v2.yaml
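There are several key sections:
Strategy: the type of experiment to run and the actions to take when it finishes.
Criteria: the key metrics used for rewards and service-level objectives.
Duration: how long the experiment runs.
VersionInfo: details of the model versions to compare.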
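As a rough worked example of the Duration section, the sketch below multiplies the interval by the number of iterations. It assumes a single loop and ignores the time each iteration itself takes, so the result is best read as an approximate lower bound; the function name is hypothetical.

```python
def experiment_duration_seconds(interval_seconds, iterations_per_loop):
    """Approximate run time of one loop: interval between iterations x iterations."""
    return interval_seconds * iterations_per_loop


# Values from the Experiment: intervalSeconds: 10, iterationsPerLoop: 15.
single_sdep_duration = experiment_duration_seconds(10, 15)
print(single_sdep_duration)  # 150 seconds, i.e. 2.5 minutes
```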
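Once the experiment starts, traffic is shifted between the versions according to the defined rewards and objectives.
As the experiment progresses, its status can be tracked with the Iter8 command-line tool iter8ctl: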
****** Overview ******
Experiment name: quickstart-exp
Experiment namespace: seldon
Target: iris
Testing pattern: A/B
Deployment pattern: Progressive
****** Progress Summary ******
Experiment stage: Running
Number of completed iterations: 6
****** Winner Assessment ******
App versions in this experiment: [iris-v1 iris-v2]
Winning version: iris-v2
Version recommended for promotion: iris-v2
****** Objective Assessment ******
> Identifies whether or not the experiment objectives are satisfied by the most recently observed metrics values for each version.
+-------------------------------------------+---------+---------+
| OBJECTIVE                                 | IRIS-V1 | IRIS-V2 |
+-------------------------------------------+---------+---------+
| iter8-seldon/mean-latency <=              | true    | true    |
| 2000.000                                  |         |         |
+-------------------------------------------+---------+---------+
| iter8-seldon/95th-percentile-tail-latency | true    | true    |
| <= 5000.000                               |         |         |
+-------------------------------------------+---------+---------+
| iter8-seldon/error-rate <=                | true    | true    |
| 0.010                                     |         |         |
+-------------------------------------------+---------+---------+
****** Metrics Assessment ******
> Most recently read values of experiment metrics for each version.
+-------------------------------------------+---------+---------+
| METRIC                                    | IRIS-V1 | IRIS-V2 |
+-------------------------------------------+---------+---------+
| iter8-seldon/request-count                | 5.256   | 1.655   |
+-------------------------------------------+---------+---------+
| iter8-seldon/user-engagement              | 49.867  | 68.240  |
+-------------------------------------------+---------+---------+
| iter8-seldon/mean-latency                 | 0.016   | 0.016   |
| (milliseconds)                            |         |         |
+-------------------------------------------+---------+---------+
| iter8-seldon/95th-percentile-tail-latency | 0.025   | 0.045   |
| (milliseconds)                            |         |         |
+-------------------------------------------+---------+---------+
| iter8-seldon/error-rate                   | 0.000   | 0.000   |
+-------------------------------------------+---------+---------+
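The assessment shown in this output can be approximated with a simplified Python sketch: a version is eligible only if every objective is within its upper limit, and among eligible versions the one with the highest reward wins. This is a deliberately naive reading of the criteria, not Iter8's actual analysis, which may weigh the evidence differently; the names `satisfies_objectives` and `pick_winner` are hypothetical.

```python
# Objectives from the Experiment: metric name -> upper limit.
OBJECTIVES = {
    "mean-latency": 2000.0,
    "95th-percentile-tail-latency": 5000.0,
    "error-rate": 0.01,
}
REWARD = "user-engagement"  # preferredDirection: High

# Most recently read metric values, copied from the iter8ctl output.
OBSERVED = {
    "iris-v1": {
        "mean-latency": 0.016,
        "95th-percentile-tail-latency": 0.025,
        "error-rate": 0.0,
        "user-engagement": 49.867,
    },
    "iris-v2": {
        "mean-latency": 0.016,
        "95th-percentile-tail-latency": 0.045,
        "error-rate": 0.0,
        "user-engagement": 68.240,
    },
}


def satisfies_objectives(metrics):
    """True if every objective metric is at or below its upper limit."""
    return all(metrics[name] <= limit for name, limit in OBJECTIVES.items())


def pick_winner(observed):
    """Return the eligible version with the highest reward, or None if none qualify."""
    eligible = {v: m for v, m in observed.items() if satisfies_objectives(m)}
    if not eligible:
        return None
    return max(eligible, key=lambda v: eligible[v][REWARD])


winner = pick_winner(OBSERVED)
print(winner)  # iris-v2
```

Both versions meet all three objectives here, so the reward metric alone decides the outcome.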
We can also check the status of the experiment with kubectl:
kubectl get experiment
NAME             TYPE   TARGET   STAGE       COMPLETED ITERATIONS   MESSAGE
quickstart-exp   A/B    iris     Completed   15                     ExperimentCompleted: Experiment Completed
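In the example above, a finish action is defined that promotes the winning version by applying its manifest, so that it becomes the new default Seldon Deployment.
As a next step, run the example notebook.
Seldon/Iter8 Experiment over Separate Seldon Deployments
We can also run experiments over separate Seldon Deployments. This requires creating, in your service mesh of choice, a routing rule that Iter8 can modify to push traffic to each Seldon Deployment.
The architecture for this type of experiment is shown below:
The difference is that we now have two Seldon Deployments. A baseline: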
apiVersion: v1
kind: Namespace
metadata:
  name: ns-baseline
---
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris
  namespace: ns-baseline
spec:
  predictors:
  - name: default
    graph:
      name: classifier
      modelUri: gs://seldon-models/v1.10.0-dev/sklearn/iris
      implementation: SKLEARN_SERVER
And a candidate:
apiVersion: v1
kind: Namespace
metadata:
  name: ns-candidate
---
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris
  namespace: ns-candidate
spec:
  predictors:
  - name: default
    graph:
      name: classifier
      modelUri: gs://seldon-models/xgboost/iris
      implementation: XGBOOST_SERVER
Then we need an Istio routing rule that splits traffic between the two:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: routing-rule
  namespace: default
spec:
  gateways:
  - istio-system/seldon-gateway
  hosts:
  - iris.example.com
  http:
  - route:
    - destination:
        host: iris-default.ns-baseline.svc.cluster.local
        port:
          number: 8000
      headers:
        response:
          set:
            version: iris-v1
      weight: 100
    - destination:
        host: iris-default.ns-candidate.svc.cluster.local
        port:
          number: 8000
      headers:
        response:
          set:
            version: iris-v2
      weight: 0
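The metrics are the same as in the previous section. The experiment is also very similar, except that its VersionInfo points to the Istio VirtualService, whose route weights are modified to shift traffic: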
apiVersion: iter8.tools/v2alpha2
kind: Experiment
metadata:
  name: quickstart-exp
spec:
  target: iris
  strategy:
    testingPattern: A/B
    deploymentPattern: Progressive
    actions:
      # when the experiment completes, promote the winning version using kubectl apply
      finish:
      - task: common/exec
        with:
          cmd: /bin/bash
          args: [ "-c", "kubectl apply -f {{ .promote }}" ]
  criteria:
    requestCount: iter8-seldon/request-count
    rewards: # Business rewards
    - metric: iter8-seldon/user-engagement
      preferredDirection: High # maximize user engagement
    objectives:
    - metric: iter8-seldon/mean-latency
      upperLimit: 2000
    - metric: iter8-seldon/95th-percentile-tail-latency
      upperLimit: 5000
    - metric: iter8-seldon/error-rate
      upperLimit: "0.01"
  duration:
    intervalSeconds: 10
    iterationsPerLoop: 10
  versionInfo:
    # information about model versions used in this experiment
    baseline:
      name: iris-v1
      weightObjRef:
        apiVersion: networking.istio.io/v1alpha3
        kind: VirtualService
        name: routing-rule
        namespace: default
        fieldPath: .spec.http[0].route[0].weight
      variables:
      - name: ns
        value: ns-baseline
      - name: sid
        value: iris
      - name: predictor
        value: default
      - name: promote
        value: https://raw.githubusercontent.com/SeldonIO/seldon-core/master/examples/iter8/progressive_rollout/separate_sdeps/promote-v1.yaml
    candidates:
    - name: iris-v2
      weightObjRef:
        apiVersion: networking.istio.io/v1alpha3
        kind: VirtualService
        name: routing-rule
        namespace: default
        fieldPath: .spec.http[0].route[1].weight
      variables:
      - name: ns
        value: ns-candidate
      - name: sid
        value: iris
      - name: predictor
        value: default
      - name: promote
        value: https://raw.githubusercontent.com/SeldonIO/seldon-core/master/examples/iter8/progressive_rollout/separate_sdeps/promote-v2.yaml
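To show what the `fieldPath` entries in `weightObjRef` address, the following Python sketch resolves such a path against a dictionary shaped like the VirtualService and updates the two route weights. It is a sketch under stated assumptions, not how Iter8 patches resources internally: the helpers `parse_field_path` and `set_by_field_path` are hypothetical, and the 75/25 split is an arbitrary intermediate state.

```python
import re


def parse_field_path(field_path):
    """Split a path such as '.spec.http[0].route[1].weight' into keys and indices."""
    tokens = []
    for name, index in re.findall(r"\.([A-Za-z_]\w*)|\[(\d+)\]", field_path):
        tokens.append(name if name else int(index))
    return tokens


def set_by_field_path(obj, field_path, value):
    """Walk nested dicts/lists along field_path and set the final field."""
    *parents, last = parse_field_path(field_path)
    target = obj
    for token in parents:
        target = target[token]
    target[last] = value


# Trimmed-down stand-in for the VirtualService manifest.
virtual_service = {
    "spec": {
        "http": [
            {
                "route": [
                    {"destination": {"host": "iris-default.ns-baseline.svc.cluster.local"}, "weight": 100},
                    {"destination": {"host": "iris-default.ns-candidate.svc.cluster.local"}, "weight": 0},
                ]
            }
        ]
    }
}

# Hypothetical intermediate step of a progressive rollout.
set_by_field_path(virtual_service, ".spec.http[0].route[0].weight", 75)
set_by_field_path(virtual_service, ".spec.http[0].route[1].weight", 25)

weights = [route["weight"] for route in virtual_service["spec"]["http"][0]["route"]]
print(weights)  # [75, 25]
```

The same path syntax addresses `.spec.predictors[1].traffic` when the weights live in a single Seldon Deployment instead of a VirtualService.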
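The experiment progresses much as before; in this case, the best model is promoted onto the existing default baseline.
As a next step, run the example notebook.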