feat: Add support for Envoy managed by Crossover#386
feat: Add support for Envoy managed by Crossover#386stefanprodan merged 13 commits intofluxcd:masterfrom
Conversation
4b5c872 to
1f2f3eb
Compare
Assumes `envoy:smi` as the mesh provider name as I've successfully tested the progressive delivery for Envoy + Crossover with it. This enhances Flagger to translate it to the metrics provider name of `envoy` for deployment targets, or `envoy:service` for service targets. The `envoy` metrics provider is equivalent to `appmesh`, as both relies on the same set of standard metrics exposed by Envoy itself. The `envoy:service` is almost the same as the `envoy` provider, but removing the condition on pod name, as we only need to filter on the backing service name = envoy_cluster_name. We don't consider other Envoy xDS implementations that uses anything that is different to original servicen ames as `envoy_cluster_name`, for now. Ref fluxcd#385
1f2f3eb to
a828524
Compare
|
Can you please add a section to the PR and explain how was this tested? The manual testing should cover the tutorial. You could create a Helm repository for crossover with GitHub pages or use the Flagger one by adding the crossover chart to the charts directory. The chart could have a dependency on the Envoy stable one. |
| url: http://flagger-loadtester.test/ | ||
| timeout: 5s | ||
| metadata: | ||
| cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/" |
There was a problem hiding this comment.
This should point to the Envoy's ClusterIP otherwise there will be no metrics data. See NGINX and Gloo examples.
| Generate HTTP 500 errors: | ||
|
|
||
| ```bash | ||
| hey -z 1m -c 5 -q 5 http://podinfo-canary.test:9898/status/500 |
There was a problem hiding this comment.
This should point to the Envoy's ClusterIP
| Generate latency: | ||
|
|
||
| ```bash | ||
| watch -n 1 curl http://podinfo-canary.test:9898/delay/1 |
There was a problem hiding this comment.
This should point to the Envoy's ClusterIP
| envoy_cluster_upstream_rq{ | ||
| kubernetes_namespace="{{ .Namespace }}", | ||
| kubernetes_pod_name=~"{{ .Name }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)" | ||
| envoy_cluster_name=~"{{ .Name }}-canary" |
There was a problem hiding this comment.
This breaks App Mesh, the envoy_cluster_name doesn't have this format. Please create a crossover metrics observer.
There was a problem hiding this comment.
I've discussed with stefan about this in Slack and the fix has been pushed.
| url: http://flagger-loadtester.test/ | ||
| timeout: 5s | ||
| metadata: | ||
| cmd: "hey -z 1m -q 10 -c 2 http://envoy.test:10000/" |
There was a problem hiding this comment.
I see no host header in here, did you test it with multiple TrafficSplit objects? I don't see how this will hit the right service
To not break AppMesh integration.
|
Updated the pull request description to match the current state and the scope of this. |
stefanprodan
left a comment
There was a problem hiding this comment.
LGTM
Awesome work! Thanks @mumoshu
This is the first set of changes to address #385, including:
How this was tested
I've literally run every step described in the guide almost as-is. The only difference is that I've used my own flagger built from this branch with:
With docker builds:
This is how you can see traffic actually shifting from canary to primary in the dashboard:
Future works
My plan is to submit a separate pull request for finishing AppMesh with Service target kind with some documentation, as it seemed to take much more time for me.
Also, I'm going to improve the experience around installing Envoy with Crossover in a separate pull request. It seems messy now. Probably I'll publish an umbrella chart so that one can install it without git and a huge values.yaml file.Done.