Canary analysis metrics templating #418
Description
Using custom metrics for canary analysis currently requires the query to be inlined in the Canary spec.
To allow these queries to be parameterised and shared across namespaces and canary objects,
the Canary spec could instead reference a metric template that defines the metrics provider and the query.
Metric template specifications
Kubernetes custom resource:
```go
type MetricTemplateSpec struct {
	// Provider of this metric
	Provider MetricTemplateProvider `json:"provider,omitempty"`

	// Query template of this metric
	Query string `json:"query,omitempty"`
}

type MetricTemplateProvider struct {
	// Type of provider
	Type string `json:"type,omitempty"`

	// Address of this provider API
	Address string `json:"address,omitempty"`

	// Secret reference containing the provider credentials
	// +optional
	SecretRef *corev1.LocalObjectReference `json:"secretRef,omitempty"`
}
```

The provider type could be prometheus, influxdb, datadog, wavefront, etc.
Depending on the provider, the secret could contain basic-auth credentials, org/token, an API key or a TLS cert.
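The per-provider secret contents could be checked before a client is built. The sketch below assumes the secret has already been decoded into a `map[string][]byte` (the form `NewPrometheusProvider` receives); the `validateCredentials` helper and the `username`/`password` key names are hypothetical, while the InfluxDB `org`/`token` keys follow the InfluxDB example below.

```go
package main

import "fmt"

// validateCredentials checks that a provider secret carries the keys the
// provider type needs. Hypothetical helper: the "username"/"password" key
// names are assumptions; "org"/"token" follow the InfluxDB template example.
func validateCredentials(providerType string, creds map[string][]byte) error {
	switch providerType {
	case "prometheus":
		// basic-auth is optional, but username and password must come as a pair
		_, hasUser := creds["username"]
		_, hasPass := creds["password"]
		if hasUser != hasPass {
			return fmt.Errorf("prometheus secret must contain both username and password, or neither")
		}
		return nil
	case "influxdb":
		// the InfluxDB v2 client needs both an organisation and a token
		for _, key := range []string{"org", "token"} {
			if len(creds[key]) == 0 {
				return fmt.Errorf("influxdb secret is missing key %q", key)
			}
		}
		return nil
	default:
		return fmt.Errorf("unsupported provider type %q", providerType)
	}
}
```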
Prometheus example:
```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: latency
spec:
  provider:
    type: prometheus
    address: http://linkerd-prometheus.linkerd:9090
  query: |
    histogram_quantile(
      0.99,
      sum(
        rate(
          response_latency_ms_bucket{
            namespace="{{ namespace }}",
            deployment=~"{{ target }}",
            direction="inbound"
          }[{{ interval }}]
        )
      ) by (le)
    )
```

The following variables are available in templates:
- `name` (`canary.metadata.name`)
- `namespace` (`canary.metadata.namespace`)
- `target` (`canary.spec.targetRef.name`)
- `service` (`canary.spec.service.name`)
- `ingress` (`canary.spec.ingressRef.name`)
- `interval` (`canary.spec.canaryAnalysis.metrics[].interval`)
The query is rendered with Go's text/template package.
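Because the templates use the bare `{{ namespace }}` form rather than `{{ .Namespace }}`, one way to render them with text/template is to register each variable as a zero-argument function in a `template.FuncMap`. A minimal sketch; `TemplateVars` and `renderQuery` are hypothetical names, not Flagger's code:

```go
package main

import (
	"bytes"
	"text/template"
)

// TemplateVars holds the values exposed to a metric query template.
// Hypothetical type: the fields mirror the variables listed above.
type TemplateVars struct {
	Name, Namespace, Target, Service, Ingress, Interval string
}

// renderQuery registers each variable as a zero-argument template function,
// so a query can use `{{ target }}` directly instead of `{{ .Target }}`.
func renderQuery(query string, v TemplateVars) (string, error) {
	funcs := template.FuncMap{
		"name":      func() string { return v.Name },
		"namespace": func() string { return v.Namespace },
		"target":    func() string { return v.Target },
		"service":   func() string { return v.Service },
		"ingress":   func() string { return v.Ingress },
		"interval":  func() string { return v.Interval },
	}
	// Funcs must be registered before Parse so the parser knows the names.
	tmpl, err := template.New("query").Funcs(funcs).Parse(query)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	// No data value is needed: every variable is supplied by a function.
	if err := tmpl.Execute(&buf, nil); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```

Registering variables as functions also means a template that references an unknown name fails at parse time, rather than silently rendering an empty string.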
InfluxDB example:
```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: max-db-con
  namespace: flagger
spec:
  provider:
    type: influxdb
    address: http://influx.db:8086
    # secret must contain two keys: org and token
    secretRef:
      name: influx-auth
  # InfluxDB v2 query (Flux language)
  query: |
    from(bucket: "connections")
      |> range(start: -{{ interval }})
      |> filter(fn: (r) => r._client =~ /{{ target }}-{{ namespace }}-.*/)
      |> count()
```

Canary analysis specifications
A canary analysis metric can reference a template with templateRef:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: dev
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  canaryAnalysis:
    metrics:
      - name: "max database connections"
        templateRef:
          name: max-db-con
          # namespace is optional
          # when not specified, the canary namespace will be used
          namespace: flagger
        threshold: 100
        interval: 1m
```

When a metric has a template reference, Flagger renders the query and executes it with the provider's client.
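The namespace fallback noted in the `templateRef` comments amounts to a small lookup rule. A sketch, with a hypothetical `TemplateRef` type and `templateKey` helper:

```go
package main

// TemplateRef mirrors the templateRef field of a canary metric.
// Hypothetical type for illustration.
type TemplateRef struct {
	Name      string
	Namespace string // optional
}

// templateKey returns the namespace and name of the MetricTemplate to load,
// falling back to the canary's own namespace when templateRef omits it.
func templateKey(ref TemplateRef, canaryNamespace string) (namespace, name string) {
	namespace = ref.Namespace
	if namespace == "" {
		namespace = canaryNamespace
	}
	return namespace, ref.Name
}
```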
The query runner takes the first result and converts it to type float64.
If the result is below the threshold, the metric check passes; otherwise the check fails and the rollout advancement is paused or stopped.
Metric provider specifications
To add a provider, implement the provider interface:
```go
type Interface interface {
	// RunQuery executes the query and converts the first result to float64
	RunQuery(query string) (float64, error)

	// IsOnline calls the provider endpoint and returns an error if the API is unreachable
	IsOnline() (bool, error)
}
```

A Prometheus implementation will be provided, and it will replace the current metrics client.
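How new providers are wired in is not specified here. One possible approach, sketched under that assumption, is a registry keyed by the template's `provider.type` that a constructor consults; `ProviderSpec` is a simplified stand-in for `MetricTemplateProvider`, and the registry, `NewProvider` and `fakeProvider` are all hypothetical:

```go
package main

import "fmt"

// Interface is the provider interface from the proposal.
type Interface interface {
	RunQuery(query string) (float64, error)
	IsOnline() (bool, error)
}

// ProviderSpec is a simplified, hypothetical stand-in for MetricTemplateProvider.
type ProviderSpec struct {
	Type    string
	Address string
}

// constructor builds a provider from its spec and decoded secret data.
type constructor func(spec ProviderSpec, creds map[string][]byte) (Interface, error)

// registry maps provider types to constructors. In this sketch, adding a
// provider means implementing Interface and registering a constructor.
var registry = map[string]constructor{}

// NewProvider looks up the constructor for spec.Type and calls it.
func NewProvider(spec ProviderSpec, creds map[string][]byte) (Interface, error) {
	ctor, ok := registry[spec.Type]
	if !ok {
		return nil, fmt.Errorf("provider type %q is not supported", spec.Type)
	}
	return ctor(spec, creds)
}

// fakeProvider is a minimal Interface implementation, e.g. for unit tests.
type fakeProvider struct{ value float64 }

func (f fakeProvider) RunQuery(string) (float64, error) { return f.value, nil }
func (f fakeProvider) IsOnline() (bool, error)          { return true, nil }

func init() {
	registry["fake"] = func(ProviderSpec, map[string][]byte) (Interface, error) {
		return fakeProvider{value: 1}, nil
	}
}
```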
Prometheus provider example:
```go
// PrometheusProvider executes promQL queries against the Prometheus API
// using basic authentication if username and password are supplied
type PrometheusProvider struct {
	timeout  time.Duration
	url      url.URL
	username string
	password string
}

// NewPrometheusProvider takes a provider spec and the credentials map,
// validates the HTTP(S) address, extracts the username and password values if provided and
// returns a Prometheus client ready to execute queries against the HTTP API
func NewPrometheusProvider(provider flaggerv1.MetricTemplateProvider, credentials map[string][]byte) (*PrometheusProvider, error) { }

// RunQuery executes the promQL query and returns the first result as float64
func (p *PrometheusProvider) RunQuery(query string) (float64, error) { }

// IsOnline calls the Prometheus status endpoint and returns an error if the API is unreachable
func (p *PrometheusProvider) IsOnline() (bool, error) { }
```