Canary analysis metrics templating #418

@stefanprodan

Using custom metrics for canary analysis currently requires the query to be inlined in the Canary spec.
To allow these queries to be parameterised and shared across namespaces and canary objects,
the Canary spec could instead reference a metric template that defines the metrics provider and the query.

Metric template specifications

Kubernetes custom resource:

type MetricTemplateSpec struct {
    // Provider of this metric
    Provider MetricTemplateProvider `json:"provider,omitempty"`

    // Query template of this metric
    Query string `json:"query,omitempty"`
}

type MetricTemplateProvider struct {
    // Type of provider
    Type string `json:"type,omitempty"`

    // Address of this provider API
    Address string `json:"address,omitempty"`

    // Secret reference containing the provider credentials
    // +optional
    SecretRef *corev1.LocalObjectReference `json:"secretRef,omitempty"`
}

The provider type could be prometheus, influxdb, datadog, wavefront, etc.
Depending on the provider, the secret could contain basic-auth credentials, org/token, an API key or a TLS cert.

Prometheus example:

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: latency
spec:
  provider:
    type: prometheus
    address: http://linkerd-prometheus.linkerd:9090
  query: |
    histogram_quantile(
        0.99,
        sum(
            rate(
                response_latency_ms_bucket{
                    namespace="{{ namespace }}",
                    deployment=~"{{ target }}",
                    direction="inbound"
                    }[{{ interval }}]
                )
            ) by (le)
        )

The following variables are available in templates:

  • name (canary.metadata.name)
  • namespace (canary.metadata.namespace)
  • target (canary.spec.targetRef.name)
  • service (canary.spec.service.name)
  • ingress (canary.spec.ingressRef.name)
  • interval (canary.spec.canaryAnalysis.metrics[].interval)

The query is rendered with Go's text/template package.
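
In text/template a bare `{{ namespace }}` is parsed as a function call, not a field lookup. One way to make the syntax above work is to register each variable as a zero-argument template function. The sketch below assumes that approach; `TemplateVars` and `RenderQuery` are hypothetical names and not part of this proposal.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// TemplateVars holds the canary-derived values exposed to a query template.
// The type and its field names are illustrative only.
type TemplateVars struct {
	Name      string
	Namespace string
	Target    string
	Service   string
	Ingress   string
	Interval  string
}

// RenderQuery renders a metric query with text/template. Each variable is
// registered as a zero-argument function so that {{ namespace }} resolves
// to a value. Referencing an unregistered name fails at parse time.
func RenderQuery(query string, v TemplateVars) (string, error) {
	funcs := template.FuncMap{
		"name":      func() string { return v.Name },
		"namespace": func() string { return v.Namespace },
		"target":    func() string { return v.Target },
		"service":   func() string { return v.Service },
		"ingress":   func() string { return v.Ingress },
		"interval":  func() string { return v.Interval },
	}
	// Funcs must be called before Parse so the parser knows the names.
	tmpl, err := template.New("query").Funcs(funcs).Parse(query)
	if err != nil {
		return "", fmt.Errorf("parsing query template: %w", err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, nil); err != nil {
		return "", fmt.Errorf("rendering query template: %w", err)
	}
	return buf.String(), nil
}

func main() {
	q := `rate(response_latency_ms_bucket{namespace="{{ namespace }}",deployment=~"{{ target }}"}[{{ interval }}])`
	out, err := RenderQuery(q, TemplateVars{Namespace: "dev", Target: "podinfo", Interval: "1m"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because text/template does no escaping, the rendered values are inserted verbatim into the query.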

InfluxDB example:

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: max-db-con
  namespace: flagger
spec:
  provider:
    type: influxdb
    address: http://influx.db:8086
    # secret must contain two keys: org and token
    secretRef:
      name: influx-auth
  # influxdb v2 query (flux lang)
  query: |
    from(bucket: "connections")
      |> range(start: -{{ interval }})
      |> filter(fn: (r) => r._client =~ /{{ target }}-{{ namespace }}-.*/)
      |> count()

Canary analysis specifications

A canary analysis metric can reference a template with templateRef:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: dev
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  canaryAnalysis:
    metrics:
      - name: "max database connections"
        templateRef:
          name: max-db-con
          # namespace is optional
          # when not specified, the canary namespace will be used
          namespace: flagger
        threshold: 100
        interval: 1m
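
The namespace fallback described in the comments above could be resolved along these lines. This is a minimal sketch; `TemplateRef` and `ResolveNamespace` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// TemplateRef mirrors the templateRef fields shown in the Canary example.
// The type name is an assumption, not part of the proposal.
type TemplateRef struct {
	Name      string
	Namespace string
}

// ResolveNamespace returns the namespace in which to look up the template:
// the explicit one if set, otherwise the namespace of the canary itself.
func ResolveNamespace(ref TemplateRef, canaryNamespace string) string {
	if ref.Namespace != "" {
		return ref.Namespace
	}
	return canaryNamespace
}

func main() {
	shared := TemplateRef{Name: "max-db-con", Namespace: "flagger"}
	local := TemplateRef{Name: "latency"}
	fmt.Println(ResolveNamespace(shared, "dev"))
	fmt.Println(ResolveNamespace(local, "dev"))
}
```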

When a metric has a template reference, Flagger renders the query and executes it with the provider's client.
The query runner takes the first result and converts it to float64.
If the result is below the threshold, the metric check passes; otherwise the check fails and the rollout's advancement is paused or stopped.
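
The check itself could look roughly like the sketch below. It assumes "below the threshold" means strictly less than, so a value equal to the threshold fails. `QueryRunner`, `staticRunner` and `CheckMetric` are hypothetical names; `staticRunner` is only a stand-in for a real provider client.

```go
package main

import (
	"errors"
	"fmt"
)

// QueryRunner is the subset of the proposed provider interface needed here.
type QueryRunner interface {
	RunQuery(query string) (float64, error)
}

// staticRunner is a stand-in provider that returns a fixed value or error.
type staticRunner struct {
	value float64
	err   error
}

func (s staticRunner) RunQuery(string) (float64, error) { return s.value, s.err }

var errUnreachable = errors.New("provider unreachable")

// CheckMetric runs an already rendered query and reports whether the
// result is strictly below the threshold. A query error is returned to
// the caller and the check is reported as not passed.
func CheckMetric(r QueryRunner, query string, threshold float64) (bool, error) {
	val, err := r.RunQuery(query)
	if err != nil {
		return false, fmt.Errorf("running query: %w", err)
	}
	return val < threshold, nil
}

func main() {
	ok, err := CheckMetric(staticRunner{value: 42}, "rendered query", 100)
	fmt.Println(ok, err)

	_, err = CheckMetric(staticRunner{err: errUnreachable}, "rendered query", 100)
	fmt.Println(err)
}
```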

Metric provider specifications

To add a provider one should implement the provider interface:

type Interface interface {
	// RunQuery executes the query and converts the first result to float64
	RunQuery(query string) (float64, error)

	// IsOnline calls the provider endpoint and returns an error if the API is unreachable
	IsOnline() (bool, error)
}

An implementation will be available for Prometheus and it will replace the current metrics client.

Prometheus provider example:

// PrometheusProvider executes promQL queries against the Prometheus API
// using basic authentication if username and password are supplied
type PrometheusProvider struct {
	timeout  time.Duration
	url      url.URL
	username string
	password string
}

// NewPrometheusProvider takes a provider spec and the credentials map,
// validates the HTTP(S) address, extracts the username and password values if provided and
// returns a Prometheus client ready to execute queries against the HTTP API
func NewPrometheusProvider(provider flaggerv1.MetricTemplateProvider, credentials map[string][]byte) (*PrometheusProvider, error) { }

// RunQuery executes the promQL query and returns the first result as float64
func (p *PrometheusProvider) RunQuery(query string) (float64, error) { }

// IsOnline calls the Prometheus status endpoint and returns an error if the API is unreachable
func (p *PrometheusProvider) IsOnline() (bool, error) { }
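
As an illustration of the "first result as float64" step, the sketch below decodes the JSON body returned by the Prometheus HTTP API for an instant query. It assumes a vector result type, where each sample value is a `[timestamp, "value"]` pair. `FirstValue` is a hypothetical helper, and rejecting NaN is an assumption made here, not something the proposal specifies.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"math"
	"strconv"
)

// promResponse covers only the fields needed to read the first sample
// of a vector result from the Prometheus query API.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Value []interface{} `json:"value"`
		} `json:"result"`
	} `json:"data"`
}

// FirstValue extracts the first sample from a query response body and
// converts it to float64. Empty results and NaN are returned as errors.
func FirstValue(body []byte) (float64, error) {
	var resp promResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, fmt.Errorf("decoding response: %w", err)
	}
	if resp.Status != "success" {
		return 0, fmt.Errorf("query failed with status %q", resp.Status)
	}
	if len(resp.Data.Result) == 0 {
		return 0, errors.New("no values found")
	}
	sample := resp.Data.Result[0].Value
	if len(sample) != 2 {
		return 0, errors.New("unexpected sample format")
	}
	// The sample value is encoded as a JSON string, e.g. "0.25".
	raw, ok := sample[1].(string)
	if !ok {
		return 0, errors.New("sample value is not a string")
	}
	val, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, fmt.Errorf("parsing sample value: %w", err)
	}
	if math.IsNaN(val) {
		return 0, errors.New("sample value is NaN")
	}
	return val, nil
}

func main() {
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1435781451.781,"0.25"]}]}}`)
	val, err := FirstValue(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(val)
}
```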

Ref: #241 #283 #284
