Reduce HAProxy reloads - adds support to use the haproxy dynamic config api#19073
Reduce HAProxy reloads - adds support to use the haproxy dynamic config api#19073openshift-merge-robot merged 9 commits intoopenshift:masterfrom
Conversation
|
@ramr: GitHub didn't allow me to request PR reviews from the following users: fyi. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
glide.yaml
Outdated
| - package: k8s.io/apiserver | ||
| repo: git@github.com:openshift/kubernetes-apiserver | ||
| version: release-1.9.1 | ||
| repo: git@github.com:openshift/kubernetes-apiserver |
There was a problem hiding this comment.
Please undo this reformatting.
There was a problem hiding this comment.
That's all done by glide. I couldn't run glide on my local repo - it doesn't run for me, times out all the time, so @rajatchopra ran it for me and he didn't do anything other than glide get && ./hack/update-deps.sh. Is something that's glide version specific causing this?
There was a problem hiding this comment.
I’ve never had glide time out - are you on an old version? Bad internet?
There was a problem hiding this comment.
glide version is v0.13.1 and its a fast[ish] pipe - 200mb+ consistently (and also tried it on a T3 connection) - so i suspect its not the speed. Plus did try it on 3 different vms as well (all fedora 27 though), so something else. Anyway, will look at it at some other time.
| if err := wrapped.SetUpAt(dir, fsGroup); err != nil { | ||
| return err | ||
| } | ||
| if err := volumeutil.MakeNestedMountpoints(b.volName, dir, b.pod); err != nil { |
There was a problem hiding this comment.
Please fix this, you reverted changes.
There was a problem hiding this comment.
Hmm, maybe because we did the glide get and update-deps.sh a few weeks back - this might be outdated. Will do.
test/end-to-end/router_test.go
Outdated
| fmt.Sprintf("NAMESPACE_LABELS=%s", namespaceLabels), | ||
| fmt.Sprintf("ROUTER_CONFIG_MANAGER=haproxy-manager"), | ||
| fmt.Sprintf("ROUTER_DYNAMIC_SERVER_PREFIX=_test-dynamic"), | ||
| fmt.Sprintf("ROUTER_MAX_DYNAMIC_SERVERS=3"), |
There was a problem hiding this comment.
These should have reasonable defaults. I'd prefer if your testing required only ROUTER_CONFIG_MANAGER unless there's a good reason not to.
There was a problem hiding this comment.
ok - i'll set it to only the defaults.
pkg/router/template/types.go
Outdated
| Initialize(router RouterInterface, certPath string) | ||
|
|
||
| // Register registers an id to be associated with a route. | ||
| Register(id string, route *routeapi.Route, wildcard bool) |
There was a problem hiding this comment.
Why are you passing wildcard here when it should be on the route?
There was a problem hiding this comment.
We already had boolean deciphered based on the policy so I just passed the boolean but yeah we could recompute it in register/add.
|
|
||
| // ConfigManager is used by the router to make configuration changes using | ||
| // the template router's dynamic configuration API (if any). | ||
| type ConfigManager interface { |
There was a problem hiding this comment.
Please specify the threading guarantees this code requires with the router controller here.
| @@ -0,0 +1,209 @@ | |||
| package haproxy | |||
There was a problem hiding this comment.
Give this its own unit tests.
There was a problem hiding this comment.
there's a map_test.go in there. Or did you mean something else?
| } | ||
|
|
||
| // NewCSVConverter returns a new CSVConverter. | ||
| func NewCSVConverter(headers string, out interface{}, fn ByteConverterFunc) CSVConverter { |
There was a problem hiding this comment.
Should be a pointer unless you have a good reason not to.
There was a problem hiding this comment.
Not really - ok will change.
|
|
||
| // Convert runs a haproxy dynamic config API command. | ||
| func (c CSVConverter) Convert(data []byte) ([]byte, error) { | ||
| glog.V(4).Infof("CSV converter input data bytes: %s", string(data)) |
There was a problem hiding this comment.
How much data is this? If it's going to regularly be more than a few hundred bytes it should be at v5.
There was a problem hiding this comment.
yeah it might be more than a few hundred - will set to v5.
| if c.converterFunc != nil { | ||
| convertedBytes, err := c.converterFunc(data) | ||
| if err != nil { | ||
| glog.Errorf("CSV converter error: %v", err) |
There was a problem hiding this comment.
Don't log errors like this, wrap or log when you eat it with utilruntime.HandleError
There was a problem hiding this comment.
I think this was a remnant from the logs I was generating during dev - will remove this as its not needed - error are anyways propagated up and handled by the caller.
| // fixupHeaders fixes up haproxy API responses that don't contain any CSV | ||
| // header information. This allows us to easily parse the data and marshal | ||
| // into an array of native golang structs. | ||
| func (c CSVConverter) fixupHeaders(data []byte) ([]byte, error) { |
There was a problem hiding this comment.
don't make this a method, just a function that takes headers and data.
Give this file unit tests.
| } | ||
|
|
||
| // isRetryable checks if a haproxy command can be retried. | ||
| func isRetryable(err error, cmd string) bool { |
| // Backends returns the list of configured haproxy backends. | ||
| func (c *Client) Backends() ([]*Backend, error) { | ||
| if len(c.backends) == 0 { | ||
| if backends, err := GetHAProxyBackends(c); err != nil { |
There was a problem hiding this comment.
If these methods don't need to be public (GetHAProxyBackends) they shouldn't be public.
| // if the error for the command is a retryable error. | ||
| func (c *Client) runCommandWithRetries(cmd string, limit int) (*bytes.Buffer, error) { | ||
| retryAttempt := 0 | ||
| for { |
There was a problem hiding this comment.
use util.ExponentialBackoff or util.PollImmediate instead of inventing your own code for it.
pkg/oc/admin/router/router.go
Outdated
| env["ROUTER_METRICS_TLS_CERT_FILE"] = "/etc/pki/tls/metrics/tls.crt" | ||
| env["ROUTER_METRICS_TLS_KEY_FILE"] = "/etc/pki/tls/metrics/tls.key" | ||
| if cfg.Type == "haproxy-router" { | ||
| env["ROUTER_CONFIG_MANAGER"] = "haproxy-manager" |
There was a problem hiding this comment.
If everyone is going to get these just make these the new default on the command. Don't force people to change their config and make a pointless choice. Otherwise you'll have to add this to the upgrade scripts.
There was a problem hiding this comment.
hmm, so because the template router is also used for nginx, this needs to be either set here since its haproxy specific or we need to pass down the type of router to the infra command (that's not done today). I can either pass down this new setting or then the type and set the defaults based on that in the infra command ... but in either case it still means upgrading existing haproxy-router deployments. So which do you prefer?
Edited
There was a problem hiding this comment.
I think if the config manager works, there’s no reason we wouldn’t have it on by default for haproxy. However, we will likely default it off at first until we’re comfortable that we’ve caught all the edge cases and then default it on. So ideally we just have a Boolean that has an internal default and if the user specifies it we obey their rule. That avoids end users ending up with the flag in their config.
| // blueprintRoutes returns all the routes in the blueprint namespace. | ||
| func (o *TemplateRouterOptions) blueprintRoutes(routeclient *routeinternalclientset.Clientset) ([]*routeapi.Route, error) { | ||
| blueprints := make([]*routeapi.Route, 0) | ||
| if len(o.BlueprintRouteNamespace) == 0 { |
There was a problem hiding this comment.
I don't like an entire namespace of route blueprints. Require them to be identified with a label or associated metadata within a specific namespace, or require them to be specified by name directly.
Whole namespaces devoted to this is pretty ugly, although the idea of pulling it from the route is good.
|
|
||
| var cfgManager templateplugin.ConfigManager | ||
| if o.ConfigManagerName == "haproxy-manager" { | ||
| blueprintRoutes, err := o.blueprintRoutes(routeclient) |
There was a problem hiding this comment.
You're going to have to refresh these, they shouldn't be static or require restart. Make that dynamic (add a controller plugin wrapper that triggers reload of the blueprints).
There was a problem hiding this comment.
There's going to be some issues with blueprint refresh that I need to think about - basically vis-a-vis replacement and deletion and how the allocations/config changes, so can we do this in a follow-up PR?
pkg/cmd/infra/router/template.go
Outdated
| flag.DurationVar(&o.CommitInterval, "commit-interval", getIntervalFromEnv("COMMIT_INTERVAL", defaultCommitInterval), "Controls how often to commit (to the actual config) all the changes made using the router specific dynamic configuration manager.") | ||
| flag.StringVar(&o.BlueprintRouteNamespace, "blueprint-route-namespace", util.Env("ROUTER_BLUEPRINT_ROUTE_NAMESPACE", ""), "Specifies the namespace which contains the routes that serve as blueprints for the dynamic configuration manager.") | ||
| flag.IntVar(&o.BlueprintRoutePoolSize, "blueprint-route-pool-size", int(util.EnvInt("ROUTER_BLUEPRINT_ROUTE_POOL_SIZE", 10, 1)), "Specifies the size of the pre-allocated pool for each route blueprint managed by the router specific dynamic configuration manager. This can be overriden by an annotation router.openshift.io/pool-size on an individual route.") | ||
| flag.StringVar(&o.DynamicServerPrefix, "dynamic-server-prefix", util.Env("ROUTER_DYNAMIC_SERVER_PREFIX", ""), "Specifies the prefix for dynamic servers added to router backends. These dynamic servers are handled by the router specific dynamic configuration manager.") |
There was a problem hiding this comment.
I don't see why this is configurable. Seems pointless?
pkg/cmd/infra/router/template.go
Outdated
| flag.StringVar(&o.Ciphers, "ciphers", util.Env("ROUTER_CIPHERS", ""), "Specifies the cipher suites to use. You can choose a predefined cipher set ('modern', 'intermediate', or 'old') or specify exact cipher suites by passing a : separated list.") | ||
| flag.BoolVar(&o.StrictSNI, "strict-sni", isTrue(util.Env("ROUTER_STRICT_SNI", "")), "Use strict-sni bind processing (do not use default cert).") | ||
| flag.StringVar(&o.MetricsType, "metrics-type", util.Env("ROUTER_METRICS_TYPE", ""), "Specifies the type of metrics to gather. Supports 'haproxy'.") | ||
| flag.StringVar(&o.ConfigManagerName, "config-manager", util.Env("ROUTER_CONFIG_MANAGER", ""), "Specifies the manager to use for dynamically configuring changes with the underlying router. Supports 'haproxy-manager'.") |
There was a problem hiding this comment.
I don't see a reason not to default this on.
There was a problem hiding this comment.
So we don't pass the router type to the infra command, so this needs to be set from above as of now. We can't set this for nginx template routers. We can either pass the type of router down (ala haproxy etc) and then do the required defaults here or then (as was done) pass this env variable down.
There was a problem hiding this comment.
Do the defaulting later when you know the type. I doubt we’ll ever have multiple, so this is really just a bool.
pkg/cmd/infra/router/template.go
Outdated
| flag.BoolVar(&o.StrictSNI, "strict-sni", isTrue(util.Env("ROUTER_STRICT_SNI", "")), "Use strict-sni bind processing (do not use default cert).") | ||
| flag.StringVar(&o.MetricsType, "metrics-type", util.Env("ROUTER_METRICS_TYPE", ""), "Specifies the type of metrics to gather. Supports 'haproxy'.") | ||
| flag.StringVar(&o.ConfigManagerName, "config-manager", util.Env("ROUTER_CONFIG_MANAGER", ""), "Specifies the manager to use for dynamically configuring changes with the underlying router. Supports 'haproxy-manager'.") | ||
| flag.StringVar(&o.ConfigManagerConnectionInfo, "config-manager-connection-info", "", "Specifies connection information for the dynamic configuration manager.") |
There was a problem hiding this comment.
Seems overkill to configure this. We don't allow configuring the stats socket.
There was a problem hiding this comment.
Well the stats socket is sorta tied into how the template is configured (using af unix sockets).
It probably is overkill but if the haproxy template is potentially changed by the user, this is just a knob to use tcp if so needed.
There was a problem hiding this comment.
I'll remove this - we can add it later if needed.
| flag.StringVar(&o.ConfigManagerConnectionInfo, "config-manager-connection-info", "", "Specifies connection information for the dynamic configuration manager.") | ||
| flag.DurationVar(&o.CommitInterval, "commit-interval", getIntervalFromEnv("COMMIT_INTERVAL", defaultCommitInterval), "Controls how often to commit (to the actual config) all the changes made using the router specific dynamic configuration manager.") | ||
| flag.StringVar(&o.BlueprintRouteNamespace, "blueprint-route-namespace", util.Env("ROUTER_BLUEPRINT_ROUTE_NAMESPACE", ""), "Specifies the namespace which contains the routes that serve as blueprints for the dynamic configuration manager.") | ||
| flag.IntVar(&o.BlueprintRoutePoolSize, "blueprint-route-pool-size", int(util.EnvInt("ROUTER_BLUEPRINT_ROUTE_POOL_SIZE", 10, 1)), "Specifies the size of the pre-allocated pool for each route blueprint managed by the router specific dynamic configuration manager. This can be overriden by an annotation router.openshift.io/pool-size on an individual route.") |
There was a problem hiding this comment.
on an individual blueprint route
You need to explain blueprint routes in the command help for this.
| flag.StringVar(&o.MetricsType, "metrics-type", util.Env("ROUTER_METRICS_TYPE", ""), "Specifies the type of metrics to gather. Supports 'haproxy'.") | ||
| flag.StringVar(&o.ConfigManagerName, "config-manager", util.Env("ROUTER_CONFIG_MANAGER", ""), "Specifies the manager to use for dynamically configuring changes with the underlying router. Supports 'haproxy-manager'.") | ||
| flag.StringVar(&o.ConfigManagerConnectionInfo, "config-manager-connection-info", "", "Specifies connection information for the dynamic configuration manager.") | ||
| flag.DurationVar(&o.CommitInterval, "commit-interval", getIntervalFromEnv("COMMIT_INTERVAL", defaultCommitInterval), "Controls how often to commit (to the actual config) all the changes made using the router specific dynamic configuration manager.") |
There was a problem hiding this comment.
Why is this configurable?
There was a problem hiding this comment.
I would tie this to the router reload interval unless you have some sort of real reason to make this configurable.
There was a problem hiding this comment.
The reason is a separate config option is because we don't want to change the reload interval to something higher. As that means if the dynamic config manager can't bring routes online (because of certs/custom annotations etc), the reloads to bring those route would be staggered by the reload interval - not good if say that's 1 hour ... which the default for the commit interval.
| flag.StringVar(&o.TemplateFile, "template", util.Env("TEMPLATE_FILE", ""), "The path to the template file to use") | ||
| flag.StringVar(&o.ReloadScript, "reload", util.Env("RELOAD_SCRIPT", ""), "The path to the reload script to use") | ||
| flag.DurationVar(&o.ReloadInterval, "interval", reloadInterval(), "Controls how often router reloads are invoked. Mutiple router reload requests are coalesced for the duration of this interval since the last reload time.") | ||
| flag.DurationVar(&o.ReloadInterval, "interval", getIntervalFromEnv("RELOAD_INTERVAL", defaultReloadInterval), "Controls how often router reloads are invoked. Mutiple router reload requests are coalesced for the duration of this interval since the last reload time.") |
There was a problem hiding this comment.
why did this change?
There was a problem hiding this comment.
I just took the reloadInterval code and made it into a reusable function so that we can call it to get the interval value for both reload and commit intervals (same code passes a different parameters).
|
|
||
| "genSubdomainWildcardRegexp": genSubdomainWildcardRegexp, //generates a regular expression matching the subdomain for hosts (and paths) with a wildcard policy | ||
| "generateRouteRegexp": generateRouteRegexp, //generates a regular expression matching the route hosts (and paths) | ||
| "generateRouteRegexp": GenerateRouteRegexp, //generates a regular expression matching the route hosts (and paths) |
There was a problem hiding this comment.
Why is this exported?
There was a problem hiding this comment.
If you are trying to reuse this, put it in its own package (templatehelpers maybe) and include it here. That gets more crap out of this package.
pkg/router/template/router.go
Outdated
| glog.V(4).Infof("Dynamically replacing endpoints for associated backend %s", backendKey) | ||
| if err := r.dynamicConfigManager.ReplaceRouteEndpoints(backendKey, oldEndpoints, newEndpoints, weight); err != nil { | ||
| // Error dynamically modifying the config, so return false to cause a reload to happen. | ||
| glog.Warningf("Router will reload as dynamic endpoint replacement for service id %s (backend=%s, weight=%v) failed: %v", id, backendKey, weight, err) |
There was a problem hiding this comment.
You could easily spew errors here, i'm a little concerned with this in the long term.
There was a problem hiding this comment.
Will change to glog.V(4).Infof, so that its available if someone turns up the debugging level and for all the other "Router will reload" messages.
|
|
||
| // If no initial sync was done, don't try to dynamically add the | ||
| // route as we will need a reload anyway. | ||
| if !r.synced { |
There was a problem hiding this comment.
Why is this after registration?
There was a problem hiding this comment.
Because we don't add the routes if the initial sync has not been done. There's nothing in the haproxy config at that point in time (backends don't exist for blueprints as yet, etc). Plus since there's going to be a reload and the config is going to change anyway, the configmanager is not really going to do anything here (other than registration).
So, the registration is done always - irrespective of whether the router sync state. And that ensures that the mappings in the configmanager are correct.
|
/retest |
|
/test integration |
|
/retest |
…eded
when configuration changes. This commit includes:
o Support for a dynamic configuration manager in the template router.
o HAProxy configuration manager implementation.
o Scale up support with a pool of dynamic servers for each backend.
o Dynamic route addition support with a pre-allocated pool of blueprint routes.
o HAProxy dynamic configuration retry support.
o Passthrough routes weights are relative, so as to allow them to be scaled.
Fixes a bug with the haproxy dynamic config api hanging.
o Support for a blueprint route namespace, so that we can load custom
blueprints at startup.
o Refactor, log errors, cleanup and bug fixes.
o Remove unused/extraneous map entries for blueprint routes.
o Self-review changes.
o Add new tests and modify existing integration+e2e tests.
… the manager appropriately. And some more changes as per review comments.
…with passthrough routes.
|
/retest |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, knobunc, ramr The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
@ramr: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
Add support to the template router to dynamically configure haproxy. This reduces the reloads
needed when the router configuration changes due to routes/endpoints changing.
Note that HAProxy doesn't have support to dynamically change certificates, so we need to pre-allocate pools of "blueprint" routes to dynamically bring new routes online.
We now add/remove servers from a backend on endpoint changes (for scale up/down, service addition/removal/update, node drain/evictions etc).
Each haproxy backend (for a route) has a set of dynamic servers added named
_dynamic-pod-[1-5]- which are used when endpoints change (example for scale up).Scaling down endpoints just sets the existing servers in a haproxy backend to "maint" or
non-serving state.
The infra router options - command line flags or environment variables named
ROUTER_DYNAMIC_SERVER_PREFIXandROUTER_MAX_DYNAMIC_SERVERS(default: 5)controls these options.
We dynamically add and remove routes if a route matches a pre-allocated/configured blueprint route.
The default config has blueprint routes for insecure/http, edge terminated and passthrough routes.
Re-encrypt routes are an issue as they could have custom certificates so is not currently in that
default blueprint set (commented out).
Route removal just removes the map entries, marks the servers in
maintstate but leaves thebackend config in haproxy intact. The backend is not referenced from anywhere, so is unused.
Route addition checks if it can bring the route online without a router reload - it searches for a
matching blueprint and then a free slot from the pre-configured pool of routes for that blueprint.
If the new route can't be dynamically configured, then the behavior is as today and a reload occurs.
A route might not match a blueprint if it has information that causes a custom haproxy config.
Examples of that could be a route that has custom certificate information or custom CA information
to validate re-encrypted routes or it could even be for custom config that is generated for the
route annotations. Those would fail to find a matching blueprint (and hence cause a reload).
However that said, the template router does also provide the ability to add custom blueprints.
These are basically just routes in a namespace which serve as templates or blueprints - similar
to the default blueprints (for insecure/http, edge terminated and passthrough routes).
So one can easily add a re-encrypt route blueprint if needed.
Also, the add route code would search the custom blueprints (in the same fashion that it does
the default blueprints - custom ones are just added to the list).
The infra router options
ROUTER_BLUEPRINT_ROUTE_NAMESPACEandROUTER_BLUEPRINT_ROUTE_POOL_SIZE(default: 10)provide some knobs for these custom blueprints.
Tests will automatically use this new functionality (extended ones needed a config change) but the
others use the router image, so automatically check with the haproxy router with the dynamic config
manager enabled.
Again, a point to note here is that if we exhaust the pool of dynamic servers or blueprint routes or
don't find a matching blueprint, the template router will need/do a reload.
Associated trello card: https://trello.com/c/W21FD0v0/585-13-implement-dynamic-changes-to-the-router-routerscaledemo
/cc @knobunc @zhaozhanqi fyi