Basic high availability for auto egress IPs by danwinship · Pull Request #19578 · openshift/origin

danwinship · 2018-05-01T15:00:52Z

OK, this changes the semantics of automatic egress IPs so that:

- You are now allowed to specify multiple IPs in the EgressIPs field of a NetNamespace, and if so, the first one in the list that is currently hosted on a node will be used.
  - However, if any of the IPs in the list are duplicates of IPs used by another namespace, then the namespace will be considered broken (and egress traffic will be dropped). Previously this rule only applied if the first IP was a duplicate, since further IPs were simply ignored.
- If a namespace specifies multiple egress IPs, and the node hosting its current egress IP stops responding, then nodes will automatically switch to using the namespace's next available egress IP. (Each node does this independently, when it notices that packets are being sent but not responded to.)
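
As an illustration of the selection rule described above, here is a minimal sketch, not the code from this PR. The function name `pickEgressIP`, its parameters, and the map-based bookkeeping are all hypothetical. Returning "not usable" when no IP is hosted is likewise an assumption.

```go
package main

import "fmt"

// pickEgressIP is a hypothetical helper, not taken from the PR.
// requested:       the namespace's EgressIPs, in order of preference
// hostedOn:        egress IP -> node currently hosting it
// claimedByOthers: egress IPs also requested by some other namespace
// failedNodes:     nodes this node currently believes are unresponsive
// The boolean result is false when the namespace is broken or has no usable IP.
func pickEgressIP(requested []string, hostedOn map[string]string,
	claimedByOthers, failedNodes map[string]bool) (string, bool) {
	// A duplicate anywhere in the list breaks the namespace,
	// not only a duplicate in the first position.
	for _, ip := range requested {
		if claimedByOthers[ip] {
			return "", false
		}
	}
	// Otherwise use the first IP that is hosted on a node believed to be alive.
	for _, ip := range requested {
		if node, hosted := hostedOn[ip]; hosted && !failedNodes[node] {
			return ip, true
		}
	}
	return "", false
}

func main() {
	hostedOn := map[string]string{"10.0.0.11": "node-b", "10.0.0.12": "node-c"}
	requested := []string{"10.0.0.10", "10.0.0.11", "10.0.0.12"}

	// 10.0.0.10 is not hosted anywhere, so the second IP is selected.
	fmt.Println(pickEgressIP(requested, hostedOn, nil, nil))

	// If node-b stops responding, the next available IP is selected.
	fmt.Println(pickEgressIP(requested, hostedOn, nil, map[string]bool{"node-b": true}))
}
```

Since each node makes this decision independently, `failedNodes` in the sketch stands for one node's local view rather than cluster-wide state.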

I tried to do this without changing the existing egressip code, but the existing code was based on the idea of processing each egress IP independently, and that doesn't work very well when changes to one egress IP can affect whether another egress IP is used. So I had to do some reorganization.

Some things to fix later (either before this gets merged, or in followup PRs, or in future patch releases):

- Make the poll times configurable.
- Be more proactive about noticing bad egress nodes, rather than only noticing them after someone has tried to use them. (IOW, ping all the nodes all the time.)
- Don't do liveness checking of nodes if there are no HA-able egress IPs on them. (IOW, don't ping all the nodes all the time.) E.g., in existing clusters where every namespace has only a single EgressIP, there's no need to do liveness checking, since there are no backup nodes to redirect namespaces to if their primary egress nodes fail.

@openshift/sig-networking PTAL

imcsk8 · 2018-05-03T00:59:37Z

LGTM

knobunc

/lgtm

@openshift/networking PTAL

knobunc · 2018-05-22T12:59:55Z

pkg/network/node/vxlan_monitor.go

+}
+
+const (
+	defaultPollInterval = 5 * time.Second


Can we doc these constants please?

Updated the egressVXLANMonitor docs to clarify how the constants get used.

knobunc · 2018-05-22T13:47:05Z

pkg/network/node/vxlan_monitor.go

+// The fact that we (normally) use pod-to-egress traffic to do the monitoring rather than
+// actively pinging the nodes means that if an egress node falls over while no one is
+// using an egress IP on it, we won't notice the problem until someone does try to use it.
+// So, eg, the first pod created in a namespace might spend 5 seconds trying to talk to a


It's (max) 7 seconds... right? Or am I missing something?

Every pollInterval (5s) we check the traffic to each node. If there's a mismatch we go into retry mode, which re-checks every 1s and calls the node failed after 2 retries. So we will call it bad in 2s in the best case, 7s in the worst case, and 4.5s on average.

Yes. Changed it from "5" to "several"
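
The timing figures in this thread can be checked with a small sketch. Only `defaultPollInterval` appears in the diff. `retryInterval`, `maxRetries`, and `detectionDelay` are hypothetical names for the 1s retry period and 2 retries mentioned above. The average assumes the failure happens at a uniformly random point within a poll interval.

```go
package main

import (
	"fmt"
	"time"
)

const (
	defaultPollInterval = 5 * time.Second // from the diff under review
	retryInterval       = 1 * time.Second // hypothetical name; value from the discussion
	maxRetries          = 2               // hypothetical name; value from the discussion
)

// detectionDelay returns how long after a failure the node is declared dead,
// given how long after the failure the next regular poll notices a mismatch.
func detectionDelay(untilNextPoll time.Duration) time.Duration {
	return untilNextPoll + maxRetries*retryInterval
}

func main() {
	best := detectionDelay(0)                    // failure just before a poll
	worst := detectionDelay(defaultPollInterval) // failure just after a poll
	average := (best + worst) / 2                // uniform failure time assumed
	fmt.Println(best, worst, average)            // prints "2s 7s 4.5s"
}
```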

dcbw · 2018-05-23T02:11:42Z

pkg/network/node/vxlan_monitor.go

+		nodeIP: nodeIP,
+		sdnIP:  sdnIP,
+	}
+	if len(evm.monitorNodes) == 1 && evm.pollInterval != 0 {


Is it worth turning off polling in RemoveNode()? Maybe not, just curious.

Hm... I was doing it from poll() (it returned true if evm.monitorNodes was empty, causing the PollInfinite() to exit). But that creates a race condition if you delete the last node, then add a new node before poll() runs. So I redid it with a stop channel.
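
To illustrate the stop-channel approach, here is a simplified sketch under stated assumptions. It is not the PR's `egressVXLANMonitor`: the `monitor` type, its fields, and the `running` helper are hypothetical. The point is that each poll loop owns its own channel, so removing the last node and immediately adding a new one cannot leave the monitor without a running loop.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// monitor is a hypothetical, simplified stand-in for the PR's egress monitor.
type monitor struct {
	mu           sync.Mutex
	pollInterval time.Duration
	nodes        map[string]bool
	stop         chan struct{} // non-nil only while a poll loop is running
}

func newMonitor(pollInterval time.Duration) *monitor {
	return &monitor{pollInterval: pollInterval, nodes: map[string]bool{}}
}

// AddNode starts a poll loop when the first node is added.
func (m *monitor) AddNode(nodeIP string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.nodes[nodeIP] {
		return
	}
	m.nodes[nodeIP] = true
	if len(m.nodes) == 1 && m.pollInterval != 0 && m.stop == nil {
		m.stop = make(chan struct{})
		go m.run(m.stop)
	}
}

// RemoveNode stops the poll loop when the last node is removed.
func (m *monitor) RemoveNode(nodeIP string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.nodes[nodeIP] {
		return
	}
	delete(m.nodes, nodeIP)
	if len(m.nodes) == 0 && m.stop != nil {
		close(m.stop)
		m.stop = nil
	}
}

// run polls until its own stop channel is closed.
func (m *monitor) run(stop <-chan struct{}) {
	ticker := time.NewTicker(m.pollInterval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			m.poll()
		}
	}
}

// poll is a placeholder; a real monitor would compare traffic counters here.
func (m *monitor) poll() {
	m.mu.Lock()
	defer m.mu.Unlock()
}

// running reports whether a poll loop is currently active.
func (m *monitor) running() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.stop != nil
}

func main() {
	m := newMonitor(time.Second)
	m.AddNode("192.168.1.10")
	m.RemoveNode("192.168.1.10") // stops the first loop
	m.AddNode("192.168.1.11")    // starts a fresh loop with its own channel
	fmt.Println("polling:", m.running())
	m.RemoveNode("192.168.1.11")
}
```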

dcbw

/lgtm

danwinship · 2018-06-27T15:24:12Z

/hold cancel

danwinship · 2018-06-27T15:24:19Z

/retest

danwinship · 2018-06-27T16:33:08Z

@dcbw @knobunc needs re-lgtm after having been rebased around #19603 and #19926

knobunc

/lgtm

openshift-ci-robot · 2018-06-27T17:41:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, dcbw, knobunc

Needs approval from an approver in each of these files:

~~pkg/network/OWNERS~~ [danwinship,dcbw,knobunc]
~~pkg/util/ovs/OWNERS~~ [danwinship,dcbw,knobunc]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

danwinship · 2018-06-29T12:47:31Z

/retest

danwinship · 2018-06-29T13:56:21Z

/retest

danwinship · 2018-06-29T16:33:20Z

/retest

danwinship · 2018-06-29T20:48:19Z

/retest

danwinship · 2018-07-01T14:00:46Z

/retest

bmeng · 2018-08-06T09:17:22Z

pkg/network/node/egressip.go

+	for _, ip := range eg.namespaces[0].requestedIPs {
+		eg2 := eip.egressIPs[ip]
+		if eg2 != eg && len(eg2.nodes) == 1 && eg2.nodes[0] == eg.nodes[0] {
+			return false, fmt.Errorf("Multiple EgressIPs (%s, %s) for VNID %d on node %s", eg.ip, eg2.ip, eg.namespaces[0].vnid, eg.nodes[0].nodeIP)


@danwinship What is the reason for making the egress IP unavailable when all of the egress IPs claimed by a namespace are hosted on the same node? IMO, this condition breaks HA, but it should be harmless if we just let the first IP in the list work.

@bmeng If you set multiple egress IPs on a NetNamespace, then you are expecting to get high availability. If the IPs are both on the same node though, then you aren't actually getting any HA. So, as in other cases, we break things to make the admin notice the misconfiguration.

Ok, that makes sense. Thanks.
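
The misconfiguration discussed in this thread can be sketched as a standalone check. This is a simplified illustration, not the code from `egressip.go`: `checkDistinctNodes` and its map-based inputs are hypothetical, and only the shape of the error message follows the diff.

```go
package main

import "fmt"

// checkDistinctNodes is a hypothetical helper, not taken from the PR.
// It returns an error if two of a namespace's requested egress IPs are
// hosted on the same node, since that setup provides no high availability.
func checkDistinctNodes(vnid uint32, requested []string, hostedOn map[string]string) error {
	firstIPOnNode := map[string]string{} // node -> first egress IP seen on it
	for _, ip := range requested {
		node, hosted := hostedOn[ip]
		if !hosted {
			continue
		}
		if other, dup := firstIPOnNode[node]; dup {
			return fmt.Errorf("multiple EgressIPs (%s, %s) for VNID %d on node %s",
				other, ip, vnid, node)
		}
		firstIPOnNode[node] = ip
	}
	return nil
}

func main() {
	requested := []string{"10.0.0.10", "10.0.0.11"}

	// Both IPs on one node: flagged so the admin notices the misconfiguration.
	sameNode := map[string]string{"10.0.0.10": "node-a", "10.0.0.11": "node-a"}
	fmt.Println(checkDistinctNodes(42, requested, sameNode))

	// IPs on different nodes: no error.
	twoNodes := map[string]string{"10.0.0.10": "node-a", "10.0.0.11": "node-b"}
	fmt.Println(checkDistinctNodes(42, requested, twoNodes))
}
```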

danwinship added the component/networking, sig/networking, do-not-merge/hold, and kind/feature labels May 1, 2018

openshift-ci-robot requested review from rajatchopra and smarterclayton May 1, 2018 15:00

openshift-ci-robot added the size/XL and approved labels May 1, 2018

danwinship force-pushed the auto-egress-ip-ha branch from 2b044d1 to 473dcfd May 2, 2018 07:00

knobunc self-assigned this May 2, 2018

danwinship mentioned this pull request May 3, 2018

Auto egress IP code reorg in prep for future features #19603

Merged

danwinship force-pushed the auto-egress-ip-ha branch from 473dcfd to b223227 May 8, 2018 14:20

knobunc requested review from dcbw and pravisankar May 15, 2018 14:44

knobunc approved these changes May 22, 2018

openshift-ci-robot added the lgtm label May 22, 2018

dcbw reviewed May 23, 2018

dcbw approved these changes May 23, 2018

openshift-ci-robot assigned dcbw May 23, 2018

danwinship force-pushed the auto-egress-ip-ha branch from b223227 to 2e101d1 May 23, 2018 14:06

openshift-ci-robot removed the lgtm label May 23, 2018

danwinship force-pushed the auto-egress-ip-ha branch from 2e101d1 to 940d775 May 23, 2018 15:32

danwinship mentioned this pull request Jun 6, 2018

Fix a bug when a namespace has two egress IPs on same node #19926

Merged

openshift-bot added the needs-rebase label Jun 7, 2018

danwinship added 3 commits June 7, 2018 14:00

Simplify egress IP change tracking

0900e92

Basic high-availability for auto egress IPs

6044355

If a namespace has multiple egress IPs, monitor egress traffic and switch to an alternate egress IP if the currently-selected one appears dead.

Log ovs dump-flows at 5, not 4

0439a83

Most dump-flows calls are part of health checks and don't normally need to be logged about unless they fail.

danwinship force-pushed the auto-egress-ip-ha branch from 9c76ad6 to 0439a83 June 7, 2018 18:04

openshift-bot removed the needs-rebase label Jun 7, 2018

openshift-ci-robot removed the do-not-merge/hold label Jun 27, 2018

openshift-ci-robot added the lgtm label Jun 27, 2018

knobunc approved these changes Jun 27, 2018

danwinship mentioned this pull request Jun 29, 2018

Migrate egress IP tracking code to pkg/network/common #20155

Merged

openshift-merge-robot merged commit d0eff76 into openshift:master Jul 1, 2018

danwinship deleted the auto-egress-ip-ha branch July 20, 2018 13:07

bmeng reviewed Aug 6, 2018

danwinship mentioned this pull request Aug 13, 2018

Basic high availability for auto egress IPs #20620

Merged

8 participants