Allow bootstrap configuration to be configured and reentrant #16571
openshift-merge-robot merged 11 commits into openshift:master
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: smarterclayton. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS files.
@deads2k bootstrapping
Force-pushed b76d21a to 6704770.
    // BootstrapConfigName is the name of a config map to read node-config.yaml from.
    BootstrapConfigName string
    // BootstrapConfigNamespace is the namespace the config map for bootstrap config is expected to load from.
    BootstrapConfigNamespace string
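A minimal sketch of how these two node-side options might be wired with defaults. The default of openshift-node/node-config comes from the PR description further down; the struct and `withDefaults` helper here are illustrative, not the actual code in the PR:

```go
package main

import "fmt"

// BootstrapConfigOptions mirrors the two fields shown in the diff above
// (a sketch; the real fields live on the node arguments struct).
type BootstrapConfigOptions struct {
	// BootstrapConfigName is the name of the config map to read node-config.yaml from.
	BootstrapConfigName string
	// BootstrapConfigNamespace is the namespace the bootstrap config map is loaded from.
	BootstrapConfigNamespace string
}

// withDefaults fills in the defaults described in the PR
// (openshift-node/node-config) when the fields are unset.
func (o BootstrapConfigOptions) withDefaults() BootstrapConfigOptions {
	if o.BootstrapConfigNamespace == "" {
		o.BootstrapConfigNamespace = "openshift-node"
	}
	if o.BootstrapConfigName == "" {
		o.BootstrapConfigName = "node-config"
	}
	return o
}

func main() {
	o := BootstrapConfigOptions{}.withDefaults()
	fmt.Printf("%s/%s\n", o.BootstrapConfigNamespace, o.BootstrapConfigName)
}
```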
Why another configurable namespace? Sounds like another thing we will end up removing in the future.
This is configurable from the node, not from the master. Nodes are allowed to point to different namespaces.
> This is configurable from the node, not from the master. Nodes are allowed to point to different namespaces.
I think the "why" question still stands? Why not be prescriptive, at least until someone has a compelling subdivision use-case.
Because we don't as a general rule lock clients into namespaces. I don't see a trend away from clients owning their destiny, and I don't believe in being overly restrictive here.
summarizing in-person discussion: not configurable in the server, configurable in the client if someone really wants to wire it.
    NodeProxierRoleBindingName = NodeProxierRoleName + "s"
    NodeAdminRoleBindingName = NodeAdminRoleName + "s"
    NodeReaderRoleBindingName = NodeReaderRoleName + "s"
    NodeConfigReaderRoleBindingName = NodeConfigReaderRoleName + "s"
Why is it still there for the others?
> Why is it still there for the others?
Legacy. We stopped doing it upstream.
Where's the comment in the code saying that?
We should probably separate these into a section with a header and footer to that effect.
    func GetBootstrapNodeConfigProvisioningRoleBindings(namespace string) []rbac.RoleBinding {
    	return []rbac.RoleBinding{
    		newOriginRoleBindingForClusterRole(NodeConfigReaderRoleBindingName, NodeConfigReaderRoleName, namespace).
Nothing new should use newOriginRoleBindingForClusterRole
| "view", | ||
| ) | ||
|
|
||
| func GetBootstrapNodeConfigProvisioningRoleBindings(namespace string) []rbac.RoleBinding { |
No namespace configuration please.
Force-pushed 6704770 to 9a6fa55.
    	return ret
    }

    func GetBootstrapNodeConfigProvisioningRoleBindings(namespace string) []rbac.RoleBinding {
This namespace should be pinned. @enj did you pull merge yet?
    		// TODO: expose other things like /healthz on the node once we figure out non-resource URL policy across systems
    		},
    	},
    	{
This looks like it should be namespace-scoped. #16517 made that easy.
Let's make this a role since we removed the namespace configuration.
pkg/cmd/server/origin/master.go (Outdated)
    }

    // add post-start hooks
    aggregatedAPIServer.GenericAPIServer.AddPostStartHookOrDie("node.openshift.io-sharednamespace", c.ensureOpenShiftNodeNamespace)
Because we auto init namespaces in the role bindings?
Switched to the shared mode.
Why is the shared resource namespace still its own hook?
> Why is the shared resource namespace still its own hook?
cruft
    // try to refresh the latest node-config.yaml
    o.ConfigFile = filepath.Join(nodeConfigDir, "node-config.yaml")
    -config, err := c.Core().ConfigMaps("kube-system").Get("node-config", metav1.GetOptions{})
    +config, err := c.Core().ConfigMaps(o.NodeArgs.BootstrapConfigNamespace).Get(o.NodeArgs.BootstrapConfigName, metav1.GetOptions{})
doesn't seem like the namespace should be configurable. Name yes, but I don't see why we'd let people change the namespace.
    // try to refresh the latest node-config.yaml
    o.ConfigFile = filepath.Join(nodeConfigDir, "node-config.yaml")
    -config, err := c.Core().ConfigMaps("kube-system").Get("node-config", metav1.GetOptions{})
    +config, err := c.Core().ConfigMaps(o.NodeArgs.BootstrapConfigNamespace).Get(o.NodeArgs.BootstrapConfigName, metav1.GetOptions{})
you're re-using this error down below. Please give it a special name.
> you're re-using this error down below. Please give it a special name.
error name still outstanding
    	return err
    // if there is no node-config.yaml and no server config map, generate one
    glog.V(2).Infof("Generating a local configuration since no server config available")
"since no server or local config available"
    }
    if _, ok := cmdLineArgs["experimental-cluster-signing-duration"]; !ok {
    -	cmdLineArgs["experimental-cluster-signing-duration"] = []string{"0s"}
    +	cmdLineArgs["experimental-cluster-signing-duration"] = []string{"720h"}
I'd prefer much shorter, to flush out refreshing errors. O(12 hours).
It doesn't work in 3.7 for client certs so we're going to do it on the node level or backport the required changes. Also, the goal is to get to bootstrapping first, then tighten rotation, so I don't want to block on that.
> It doesn't work in 3.7 for client certs so we're going to do it on the node level or backport the required changes. Also, the goal is to get to bootstrapping first, then tighten rotation, so I don't want to block on that.
This looks like a time bomb. Are you saying you're going to be manually refreshing certificates or something?
Uh so what happens after a month?
    	return err
    }
    -return nil
    +switch {
If you are supposed to bootstrap and you can't, I think you should fail, not use a local config instead.
Fair. Although should probably be based on whether you asked for a config or not.
I'm torn on this. An empty config map should result in a default. A missing or typoed config map seems like an error. But at the same time, a 403 or other error (especially for the default config) seems borderline arbitrary. A node having a config you didn't expect is bad; a node being down because of a correlated auth mixup also seems bad.
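The policy debated in this thread can be sketched as a small decision function. This is one possible reading of the discussion (explicit config missing is fatal; missing or forbidden default falls back to a generated local config), with sentinel errors standing in for the real apierrors.IsNotFound/IsForbidden checks:

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors standing in for the apierrors.IsNotFound and
// apierrors.IsForbidden checks the real client code would use.
var (
	errNotFound  = errors.New("not found")
	errForbidden = errors.New("forbidden")
)

// bootstrapAction sketches the debated policy: an explicitly requested
// bootstrap config that cannot be found is fatal, while the implicit
// default tolerates missing or forbidden config maps by generating a
// local configuration instead.
func bootstrapAction(explicit bool, lookupErr error) string {
	switch {
	case lookupErr == nil:
		return "use server config"
	case explicit && errors.Is(lookupErr, errNotFound):
		return "fail: requested config missing"
	case errors.Is(lookupErr, errNotFound), errors.Is(lookupErr, errForbidden):
		return "generate local config"
	default:
		return "fail: " + lookupErr.Error()
	}
}

func main() {
	fmt.Println(bootstrapAction(true, errNotFound))   // prints fail: requested config missing
	fmt.Println(bootstrapAction(false, errForbidden)) // prints generate local config
}
```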
This PR does a lot more than "correcting small issues"; for example, it permits configuring things that some people believe should be hard-coded (e.g. namespaces). Can you please give a better explanation of the aim of this PR, @smarterclayton?
@smarterclayton Could you please explain why you are proposing this? @enj is also not keen on a configurable config namespace.
Nodes configure themselves against masters. Nodes need to be able to be bulk-configured to a class of config (name). We don't hardcode namespaces for config lookup as a general principle. Admins are free to point configuration elsewhere. There is nothing special about the config source location, especially for nodes that may be owned by separate security partitions.

Bootstrapping is the replacement for static config and will be the default mode for online 3.7+ and future openshift deployments. The option to statically deliver config is preserved for bare metal environments.
Force-pushed 9a6fa55 to 8b7da17.
Seeing multiple rebinds using the new way for namespace mappings: #16611
Note that I in general agree with the "don't make things configurable that you don't want to be configurable". On the server side, the namespaces we create in bootstrap and our core setup definitely fall under that.

But compiling in our binaries that act as clients of the system to say "I will not accept results that aren't from a precanned namespace" crosses the line a bit into being too forceful. If someone wants to bind their node-config with their kubelet config and set it in kube-system because that's the only place the new kubelet will look, I'm ok with that being an option for a client.
…On Thu, Sep 28, 2017 at 8:39 PM, OpenShift CI Robot wrote:
> @smarterclayton <https://github.com/smarterclayton>: The following tests *failed*, say /retest to rerun them all:
>
> ci/openshift-jenkins/extended_conformance_gce 9a6fa55: link <https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/16571/test_pull_request_origin_extended_conformance_gce/8768/>; rerun with /test extended_conformance_gce
> ci/openshift-jenkins/integration 8b7da17: link <https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/16571/test_pull_request_origin_integration/8603/>; rerun with /test integration
Force-pushed 8b7da17 to 51410f3.
Force-pushed d5b2d8e to d6a7486.
    - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true
    masterClientConnectionOverrides:
      acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
      burst: 40
qps looks ok, but burst looks small
Actually our default.
    hostPID: true
    containers:
    - name: network
      image: openshift/node:v3.7.0-alpha.1
not a parameterized version/tag?
Just an example, wanted to have the larger discussion about how static config flows from openshift/origin -> ansible first. Wanted to have something checked in that should be reproducible for a bit, then will be made formal and moved out of here.
    // Start up a metrics server if requested
    if len(c.ProxyConfig.MetricsBindAddress) > 0 {
    	mux := http.NewServeMux()
    	mux.HandleFunc("/proxyMode", func(w http.ResponseWriter, r *http.Request) {
name it alpha or something? this is like config-lite.
This is consistent with upstream, so we can't rename it.
    // VolumeDir is the volume storage directory.
    VolumeDir string
    // VolumeDirProvided is set to true if the user has specified this flag.
    VolumeDirProvided bool
Hold your nose a month or so longer and most of this code dies and goes away and we're all happy.
I still think you should remove lgtm otherwise.
Force-pushed d6a7486 to d349f9b.
/test all

Updated to remove the
The cert may have expired.
When we mount /var/run/openshift-sdn into the container, we need to be able to clear its contents but the directory itself cannot be removed as it is a mount point. Also clarify one error.
This makes running in a separate process for networks able to have a health check and for metrics to be reported.
CFSSL throws an opaque error, and bootstrapping requires user intervention to configure anyway.
This allows the kubelet to be configured to load default configuration out of a known namespace. By default it is openshift-node/node-config.

Correct an error in bootstrapping where errors weren't logged, and properly ignore forbidden errors when trying to load the config map. Add a better description of bootstrapping to openshift start node. Ensure the volume directory is correctly loaded from node-config during bootstrapping instead of being overwritten into the config directory.

Enable client and server rotation on the node automatically when bootstrapping, and only do a client certificate creation (server bootstrapping is done by the kubelet only). This unfortunately requires setting a fake value in the node config that will be cleared later; as we are moving towards a future where node-config does not exist, this entire section will likely go away.

Relax validation on node-config to allow cert-dir to be provided instead of explicit certificates.

bootstrap
Other config variants will be stored in this location. The new namespace ensures clean security isolation.
Need to be able to take node-config from bootstrap node. For openshift start network the --kubeconfig flag from the CLI overrides the value of masterKubeConfig in the provided node config. If the value is empty (like it is by default) the in-cluster-config is used. Reorganize the node startup slightly so there is even less overlap between kubelet and network. A future change will completely separate these two initialization paths.
Force-pushed d349f9b to ae05ccd.
@smarterclayton: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest
Applying label
Automatic merge from submit-queue.
Make bootstrapping a real production node possible.
- `--bootstrap-config-name` can be used to customize which config is looked up (one per node group)
- make `openshift start network` work podified

There is still one bug outstanding upstream that can be fixed separately: the kubelet client rotation can fail due to the cert expiring and be unable to get new certs, so it never exits.
Tested the following scenario extensively (requires a new openshift/node image tagged as v3.7.0-alpha.1):
1. `oc create configmap -n openshift-node node-config --from-file=node-config.yaml=contrib/kubernetes/default-node-config.yaml`
2. `openshift start node --bootstrap-config-name=node-config --config=/etc/origin/node/node-config.yaml --enable=kubelet --loglevel=3` (which has it run only the kubelet)
3. `oc observe csr -- oc adm certificate approve` to approve both CSRs
4. `oc create -f contrib/kubernetes/static/network-policy.yaml`
5. `oc create -f contrib/kubernetes/static/network-daemonset.yaml`
6. `oc run --restart=Never --attach -it --image=centos:7 -- /bin/bash` and then `yum install bind-utils -y && dig +search kubernetes.default.svc`

Follow-up for the daemonset: openshift-sdn expects to have access to the dockershim.sock, which this doesn't bind mount in.