OCPBUGS-64847: USHIFT-6489: UPSTREAM: 136508: kubelet: Fix race condition when pods are rejected#2574
pacevedom wants to merge 5 commits into openshift:master
Conversation
@pacevedom: This pull request references Jira Issue OCPBUGS-64847, which is valid. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. This pull request references USHIFT-6489, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@pacevedom: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: pacevedom. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
/test e2e-aws-ovn-runc
Force-pushed 9bd1614 to 2a126dc.
kubelet: Prevent rejected pods from entering pod_workers

When a pod fails admission, it is marked as Failed but may still receive UPDATE operations from API server resyncs. If these updates cause the pod to enter pod_workers, it gets terminatingAt set and is counted as "active" by filterOutInactivePods until termination completes. This causes incorrect resource accounting during admission of subsequent pods.

To keep rejected pods out of resource accounting, they should never enter pod_workers: they are in a terminal state and will not run any containers that require cleanup. The only way to identify a rejected pod is the special status the kubelet sets for it. This is the only distinctive feature the kubelet can use to filter such pods out of pod_workers when they arrive through an UPDATE operation. Since the kubelet config loop always processes a pod's ADD before its UPDATE, an admitted pod is already in pod_workers by the time the UPDATE arrives, so it is safe to assume that a pod not already in pod_workers was rejected. After a kubelet restart, all pods arrive as ADD again, so the status message must be checked when a pod arrives in a terminal state (meaning it was rejected before the restart).

Signed-off-by: Pablo Acevedo Montserrat <pacevedo@redhat.com>
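The accounting race described in the commit message can be sketched as a minimal model. The types and the `filterOutInactivePods` helper below are simplified stand-ins for illustration, not the kubelet's actual code:

```go
package main

import "fmt"

// podPhase and pod are a simplified model of kubelet pod state
// (illustrative; the real types live in the kubelet source tree).
type podPhase string

const (
	phaseFailed  podPhase = "Failed"
	phaseRunning podPhase = "Running"
)

type pod struct {
	name          string
	phase         podPhase
	terminatingAt bool // set once the pod enters pod_workers for termination
}

// filterOutInactivePods models the kubelet helper of the same name: a pod
// in a terminal phase is still counted as "active" while its worker is
// terminating it (i.e. while terminatingAt is set).
func filterOutInactivePods(pods []pod) []pod {
	var active []pod
	for _, p := range pods {
		if p.phase == phaseFailed && !p.terminatingAt {
			continue // truly inactive: terminal and never entered pod_workers
		}
		active = append(active, p)
	}
	return active
}

func main() {
	rejected := pod{name: "rejected", phase: phaseFailed, terminatingAt: true}
	running := pod{name: "running", phase: phaseRunning}

	// Before the fix: the rejected pod entered pod_workers, so it is
	// counted as active and inflates resource usage seen during admission
	// of the next pod.
	fmt.Println(len(filterOutInactivePods([]pod{rejected, running}))) // 2

	// With the fix: the rejected pod never enters pod_workers, so
	// terminatingAt is never set and it is filtered out.
	rejected.terminatingAt = false
	fmt.Println(len(filterOutInactivePods([]pod{rejected, running}))) // 1
}
```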
Force-pushed a67f323 to 54fe4fa.
@pacevedom: The following tests failed.
Fix race condition in kubelet where rejected pods could be incorrectly counted as active during resource accounting, causing spurious rejections of subsequent pods.
When a pod fails admission (e.g., insufficient resources), the kubelet:
- Marks the pod as Failed, but the pod may still receive UPDATE operations from API server resyncs.
- If such an UPDATE makes the pod enter pod_workers, terminatingAt is set. This signals the pod as actively terminating, even though it has no containers.
- Counts the pod as "active" in filterOutInactivePods while terminatingAt is set.

This only happens if a new pod arrives before the whole status sync round-trip to the API server is finished.
Since rejected pods do not hold any resources, they do not require any surveillance/cleanup, and therefore they should not be added to pod_workers. From the kubelet's POV:
When receiving an ADD: if the pod is in a terminal state and carries the rejection status the kubelet sets for rejected pods, skip it. This covers kubelet restarts, where every pod arrives as ADD again.
When receiving an UPDATE: a pod's ADD is always processed before its UPDATE, so an admitted pod is already in pod_workers by then; if it is not, it was rejected and must be skipped.
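The ADD/UPDATE rules above can be sketched as follows. The helper names (`skipOnAdd`, `skipOnUpdate`) and the rejection message marker are illustrative assumptions for this sketch, not the kubelet's real API:

```go
package main

import "fmt"

// syncPod is a simplified stand-in for the pod state the config loop sees.
type syncPod struct {
	uid      string
	terminal bool   // pod is in a terminal phase (Failed/Succeeded)
	message  string // status message the kubelet sets on rejection
}

// rejectionMessage is an illustrative marker; the real kubelet uses the
// specific status it recorded when the pod failed admission.
const rejectionMessage = "Pod was rejected"

// workers models pod_workers: the set of pod UIDs that were admitted.
type workers map[string]bool

// skipOnAdd: after a kubelet restart every pod arrives as ADD again, so a
// terminal pod carrying the rejection status was rejected before the
// restart and must not enter pod_workers.
func skipOnAdd(p syncPod) bool {
	return p.terminal && p.message == rejectionMessage
}

// skipOnUpdate: a pod's ADD is always processed before its UPDATE, so an
// admitted pod is already in pod_workers by the time its UPDATE arrives;
// if it is not there, it was rejected.
func skipOnUpdate(w workers, p syncPod) bool {
	return !w[p.uid]
}

func main() {
	w := workers{"admitted-pod": true}
	fmt.Println(skipOnUpdate(w, syncPod{uid: "rejected-pod"}))                 // true: not in pod_workers
	fmt.Println(skipOnUpdate(w, syncPod{uid: "admitted-pod"}))                 // false: already admitted
	fmt.Println(skipOnAdd(syncPod{terminal: true, message: rejectionMessage})) // true: rejected before restart
}
```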
What type of PR is this?
/kind bug
/kind flake
What this PR does / why we need it:
Which issue(s) this PR is related to:
Fixes kubernetes#135296
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: