
OCPBUGS-64847: USHIFT-6489: UPSTREAM: 136508: kubelet: Fix race condition when pods are rejected#2574

Open
pacevedom wants to merge 5 commits into openshift:master from pacevedom:USHIFT-6489

Conversation

@pacevedom

@pacevedom pacevedom commented Jan 25, 2026

Fix race condition in kubelet where rejected pods could be incorrectly counted as active during resource accounting, causing spurious rejections of subsequent pods.

When a pod fails admission (e.g., insufficient resources), the following sequence occurs in the kubelet:

  1. The pod status is set to Failed, with a message prefixed to indicate rejection.
  2. The pod is not added to pod_workers (there are no containers to manage).
  3. A sync of the pod's status to the API server is scheduled.
  4. A re-sync/UPDATE operation comes back from the API server.
  5. The pod is added to pod_workers as a first sync with Pending phase (the phase the API server still reports).
  6. The pod's local status is updated to Failed and terminatingAt is set. This marks the pod as actively terminating, even though it has no containers.
  7. A new pod comes in; while it goes through admission, active pods are filtered using filterOutInactivePods.
  8. A pod is considered active if it has containers running or is actively terminating. The rejected pod qualifies because terminatingAt is set.
  9. Race condition: the new pod may be rejected because resource accounting includes a pod that has no containers.

This only happens if a new pod arrives before the status sync round-trip to the API server has finished.
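The pre-fix filter condition that this sequence defeats can be sketched with a simplified model. The pod type, fields, and helpers below are illustrative stand-ins, not the kubelet's actual code:

```go
package main

import "fmt"

// pod is a simplified stand-in for the kubelet's view of a pod.
type pod struct {
	uid                  string
	phase                string // "Pending", "Running", "Failed", ...
	terminationRequested bool   // models podWorkers having terminatingAt set
}

func isAdmittedPodTerminal(p pod) bool {
	return p.phase == "Failed" || p.phase == "Succeeded"
}

// filterOutInactivePods mirrors the pre-fix condition: a pod is dropped
// only if it is terminal AND the pod worker has not requested termination.
func filterOutInactivePods(pods []pod) []pod {
	var active []pod
	for _, p := range pods {
		if isAdmittedPodTerminal(p) && !p.terminationRequested {
			continue // treated as inactive
		}
		active = append(active, p)
	}
	return active
}

func main() {
	// A rejected pod after the API round-trip: Failed, but terminatingAt
	// was set when the UPDATE re-added it to pod_workers.
	rejected := pod{uid: "rejected", phase: "Failed", terminationRequested: true}
	running := pod{uid: "running", phase: "Running"}

	active := filterOutInactivePods([]pod{rejected, running})
	fmt.Println(len(active)) // both pods count as active: prints 2
}
```

Once terminatingAt is set, `isAdmittedPodTerminal(p) && !terminationRequested` evaluates to `true && !true = false`, so the containerless rejected pod survives the filter and is charged against node resources.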

Since rejected pods do not hold any resources, they do not require any surveillance or cleanup; therefore they should not be added to pod_workers. From the kubelet's POV:
When receiving an ADD:

  • A pod being added to the kubelet undergoes the admission process. Rejected pods are not included in pod_workers.
  • A pod being added after a kubelet restart (after a restart pod_workers is empty and needs rebuilding):
    • The pod was not yet rejected by a previous ADD operation, so its status has not been updated; it goes through admission, which rejects it again.
    • The pod was rejected by a previous ADD operation but the API server re-sync had not completed, so it has no status and goes through the regular admission process, getting rejected again.
    • The pod was rejected by a previous ADD operation and the API server was synced. The status on the pod signals it was rejected, and the logic can skip it based on that.

When receiving an UPDATE:

  • Any pod that comes through an UPDATE operation was already seen in an ADD operation. This is enforced by the Kubelet's config layer.
  • If a pod is not already present in pod_workers, it was previously rejected by the ADD operation and can be skipped.
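The ADD/UPDATE rules above can be sketched roughly as follows. The rejection-message prefix, pod type, and podWorkers map are simplified assumptions standing in for the real kubelet structures:

```go
package main

import (
	"fmt"
	"strings"
)

type pod struct {
	uid     string
	phase   string
	message string
}

// Hypothetical rejection marker; the real kubelet uses its own
// status message format for rejected pods.
const rejectionPrefix = "Pod was rejected"

// wasRejected detects a pod whose synced status marks a prior rejection.
func wasRejected(p pod) bool {
	return p.phase == "Failed" && strings.HasPrefix(p.message, rejectionPrefix)
}

type kubelet struct {
	podWorkers map[string]bool // models pod_workers membership
}

// handleAdd: run admission unless the synced status already marks the pod
// rejected (the post-restart case); rejected pods never enter pod_workers.
func (k *kubelet) handleAdd(p pod, admit func(pod) bool) {
	if wasRejected(p) || !admit(p) {
		return
	}
	k.podWorkers[p.uid] = true
}

// handleUpdate: an UPDATE always follows an ADD, so a pod absent from
// pod_workers must have been rejected and can be skipped.
func (k *kubelet) handleUpdate(p pod) {
	if !k.podWorkers[p.uid] {
		return
	}
	fmt.Println("syncing", p.uid)
}

func main() {
	k := &kubelet{podWorkers: map[string]bool{}}
	admitAll := func(pod) bool { return true }
	k.handleAdd(pod{uid: "a", phase: "Pending"}, admitAll)
	k.handleUpdate(pod{uid: "a"}) // prints "syncing a"
	k.handleUpdate(pod{uid: "b"}) // never added, skipped silently
}
```

The key invariant is the one stated above: because the config layer guarantees ADD precedes UPDATE for the same pod, membership in pod_workers at UPDATE time is a reliable proxy for "was admitted".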

What type of PR is this?

/kind bug
/kind flake

What this PR does / why we need it:

Which issue(s) this PR is related to:

Fixes kubernetes#135296

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed a bug that caused rejected pods to be considered in resource accounting when scheduling (#135296)

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@openshift-ci-robot openshift-ci-robot added backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Jan 25, 2026
@openshift-ci-robot

openshift-ci-robot commented Jan 25, 2026

@pacevedom: This pull request references Jira Issue OCPBUGS-64847, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

This pull request references USHIFT-6489 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.


In response to this:

Rejected pods could have their resources incorrectly counted during
admission of subsequent pods, causing spurious rejections.

The race occurred because filterOutInactivePods() used the condition:
isAdmittedPodTerminal(p) && !IsPodTerminationRequested(p.UID)

When a pod is rejected, rejectPod() sets the status in statusManager to
Failed but skips notifying podWorkers, leaving inconsistent state. The
statusManager then syncs to the API server, which sends an UPDATE
back to the kubelet. HandlePodUpdates calls podWorkers.UpdatePod() and
sets terminatingAt. If a new pod arrives after this API round-trip is
complete, IsPodTerminationRequested returns true, defeating the filter
condition (true && !true = false).

Fix by removing the !IsPodTerminationRequested check. The statusManager
is updated synchronously in rejectPod() and is the reliable source of
truth for terminal pod status during admission. The check only filtered
out pods that are terminal and whose termination the podWorker does not
yet know about. If the UPDATE from the statusManager resync is received
before a new pod is scheduled, the offending pod is considered terminal
and has terminatingAt set, failing the check and being counted as if it
were active because it has not finished yet (even though it has no
containers running).

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot

@pacevedom: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci openshift-ci bot requested review from rphillips and sjenning January 25, 2026 10:49
@openshift-ci

openshift-ci bot commented Jan 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pacevedom
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pacevedom pacevedom changed the title OCPBUGS-64847 USHIFT-6489: kubelet: Fix race condition when pods are rejected OCPBUGS-64847: USHIFT-6489: UPSTREAM: 136508: kubelet: Fix race condition when pods are rejected Jan 25, 2026

@openshift-ci-robot

@pacevedom: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot

@pacevedom: This pull request references Jira Issue OCPBUGS-64847, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

In response to this:

Rejected pods could have their resources incorrectly counted during admission of subsequent pods, causing spurious rejections.

The Race Condition
The original check in filterOutInactivePods():
isAdmittedPodTerminal(p) && !IsPodTerminationRequested(p.UID)

When a pod is rejected, rejectPod() sets the status in statusManager to Failed but skips notifying podWorkers, leaving inconsistent state (the pod is Failed but not yet cleaned up; in reality it never had any containers). The statusManager then syncs to the API server, which sends an UPDATE back to the kubelet.
HandlePodUpdates calls podWorkers.UpdatePod() and sets terminatingAt to signal cleanup. If a new pod arrives after this API round-trip is complete, IsPodTerminationRequested returns true, defeating the filter condition (true && !true = false).

Why IsPodTerminationRequested Is Wrong for Rejected Pods
IsPodTerminationRequested indicates, via terminatingAt, that the pod worker is actively terminating a pod: stopping containers and cleaning up resources. This makes sense for pods that actually started and need cleanup, but not for rejected pods, which never ran any containers.

Why Simply Removing !IsPodTerminationRequested Doesn't Work
Pods being evicted have Phase=Failed set before their containers finish stopping. This behavior is unique to evictions; all other phase transitions happen after the fact, not before.
Filtering these pods out immediately would exclude pods still consuming CPU/memory, breaking resource accounting and therefore admission.

The fix
Skip IsPodTerminationRequested for rejected pods, since we know it races, and instead check the runtime pod cache for any containers/sandboxes:
isAdmittedPodTerminal(p) && !hasRuntimeContainers(p)

The new hasRuntimeContainers() function queries podCache.Get() for actual runtime state, not the API status (which has placeholder ContainerStatus entries created by generateAPIPodStatus even for rejected pods via UPDATE operations from API server syncs).

This correctly handles both cases:

  • Rejected pods: No runtime containers (never created) -> filtered immediately
  • Evicting pods: Have runtime containers until termination completes -> kept for resource accounting
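The effect of the new condition on both cases can be sketched like this. Here podCache is modeled as a plain map of pod UID to runtime container count; this is an illustration of the idea, not the actual kubelet implementation:

```go
package main

import "fmt"

// podCache models the runtime cache: pod UID -> count of runtime
// containers/sandboxes (0 if the runtime never created any).
type podCache map[string]int

type pod struct {
	uid   string
	phase string
}

func isAdmittedPodTerminal(p pod) bool {
	return p.phase == "Failed" || p.phase == "Succeeded"
}

// hasRuntimeContainers consults runtime state rather than API status, so
// placeholder ContainerStatus entries on rejected pods do not count.
func hasRuntimeContainers(cache podCache, p pod) bool {
	return cache[p.uid] > 0
}

// filterOutInactivePods mirrors the fixed condition: drop a pod only if
// it is terminal AND the runtime holds no containers/sandboxes for it.
func filterOutInactivePods(cache podCache, pods []pod) []pod {
	var active []pod
	for _, p := range pods {
		if isAdmittedPodTerminal(p) && !hasRuntimeContainers(cache, p) {
			continue
		}
		active = append(active, p)
	}
	return active
}

func main() {
	cache := podCache{"evicting": 2} // eviction: Failed, containers still stopping
	pods := []pod{
		{uid: "rejected", phase: "Failed"}, // never created containers
		{uid: "evicting", phase: "Failed"},
	}
	for _, p := range filterOutInactivePods(cache, pods) {
		fmt.Println(p.uid) // only "evicting" remains active
	}
}
```

The rejected pod is filtered immediately (terminal, nothing in the runtime cache), while the evicting pod stays in resource accounting until its containers are actually gone.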

What type of PR is this?

/kind bug
/kind flake

What this PR does / why we need it:

Which issue(s) this PR is related to:

Fixes kubernetes#135296

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed a bug that caused rejected pods to be considered in resource accounting when scheduling (#135296)

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:



@pacevedom

/test e2e-aws-ovn-runc
/test e2e-metal-ipi-ovn-ipv6

@openshift-ci-robot

openshift-ci-robot commented Jan 30, 2026

@pacevedom: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

  • [1b7b7ff|USHIFT-6489: Fix kubelet race condition in pod admission](1b7b7ff): does not specify an upstream backport in the commit message
  • [2a126dc|USHIFT-6489: Add unit tests for rejected pods new behavior](2a126dc): does not specify an upstream backport in the commit message

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot

@pacevedom: This pull request references Jira Issue OCPBUGS-64847, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

@openshift-ci-robot

openshift-ci-robot commented Feb 2, 2026

@pacevedom: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot

openshift-ci-robot commented Feb 3, 2026

@pacevedom: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

kubelet: Prevent rejected pods from entering pod_workers

  When a pod fails admission, it is marked as Failed but may still receive
  UPDATE operations from API server resyncs. If these updates cause the pod
  to enter pod_workers, it gets terminatingAt set and is counted as "active"
  by filterOutInactivePods until termination completes. This causes incorrect
  resource accounting during admission of subsequent pods.

  To keep rejected pods out of resource accounting, they should never enter
  pod_workers: they are in a terminal state and will not run any containers
  that require cleanup.
  There is only one way of identifying a rejected pod: the special status
  that the kubelet sets for it. This is the only distinctive feature the
  kubelet can use to filter such pods out of pod_workers when they arrive
  through an UPDATE operation.
  Since the kubelet config loop always processes a pod's ADD before its
  UPDATE, by the time the UPDATE arrives the pod must already be in
  pod_workers if it was admitted, so it is safe to assume that a pod not
  already in pod_workers was rejected.
  In the event of a kubelet restart, all pods arrive as ADD again, so the
  status message must be checked when the pod is in a terminal state
  (meaning it was rejected before the restart).

Signed-off-by: Pablo Acevedo Montserrat <pacevedo@redhat.com>
@openshift-ci-robot

@pacevedom: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci

openshift-ci bot commented Feb 5, 2026

@pacevedom: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 54fe4fa | link | true | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-aws-crun-wasm | 54fe4fa | link | true | /test e2e-aws-crun-wasm |
| ci/prow/e2e-aws-ovn-techpreview | 54fe4fa | link | false | /test e2e-aws-ovn-techpreview |
| ci/prow/integration | 54fe4fa | link | true | /test integration |
| ci/prow/verify-commits | 54fe4fa | link | true | /test verify-commits |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Kubelet admission check includes rejected pods in resource accounting, causing subsequent pods to be incorrectly rejected

2 participants