Kubelet Bug: Sandbox Not Cleaned Up - Unraveling Retention Issues

In the process of reviewing the kubelet’s removal of pods, I encountered an issue where the sandbox of a “removed” pod was not being cleaned up.

In the previous article “Analysis of ‘an error occurred when try to find container’ Errors in Kubernetes Pod Removal Processes,” we analyzed the pod removal process. The garbage collector in kubelet is responsible for cleaning up the sandbox of exited pods, but in practice, it was not being removed.

Why wasn’t this sandbox removed in the kubelet’s pod removal process? In other words, why didn’t the garbage collector clean up the exited sandbox?

This article analyzes kubelet logs and combines them with code logic to outline the entire process of pod removal, identifying the root cause of this phenomenon.

The Kubernetes version in this article is 1.23.10, the container runtime is containerd, and the kubelet log level is 4.

On the node there is a NotReady sandbox that has existed for two months, and it has no associated containers.

# crictl pods | grep nginx-deployment-bd4476b48-fpgvc
cf8d3a590085c 2 months ago NotReady nginx-deployment-bd4476b48-fpgvc default 0 (default)
# crictl ps | grep nginx-deployment-bd4476b48-fpgvc

# crictl inspectp cf8d3a590085c
{
  "status": {
    "id": "cf8d3a590085cdadf382b354e6475e17501a56f7fe0996218831a2dd03109ab1",
    "metadata": {
      "attempt": 0,
      "name": "nginx-deployment-bd4476b48-fpgvc",
      "namespace": "default",
      "uid": "95d6b80b-77f5-4218-824e-69eec4998c22"
    },
    "state": "SANDBOX_NOTREADY",
    ....
    "info": {
      "pid": 0,
      "processStatus": "deleted",
      "netNamespaceClosed": true,

NotReady indicates that this sandbox has already exited.

The kubelet has been running for three months, meaning it hasn’t been restarted within the pod’s lifecycle.

# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
  Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Active: active (running) since Mon 2023-08-21 15:34:07 CST; 3 months 7 days ago

Upon inspecting the kubelet logs with a log level of 4, we found a recurring log entry:

I1124 05:49:00.241725 190330 pod_workers.go:1251] "Pod worker has been requested for removal but is still not fully terminated" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
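
Since every kubelet log line for this pod carries its UID, a quick way to pull the relevant history (assuming the kubelet runs as the systemd unit shown above and the journal still retains entries from that period) is to filter the journal by the UID:

# journalctl -u kubelet | grep 95d6b80b-77f5-4218-824e-69eec4998c22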

Fortunately, the kubelet logs at the time of pod removal are still available. Let’s begin the investigation from the start of the pod removal process.

The complete logs can be found at kubelet log.

When the podConfig detects that the pod has been deleted, the podWorker takes action.

I0919 11:11:20.322601  190330 kubelet.go:2130] "SyncLoop DELETE" source="api" pods=[default/nginx-deployment-bd4476b48-fpgvc]
I0919 11:11:20.322631  190330 pod_workers.go:625] "Pod is marked for graceful deletion, begin teardown" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:20.322658  190330 pod_workers.go:888] "Processing pod event" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=1
I0919 11:11:20.322689  190330 pod_workers.go:1005] "Pod worker has observed request to terminate" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:20.322700  190330 kubelet.go:1795] "syncTerminatingPod enter" pod="default/nginx-deployment-bd4476b48-fpgvc"

The container stops successfully.

I0919 11:11:20.322893  190330 kuberuntime_container.go:719] "Killing container with a grace period override" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 containerName="nginx" containerID="containerd://24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6" gracePeriod=30
I0919 11:11:20.322907  190330 kuberuntime_container.go:723] "Killing container with a grace period" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 containerName="nginx" containerID="containerd://24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6" gracePeriod=30
I0919 11:11:20.323279  190330 event.go:294] "Event occurred" object="default/nginx-deployment-bd4476b48-fpgvc" kind="Pod" apiVersion="v1" type="Normal" reason="Killing" message="Stopping container nginx"
I0919 11:11:20.334936  190330 status_manager.go:685] "Patch status for pod" pod="default/nginx-deployment-bd4476b48-fpgvc" patch="{\"metadata\":{\"uid\":\"95d6b80b-77f5-4218-824e-69eec4998c22\"}}"
I0919 11:11:20.334951  190330 status_manager.go:692] "Status for pod is up-to-date" pod="default/nginx-deployment-bd4476b48-fpgvc" statusVersion=3
I0919 11:11:20.334961  190330 kubelet_pods.go:932] "Pod is terminated, but some containers are still running" pod="default/nginx-deployment-bd4476b48-fpgvc"
I0919 11:11:20.426026  190330 kuberuntime_container.go:732] "Container exited normally" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 containerName="nginx" containerID="containerd://24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6"

PLEG detects the exit of the container.

I0919 11:11:20.515262  190330 kubelet.go:2152] "SyncLoop (PLEG): event for pod" pod="default/nginx-deployment-bd4476b48-fpgvc" event=&{ID:95d6b80b-77f5-4218-824e-69eec4998c22 Type:ContainerDied Data:24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6}

The podWorker finishes executing syncTerminatingPod, and the podWorker is now in the terminated state.

I0919 11:11:20.706809  190330 kubelet.go:1873] "Pod termination stopped all running containers" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:20.706823  190330 kubelet.go:1875] "syncTerminatingPod exit" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:20.706837  190330 pod_workers.go:1050] "Pod terminated all containers successfully" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:20.706857  190330 pod_workers.go:988] "Processing pod event done" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=1
I0919 11:11:20.706866  190330 pod_workers.go:888] "Processing pod event" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=2

SyncLoop sync is triggered.

I0919 11:11:21.232081  190330 kubelet.go:2171] "SyncLoop (SYNC) pods" total=1 pods=[default/nginx-deployment-bd4476b48-fpgvc]

PLEG detects the exit of the sandbox.

I0919 11:11:21.519225  190330 kubelet.go:2152] "SyncLoop (PLEG): event for pod" pod="default/nginx-deployment-bd4476b48-fpgvc" event=&{ID:95d6b80b-77f5-4218-824e-69eec4998c22 Type:ContainerDied Data:cf8d3a590085cdadf382b354e6475e17501a56f7fe0996218831a2dd03109ab1}

The podWorker begins to execute syncTerminatedPod.

I0919 11:11:21.519247  190330 kubelet.go:1883] "syncTerminatedPod enter" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22

The container is removed.

I0919 11:11:21.519263  190330 kuberuntime_container.go:947] "Removing container" containerID="24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6"
I0919 11:11:21.519272  190330 scope.go:110] "RemoveContainer" containerID="24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6"

The podWorker completes the execution of syncTerminatedPod, and the podWorker’s status becomes finished.

I0919 11:11:21.525367  190330 kubelet.go:1924] "syncTerminatedPod exit" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:21.525378  190330 pod_workers.go:1105] "Pod is complete and the worker can now stop" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:21.525395  190330 pod_workers.go:959] "Processing pod event done" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=2

The kubelet observes the pod's status update.

I0919 11:11:21.534874  190330 kubelet.go:2127] "SyncLoop RECONCILE" source="api" pods=[default/nginx-deployment-bd4476b48-fpgvc]

The pod is deleted again. Since the podWorker is in the finished state, no action is taken.

I0919 11:11:21.546521  190330 kubelet.go:2130] "SyncLoop DELETE" source="api" pods=[default/nginx-deployment-bd4476b48-fpgvc]
I0919 11:11:21.546554  190330 pod_workers.go:611] "Pod is finished processing, no further updates" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22

The pod is removed from the API server. Since the podWorker is in the finished state, no action is taken.

I0919 11:11:21.551243  190330 kubelet.go:2124] "SyncLoop REMOVE" source="api" pods=[default/nginx-deployment-bd4476b48-fpgvc]
I0919 11:11:21.551265  190330 kubelet.go:1969] "Pod has been deleted and must be killed" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:21.551286  190330 pod_workers.go:611] "Pod is finished processing, no further updates" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22

The housekeeping process is triggered, and the pod worker is removed, followed by the creation of a new pod worker.

I0919 11:11:22.230603  190330 kubelet.go:2202] "SyncLoop (housekeeping)"
I0919 11:11:22.237366  190330 kubelet_pods.go:1082] "Clean up pod workers for terminated pods"
I0919 11:11:22.237397  190330 pod_workers.go:1258] "Pod has been terminated and is no longer known to the kubelet, remove all history" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:22.237414  190330 kubelet_pods.go:1111] "Clean up probes for terminated pods"
I0919 11:11:22.237425  190330 kubelet_pods.go:1134] "Clean up orphaned pod containers" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:22.237438  190330 pod_workers.go:571] "Pod is being synced for the first time" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:22.237448  190330 pod_workers.go:620] "Pod is orphaned and must be torn down" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:22.237466  190330 kubelet_pods.go:1148] "Clean up orphaned pod statuses"

An error then occurs in the (*Kubelet).HandlePodCleanups function during housekeeping: the call runningRuntimePods, err = kl.containerRuntime.GetPods(false) fails.

E0919 11:11:22.237513  190330 remote_runtime.go:365] "ListPodSandbox with filter from runtime service failed" err="rpc error: code = Unavailable desc = connection closed" filter="&PodSandboxFilter{Id:,State:&PodSandboxStateValue{State:SANDBOX_READY,},LabelSelector:map[string]string{},}"
E0919 11:11:22.237546  190330 kuberuntime_sandbox.go:292] "Failed to list pod sandboxes" err="rpc error: code = Unavailable desc = connection closed"
E0919 11:11:22.237556  190330 kubelet_pods.go:1156] "Error listing containers" err="rpc error: code = Unavailable desc = connection closed"
E0919 11:11:22.237567  190330 kubelet.go:2204] "Failed cleaning pods" err="rpc error: code = Unavailable desc = connection closed"
I0919 11:11:22.237573  190330 pod_workers.go:888] "Processing pod event" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=1

The pod worker begins to execute syncTerminatingPod, and then the housekeeping process ends. Here, the pod worker and housekeeping are executed in two different goroutines.

I0919 11:11:22.237573  190330 pod_workers.go:888] "Processing pod event" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=1
I0919 11:11:22.237575  190330 kubelet.go:2210] "SyncLoop (housekeeping) end"
I0919 11:11:22.237611  190330 pod_workers.go:1005] "Pod worker has observed request to terminate" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:22.237619  190330 kubelet.go:1795] "syncTerminatingPod enter"

Stopping the container fails within syncTerminatingPod.

I0919 11:11:22.237686  190330 kuberuntime_container.go:723] "Killing container with a grace period" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 containerName="nginx" containerID="containerd://24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6" gracePeriod=1
E0919 11:11:22.237712  190330 remote_runtime.go:479] "StopContainer from runtime service failed" err="rpc error: code = Unavailable desc = connection closed" containerID="24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6"
E0919 11:11:22.237752  190330 kuberuntime_container.go:728] "Container termination failed with gracePeriod" err="rpc error: code = Unavailable desc = connection closed" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 containerName="nginx" containerID="containerd://24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6" gracePeriod=1

The pod worker completes execution, and due to the error, the pod worker’s status does not transition to terminated; it remains in the terminating state.

I0919 11:11:22.237854  190330 kubelet.go:1812] "syncTerminatingPod exit" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
E0919 11:11:22.237866  190330 pod_workers.go:951] "Error syncing pod, skipping" err="[failed to \"KillContainer\" for \"nginx\" with KillContainerError: \"rpc error: code = Unavailable desc = connection closed\", failed to \"KillPodSandbox\" for \"95d6b80b-77f5-4218-824e-69eec4998c22\" with KillPodSandboxError: \"rpc error: code = Unavailable desc = connection closed\"]" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:22.237880  190330 pod_workers.go:988] "Processing pod event done" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=1

Housekeeping continues to be triggered afterwards (the logs below are from roughly two months later), but the podWorker is never removed.

I1124 05:49:00.230558  190330 kubelet.go:2202] "SyncLoop (housekeeping)"
I1124 05:49:00.241674  190330 kubelet_pods.go:1082] "Clean up pod workers for terminated pods"
I1124 05:49:00.241725  190330 pod_workers.go:1251] "Pod worker has been requested for removal but is still not fully terminated" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I1124 05:49:00.241772  190330 kubelet_pods.go:1111] "Clean up probes for terminated pods"
I1124 05:49:00.241796  190330 kubelet_pods.go:1148] "Clean up orphaned pod statuses"
I1124 05:49:00.244039  190330 kubelet_pods.go:1167] "Clean up orphaned pod directories"
I1124 05:49:00.244247  190330 kubelet_pods.go:1178] "Clean up orphaned mirror pods"
I1124 05:49:00.244258  190330 kubelet_pods.go:1185] "Clean up orphaned pod cgroups"
I1124 05:49:00.244278  190330 kubelet.go:2210] "SyncLoop (housekeeping) end"

After analyzing the logs, several issues have surfaced:

  1. Why wasn’t the PodWorker removed in subsequent housekeeping?
  2. Why did ListPodSandbox fail in housekeeping?
  3. Why wasn’t the sandbox removed?

During housekeeping, the following call chain is executed; when a pod has already been removed but still has a podWorker, removeTerminatedWorker is called.

housekeeping --> (*kubelet).HandlePodCleanups --> (*podWorkers).SyncKnownPods --> (*podWorkers).removeTerminatedWorker

When the PodWorker is not in the finished state, removeTerminatedWorker returns immediately without executing the subsequent removal logic.

func (p *podWorkers) removeTerminatedWorker(uid types.UID) {
    status, ok := p.podSyncStatuses[uid]
    // If uid is not in p.podSyncStatuses, return directly
    if !ok {
        klog.V(4).InfoS("Pod worker has been requested for removal but is not a known pod", "podUID", uid)
        return
    }

    // The pod worker can only be removed once it has finished (i.e. syncTerminatedPod has completed)
    if !status.finished {
        klog.V(4).InfoS("Pod worker has been requested for removal but is still not fully terminated", "podUID", uid)
        return
    }

    if status.restartRequested {
        klog.V(4).InfoS("Pod has been terminated but another pod with the same UID was created, remove history to allow restart", "podUID", uid)
    } else {
        klog.V(4).InfoS("Pod has been terminated and is no longer known to the kubelet, remove all history", "podUID", uid)
    }
    // Remove uid from p.podSyncStatuses
    delete(p.podSyncStatuses, uid)
    // Close the chan corresponding to uid in p.podUpdates and remove uid from p.podUpdates
    // Also remove uid from p.lastUndeliveredWorkUpdate
    p.cleanupPodUpdates(uid)

    if p.startedStaticPodsByFullname[status.fullname] == uid {
        delete(p.startedStaticPodsByFullname, status.fullname)
    }
}

For more detailed code, refer to:

housekeeping

HandlePodCleanups

SyncKnownPods

removeTerminatedWorker

Investigating containerd's logs and status showed that containerd was repeatedly panicking and being restarted by systemd.

Nov 11 09:13:01  containerd[385194]: fatal error: concurrent map writes
Nov 11 09:13:01  containerd[385194]: goroutine 12181 [running]:
Nov 11 09:13:01  containerd[385194]: runtime.throw({0x56512ed01267?, 0x0?})
Nov 11 09:13:01  containerd[385194]:         /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc0024fe688 sp=0xc0024fe658 pc=0x56512da81cf1
Nov 11 09:13:01  containerd[385194]: runtime.mapassign(0x56512f1282e0?, 0xc000691ec0?, 0x40?)
Nov 11 09:13:01  containerd[385194]:         /usr/local/go/src/runtime/map.go:595 +0x4d6 fp=0xc0024fe708 sp=0xc0024fe688 pc=0x56512da59e16
Nov 11 09:13:01  containerd[385194]: github.com/containerd/containerd/pkg/cri/store/sandbox.(*Store).UpdateContainerStats(0xc000691f50, {0xc0004a8e80?, 0xc000f6e140?}, 0xc000da3840)
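
Independently of the kubelet logs, the repeated crashes can be confirmed from the containerd unit itself. A hedged example, assuming containerd runs as the systemd unit named containerd and the systemd version is new enough to expose the NRestarts property:

# journalctl -u containerd | grep -c "fatal error: concurrent map writes"
# systemctl show containerd --property=NRestarts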

The Containerd version in use was 1.6.15.

# dpkg -l |grep containerd
ii  containerd.io                               1.6.15-1                                amd64        An open and reliable container runtime

These abnormal containerd restarts are what caused the ListPodSandbox call in housekeeping to fail.

Upon checking, it was discovered that this Containerd bug had been fixed in version 1.6.22, specifically through PR8819. Consequently, an upgrade to Containerd version 1.6.25 was performed.

The garbage collector invokes (*containerGC).evictSandboxes to reclaim sandboxes, where the variable evictNonDeletedPods is set to false. Therefore, sandbox reclamation only occurs if cgc.podStateProvider.ShouldPodContentBeRemoved(podUID) returns true.

Code Reference: pkg/kubelet/kuberuntime/kuberuntime_gc.go#L313-L323

	for podUID, sandboxes := range sandboxesByPod {
		if cgc.podStateProvider.ShouldPodContentBeRemoved(podUID) || (evictNonDeletedPods && cgc.podStateProvider.ShouldPodRuntimeBeRemoved(podUID)) {
			// Remove all evictable sandboxes if the pod has been removed.
			// Note that the latest dead sandbox is also removed if there is
			// already an active one.
			cgc.removeOldestNSandboxes(sandboxes, len(sandboxes))
		} else {
			// Keep the latest one if the pod still exists.
			cgc.removeOldestNSandboxes(sandboxes, len(sandboxes)-1)
		}
	}

The ShouldPodContentBeRemoved function is implemented by podWorkers. Since the pod’s podWorker exists and is in the “terminating” state, ShouldPodContentBeRemoved returns false. Therefore, sandbox reclamation is not executed.

Code Reference: pkg/kubelet/pod_workers.go#L522-L533

// If uid is in p.podSyncStatuses: return true when the pod has been evicted, or when it has been deleted and is already in the terminated state; otherwise return false.
// If uid is not in p.podSyncStatuses: return p.podsSynced, i.e. whether the pod workers have been synced at least once (all working pods have been started via UpdatePod()).
func (p *podWorkers) ShouldPodContentBeRemoved(uid types.UID) bool {
	p.podLock.Lock()
	defer p.podLock.Unlock()
	if status, ok := p.podSyncStatuses[uid]; ok {
		return status.IsEvicted() || (status.IsDeleted() && status.IsTerminated())
	}
	// A pod that hasn't been sent to the pod worker yet should have no content on disk once we have synced all content.
	return p.podsSynced
}
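
Putting the two checks together, the following is a minimal, self-contained sketch (illustrative only, not the actual kubelet code) of how a single failed syncTerminatingPod call leaves the worker stuck, so that neither removeTerminatedWorker nor the sandbox garbage collector ever makes progress:

// Illustrative simulation, not kubelet source.
package main

import (
	"errors"
	"fmt"
)

type podSyncStatus struct {
	deleted    bool // pod object has been deleted from the API server
	terminated bool // syncTerminatingPod completed successfully
	finished   bool // syncTerminatedPod completed; the worker may now be removed
}

// teardown mimics the podWorker flow described above: the worker only reaches
// "terminated" and then "finished" if stopping the containers succeeds.
func teardown(s *podSyncStatus, stopContainers func() error) {
	if err := stopContainers(); err != nil {
		// Error path: the worker stays "terminating" and finished is never set.
		fmt.Println("Error syncing pod, skipping:", err)
		return
	}
	s.terminated = true
	s.finished = true
}

// contentRemovable mirrors the deleted-and-terminated condition used by
// ShouldPodContentBeRemoved (eviction is ignored here for brevity).
func contentRemovable(s *podSyncStatus) bool {
	return s.deleted && s.terminated
}

func main() {
	s := &podSyncStatus{deleted: true}
	// Simulate the CRI call failing because containerd restarted mid-call.
	teardown(s, func() error {
		return errors.New("rpc error: code = Unavailable desc = connection closed")
	})
	fmt.Println("finished:", s.finished)                    // false -> removeTerminatedWorker returns early
	fmt.Println("content removable:", contentRemovable(s)) // false -> the GC keeps the sandbox
}

In the case described here this state persisted for two months, because nothing successfully re-drove the termination once the pod had already been removed from the API server.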

Upon identifying the issue, I examined the commit history of pkg/kubelet/pod_workers.go and found that the bug had already been fixed by Clayton Coleman in this commit. The fix landed in version 1.27, and a Google search also turned up the related issue 107730, which I had previously overlooked.

To address the issue of terminationGracePeriodSeconds being set to 0, Clayton Coleman refactored the podWorker in version 1.22 with this commit. The podWorker is complex and has many edge cases, and that refactoring introduced this problem; the follow-up refinements that fixed it in turn introduced other minor issues, leading to further rounds of bug fixes.

The root cause of the problem lies in the execution of (*Kubelet).HandlePodCleanups() during the housekeeping phase: the list of running pods is retrieved from a potentially stale cache, so a pod that has already stopped can still be treated as running and incorrectly go through "Clean up orphaned pod containers", which regenerates a podWorker for it.

The other contributing condition is that the subsequent call to the runtime to stop the container fails, leaving the regenerated podWorker in the "terminating" state, which removeTerminatedWorker will never remove.

In essence, stale (non-real-time) data read from the cache during housekeeping, combined with a runtime error while stopping containers, leaves the podWorker stuck in the "terminating" state, which in turn prevents the garbage collector from ever cleaning up the sandbox.

In release 1.27 this code path was changed to refresh the list of running pods directly from the runtime, so the "Clean up orphaned pod containers" logic is no longer triggered when it is not needed. As a result, a pod that is not actually running in the runtime does not get a podWorker regenerated, and its sandbox can be removed by the garbage collector.

pkg/kubelet/kubelet_pods.go in kubernetes 1.23.17

// Obtain the list of running pods from the runtime cache
runningRuntimePods, err := kl.runtimeCache.GetPods()
if err != nil {
    klog.ErrorS(err, "Error listing containers")
    return err
}
for _, runningPod := range runningRuntimePods {
    switch workerState, ok := workingPods[runningPod.ID]; {
    case ok && workerState == SyncPod, ok && workerState == TerminatingPod:
        // If the pod worker is already in charge of this pod, we don't need to do anything
        continue
    default:
        // If the pod isn't in the set that should be running and isn't already terminating, terminate now.
        // This termination is aggressive because all known pods should already be in a known state.
        // This includes pods orphaned due to kubelet restart or bugs.
        // We can safely terminate unknown pods since housekeeping blocks other config changes,
        // and we know that another pod wasn't started in the background.
        if _, ok := allPodsByUID[runningPod.ID]; !ok {
            klog.V(3).InfoS("Clean up orphaned pod containers", "podUID", runningPod.ID)
            one := int64(1)
            kl.podWorkers.UpdatePod(UpdatePodOptions{
                UpdateType: kubetypes.SyncPodKill,
                RunningPod: runningPod,
                KillPodOptions: &KillPodOptions{
                    PodTerminationGracePeriodSecondsOverride: &one,
                },
            })
        }
    }
}

The relevant code modification in release-1.27

	// Retrieve the list of running containers from the runtime to perform cleanup.
	// We need the latest state to avoid delaying restarts of static pods that reuse
	// a UID.
	if err := kl.runtimeCache.ForceUpdateIfOlder(ctx, kl.clock.Now()); err != nil {
		klog.ErrorS(err, "Error listing containers")
		return err
	}
	runningRuntimePods, err := kl.runtimeCache.GetPods(ctx)
	if err != nil {
		klog.ErrorS(err, "Error listing containers")
		return err
	}

Fixing termination and status pod reporting
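
Until the node runs a kubelet release that carries this fix, a hedged manual workaround is to remove the leaked sandbox by hand with crictl (the sandbox is already NotReady, i.e. stopped, so it can be removed directly); restarting the kubelet, which discards the in-memory podWorker state, should also let the garbage collector reclaim it:

# crictl rmp cf8d3a590085c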
