While reviewing how the kubelet removes pods, I ran into a case where the sandbox of a pod that had already been “removed” was never cleaned up.
In the previous article, “Analysis of ‘an error occurred when try to find container’ Errors in Kubernetes Pod Removal Processes,” we analyzed the pod removal process: the kubelet’s garbage collector is responsible for cleaning up the sandboxes of exited pods, yet in practice this sandbox was never removed.
Why wasn’t this sandbox removed in the kubelet’s pod removal process? In other words, why didn’t the garbage collector clean up the exited sandbox?
This article analyzes kubelet logs and combines them with code logic to outline the entire process of pod removal, identifying the root cause of this phenomenon.
Pod Removal Process Series Articles:
The Kubernetes version in this article is 1.23.10; the runtime is containerd, and the kubelet log level is 4.
On the node there is a NotReady sandbox that has existed for two months, with no associated containers.
```
# crictl pods | grep nginx-deployment-bd4476b48-fpgvc
cf8d3a590085c       2 months ago        NotReady            nginx-deployment-bd4476b48-fpgvc   default             0                   (default)
# crictl ps | grep nginx-deployment-bd4476b48-fpgvc
# crictl inspectp cf8d3a590085c
{
  "status": {
    "id": "cf8d3a590085cdadf382b354e6475e17501a56f7fe0996218831a2dd03109ab1",
    "metadata": {
      "attempt": 0,
      "name": "nginx-deployment-bd4476b48-fpgvc",
      "namespace": "default",
      "uid": "95d6b80b-77f5-4218-824e-69eec4998c22"
    },
    "state": "SANDBOX_NOTREADY",
    ....
  "info": {
    "pid": 0,
    "processStatus": "deleted",
    "netNamespaceClosed": true,
```
NotReady indicates that this sandbox has already exited.
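As an aside, a leftover NotReady sandbox like this can also be cleaned up by hand with crictl. This is only a manual mitigation, not part of the kubelet's normal removal path; the sandbox ID comes from the crictl pods output above:

```
# stop the sandbox first (a no-op here, since it has already exited)
crictl stopp cf8d3a590085c
# remove the NotReady sandbox by its ID
crictl rmp cf8d3a590085c
```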
The kubelet has been running for more than three months, which means it has not been restarted within this pod's lifecycle.
```
# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2023-08-21 15:34:07 CST; 3 months 7 days ago
```
Upon inspecting the kubelet logs with a log level of 4, we found a recurring log entry:
```
I1124 05:49:00.241725 190330 pod_workers.go:1251] "Pod worker has been requested for removal but is still not fully terminated" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
```
Fortunately, the kubelet logs at the time of pod removal are still available. Let’s begin the investigation from the start of the pod removal process.
The complete logs can be found at kubelet log.
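If you want to reproduce the walkthrough below on your own node, the relevant entries can be pulled out of the journal by filtering on the pod UID (this assumes the kubelet logs to the systemd journal; adjust the time window to your own case):

```
journalctl -u kubelet --since "2023-09-19 11:11:00" --until "2023-09-19 11:12:00" \
  | grep 95d6b80b-77f5-4218-824e-69eec4998c22
```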
When the podConfig detects that the pod has been deleted, the podWorker takes action.
```
I0919 11:11:20.322601 190330 kubelet.go:2130] "SyncLoop DELETE" source="api" pods=[default/nginx-deployment-bd4476b48-fpgvc]
I0919 11:11:20.322631 190330 pod_workers.go:625] "Pod is marked for graceful deletion, begin teardown" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:20.322658 190330 pod_workers.go:888] "Processing pod event" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=1
I0919 11:11:20.322689 190330 pod_workers.go:1005] "Pod worker has observed request to terminate" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:20.322700 190330 kubelet.go:1795] "syncTerminatingPod enter" pod="default/nginx-deployment-bd4476b48-fpgvc"
```
Container stops successfully.
```
I0919 11:11:20.322893 190330 kuberuntime_container.go:719] "Killing container with a grace period override" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 containerName="nginx" containerID="containerd://24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6" gracePeriod=30
I0919 11:11:20.322907 190330 kuberuntime_container.go:723] "Killing container with a grace period" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 containerName="nginx" containerID="containerd://24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6" gracePeriod=30
I0919 11:11:20.323279 190330 event.go:294] "Event occurred" object="default/nginx-deployment-bd4476b48-fpgvc" kind="Pod" apiVersion="v1" type="Normal" reason="Killing" message="Stopping container nginx"
I0919 11:11:20.334936 190330 status_manager.go:685] "Patch status for pod" pod="default/nginx-deployment-bd4476b48-fpgvc" patch="{\"metadata\":{\"uid\":\"95d6b80b-77f5-4218-824e-69eec4998c22\"}}"
I0919 11:11:20.334951 190330 status_manager.go:692] "Status for pod is up-to-date" pod="default/nginx-deployment-bd4476b48-fpgvc" statusVersion=3
I0919 11:11:20.334961 190330 kubelet_pods.go:932] "Pod is terminated, but some containers are still running" pod="default/nginx-deployment-bd4476b48-fpgvc"
I0919 11:11:20.426026 190330 kuberuntime_container.go:732] "Container exited normally" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 containerName="nginx" containerID="containerd://24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6"
```
PLEG detects the exit of the container.
```
I0919 11:11:20.515262 190330 kubelet.go:2152] "SyncLoop (PLEG): event for pod" pod="default/nginx-deployment-bd4476b48-fpgvc" event=&{ID:95d6b80b-77f5-4218-824e-69eec4998c22 Type:ContainerDied Data:24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6}
```
The podWorker finishes executing syncTerminatingPod and transitions to the terminated state.
```
I0919 11:11:20.706809 190330 kubelet.go:1873] "Pod termination stopped all running containers" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:20.706823 190330 kubelet.go:1875] "syncTerminatingPod exit" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:20.706837 190330 pod_workers.go:1050] "Pod terminated all containers successfully" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:20.706857 190330 pod_workers.go:988] "Processing pod event done" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=1
I0919 11:11:20.706866 190330 pod_workers.go:888] "Processing pod event" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=2
```
SyncLoop sync is triggered.
```
I0919 11:11:21.232081 190330 kubelet.go:2171] "SyncLoop (SYNC) pods" total=1 pods=[default/nginx-deployment-bd4476b48-fpgvc]
```
PLEG detects the exit of the sandbox.
```
I0919 11:11:21.519225 190330 kubelet.go:2152] "SyncLoop (PLEG): event for pod" pod="default/nginx-deployment-bd4476b48-fpgvc" event=&{ID:95d6b80b-77f5-4218-824e-69eec4998c22 Type:ContainerDied Data:cf8d3a590085cdadf382b354e6475e17501a56f7fe0996218831a2dd03109ab1}
```
The podWorker begins to execute syncTerminatedPod.
```
I0919 11:11:21.519247 190330 kubelet.go:1883] "syncTerminatedPod enter" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
```
The container is removed.
```
I0919 11:11:21.519263 190330 kuberuntime_container.go:947] "Removing container" containerID="24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6"
I0919 11:11:21.519272 190330 scope.go:110] "RemoveContainer" containerID="24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6"
```
The podWorker completes the execution of syncTerminatedPod, and the podWorker’s status becomes finished.
```
I0919 11:11:21.525367 190330 kubelet.go:1924] "syncTerminatedPod exit" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:21.525378 190330 pod_workers.go:1105] "Pod is complete and the worker can now stop" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:21.525395 190330 pod_workers.go:959] "Processing pod event done" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=2
```
The kubelet observes the pod's status update from the API server (RECONCILE).
```
I0919 11:11:21.534874 190330 kubelet.go:2127] "SyncLoop RECONCILE" source="api" pods=[default/nginx-deployment-bd4476b48-fpgvc]
```
The pod is deleted again. Since the podWorker is in the finished state, no action is taken.
```
I0919 11:11:21.546521 190330 kubelet.go:2130] "SyncLoop DELETE" source="api" pods=[default/nginx-deployment-bd4476b48-fpgvc]
I0919 11:11:21.546554 190330 pod_workers.go:611] "Pod is finished processing, no further updates" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
```
The pod is removed from the API server. Since the podWorker is in the finished state, no action is taken.
```
I0919 11:11:21.551243 190330 kubelet.go:2124] "SyncLoop REMOVE" source="api" pods=[default/nginx-deployment-bd4476b48-fpgvc]
I0919 11:11:21.551265 190330 kubelet.go:1969] "Pod has been deleted and must be killed" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:21.551286 190330 pod_workers.go:611] "Pod is finished processing, no further updates" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
```
The housekeeping process is triggered, and the pod worker is removed, followed by the creation of a new pod worker.
```
I0919 11:11:22.230603 190330 kubelet.go:2202] "SyncLoop (housekeeping)"
I0919 11:11:22.237366 190330 kubelet_pods.go:1082] "Clean up pod workers for terminated pods"
I0919 11:11:22.237397 190330 pod_workers.go:1258] "Pod has been terminated and is no longer known to the kubelet, remove all history" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:22.237414 190330 kubelet_pods.go:1111] "Clean up probes for terminated pods"
I0919 11:11:22.237425 190330 kubelet_pods.go:1134] "Clean up orphaned pod containers" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:22.237438 190330 pod_workers.go:571] "Pod is being synced for the first time" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:22.237448 190330 pod_workers.go:620] "Pod is orphaned and must be torn down" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:22.237466 190330 kubelet_pods.go:1148] "Clean up orphaned pod statuses"
```
During housekeeping, an error occurs inside the (*Kubelet).HandlePodCleanups function: the call runningRuntimePods, err = kl.containerRuntime.GetPods(false) returns an error.
```
E0919 11:11:22.237513 190330 remote_runtime.go:365] "ListPodSandbox with filter from runtime service failed" err="rpc error: code = Unavailable desc = connection closed" filter="&PodSandboxFilter{Id:,State:&PodSandboxStateValue{State:SANDBOX_READY,},LabelSelector:map[string]string{},}"
E0919 11:11:22.237546 190330 kuberuntime_sandbox.go:292] "Failed to list pod sandboxes" err="rpc error: code = Unavailable desc = connection closed"
E0919 11:11:22.237556 190330 kubelet_pods.go:1156] "Error listing containers" err="rpc error: code = Unavailable desc = connection closed"
E0919 11:11:22.237567 190330 kubelet.go:2204] "Failed cleaning pods" err="rpc error: code = Unavailable desc = connection closed"
I0919 11:11:22.237573 190330 pod_workers.go:888] "Processing pod event" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=1
```
The pod worker begins to execute syncTerminatingPod, and then the housekeeping pass ends. Note that the pod worker and housekeeping run in two different goroutines.
```
I0919 11:11:22.237573 190330 pod_workers.go:888] "Processing pod event" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=1
I0919 11:11:22.237575 190330 kubelet.go:2210] "SyncLoop (housekeeping) end"
I0919 11:11:22.237611 190330 pod_workers.go:1005] "Pod worker has observed request to terminate" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:22.237619 190330 kubelet.go:1795] "syncTerminatingPod enter"
```
Stopping the container fails within syncTerminatingPod.
```
I0919 11:11:22.237686 190330 kuberuntime_container.go:723] "Killing container with a grace period" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 containerName="nginx" containerID="containerd://24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6" gracePeriod=1
E0919 11:11:22.237712 190330 remote_runtime.go:479] "StopContainer from runtime service failed" err="rpc error: code = Unavailable desc = connection closed" containerID="24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6"
E0919 11:11:22.237752 190330 kuberuntime_container.go:728] "Container termination failed with gracePeriod" err="rpc error: code = Unavailable desc = connection closed" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 containerName="nginx" containerID="containerd://24bee860a677b045e22fb764067cee0dbddeaeb2ac68ccd229b26418d24cf2e6" gracePeriod=1
```
The pod worker completes execution, and due to the error, the pod worker’s status does not transition to terminated; it remains in the terminating state.
```
I0919 11:11:22.237854 190330 kubelet.go:1812] "syncTerminatingPod exit" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
E0919 11:11:22.237866 190330 pod_workers.go:951] "Error syncing pod, skipping" err="[failed to \"KillContainer\" for \"nginx\" with KillContainerError: \"rpc error: code = Unavailable desc = connection closed\", failed to \"KillPodSandbox\" for \"95d6b80b-77f5-4218-824e-69eec4998c22\" with KillPodSandboxError: \"rpc error: code = Unavailable desc = connection closed\"]" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I0919 11:11:22.237880 190330 pod_workers.go:988] "Processing pod event done" pod="default/nginx-deployment-bd4476b48-fpgvc" podUID=95d6b80b-77f5-4218-824e-69eec4998c22 updateType=1
```
Housekeeping keeps being triggered afterwards (the entries below are from more than two months later), but the pod worker is never removed.
```
I1124 05:49:00.230558 190330 kubelet.go:2202] "SyncLoop (housekeeping)"
I1124 05:49:00.241674 190330 kubelet_pods.go:1082] "Clean up pod workers for terminated pods"
I1124 05:49:00.241725 190330 pod_workers.go:1251] "Pod worker has been requested for removal but is still not fully terminated" podUID=95d6b80b-77f5-4218-824e-69eec4998c22
I1124 05:49:00.241772 190330 kubelet_pods.go:1111] "Clean up probes for terminated pods"
I1124 05:49:00.241796 190330 kubelet_pods.go:1148] "Clean up orphaned pod statuses"
I1124 05:49:00.244039 190330 kubelet_pods.go:1167] "Clean up orphaned pod directories"
I1124 05:49:00.244247 190330 kubelet_pods.go:1178] "Clean up orphaned mirror pods"
I1124 05:49:00.244258 190330 kubelet_pods.go:1185] "Clean up orphaned pod cgroups"
I1124 05:49:00.244278 190330 kubelet.go:2210] "SyncLoop (housekeeping) end"
```
After analyzing the logs, three questions surface:
- Why wasn’t the PodWorker removed in subsequent housekeeping?
- Why did ListPodSandbox fail in housekeeping?
- Why wasn’t the sandbox removed?
3.1 Why wasn't the PodWorker removed in subsequent housekeeping?
During housekeeping, the following call chain is executed; when a pod has already been removed but still has a pod worker, removeTerminatedWorker is called for it.
```
housekeeping --> (*kubelet).HandlePodCleanups --> (*podWorkers).SyncKnownPods --> (*podWorkers).removeTerminatedWorker
```
When the pod worker is not in the finished state, removeTerminatedWorker returns immediately without running the removal logic below. This early return is exactly what produces the recurring "Pod worker has been requested for removal but is still not fully terminated" log line.
```go
func (p *podWorkers) removeTerminatedWorker(uid types.UID) {
	status, ok := p.podSyncStatuses[uid]
	// If uid is not in p.podSyncStatuses, return directly
	if !ok {
		klog.V(4).InfoS("Pod worker has been requested for removal but is not a known pod", "podUID", uid)
		return
	}
	// If the pod worker has not finished (completion implies the need to remove the podWorker)
	if !status.finished {
		klog.V(4).InfoS("Pod worker has been requested for removal but is still not fully terminated", "podUID", uid)
		return
	}
	if status.restartRequested {
		klog.V(4).InfoS("Pod has been terminated but another pod with the same UID was created, remove history to allow restart", "podUID", uid)
	} else {
		klog.V(4).InfoS("Pod has been terminated and is no longer known to the kubelet, remove all history", "podUID", uid)
	}
	// Remove uid from p.podSyncStatuses
	delete(p.podSyncStatuses, uid)
	// Close the chan corresponding to uid in p.podUpdates and remove uid from p.podUpdates;
	// also remove uid from p.lastUndeliveredWorkUpdate
	p.cleanupPodUpdates(uid)

	if p.startedStaticPodsByFullname[status.fullname] == uid {
		delete(p.startedStaticPodsByFullname, status.fullname)
	}
}
```
For more detailed code, refer to:
- housekeeping
- HandlePodCleanups
- SyncKnownPods
- removeTerminatedWorker
3.2 Why did ListPodSandbox fail in housekeeping?
Investigating containerd's logs and status showed that containerd was repeatedly panicking, which caused systemd to keep restarting it.
```
Nov 11 09:13:01 containerd[385194]: fatal error: concurrent map writes
Nov 11 09:13:01 containerd[385194]: goroutine 12181 [running]:
Nov 11 09:13:01 containerd[385194]: runtime.throw({0x56512ed01267?, 0x0?})
Nov 11 09:13:01 containerd[385194]: /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc0024fe688 sp=0xc0024fe658 pc=0x56512da81cf1
Nov 11 09:13:01 containerd[385194]: runtime.mapassign(0x56512f1282e0?, 0xc000691ec0?, 0x40?)
Nov 11 09:13:01 containerd[385194]: /usr/local/go/src/runtime/map.go:595 +0x4d6 fp=0xc0024fe708 sp=0xc0024fe688 pc=0x56512da59e16
Nov 11 09:13:01 containerd[385194]: github.com/containerd/containerd/pkg/cri/store/sandbox.(*Store).UpdateContainerStats(0xc000691f50, {0xc0004a8e80?, 0xc000f6e140?}, 0xc000da3840)
```
The Containerd version in use was 1.6.15.
```
# dpkg -l | grep containerd
ii  containerd.io    1.6.15-1    amd64    An open and reliable container runtime
```
The root cause of the failed ListPodSandbox call in housekeeping was therefore containerd's abnormal restarts.
Upon checking, this containerd bug turned out to have been fixed in version 1.6.22, specifically by PR8819, so containerd was upgraded to version 1.6.25.
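To confirm that the runtime was indeed being restarted around that time, the containerd unit's restart counter and journal can be checked (this assumes containerd runs as a systemd unit and that the systemd version exposes the NRestarts property):

```
# how many times systemd has restarted containerd
systemctl show containerd --property=NRestarts
# panics recorded in containerd's journal
journalctl -u containerd | grep -c "fatal error: concurrent map writes"
```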
3.3 Why wasn’t the sandbox removed?
The garbage collector invokes (*containerGC).evictSandboxes to reclaim sandboxes, and it is called with evictNonDeletedPods set to false. Therefore a pod's sandboxes are reclaimed only if cgc.podStateProvider.ShouldPodContentBeRemoved(podUID) returns true.
Code Reference: pkg/kubelet/kuberuntime/kuberuntime_gc.go#L313-L323
```go
for podUID, sandboxes := range sandboxesByPod {
	if cgc.podStateProvider.ShouldPodContentBeRemoved(podUID) || (evictNonDeletedPods && cgc.podStateProvider.ShouldPodRuntimeBeRemoved(podUID)) {
		// Remove all evictable sandboxes if the pod has been removed.
		// Note that the latest dead sandbox is also removed if there is
		// already an active one.
		cgc.removeOldestNSandboxes(sandboxes, len(sandboxes))
	} else {
		// Keep the latest one if the pod still exists.
		cgc.removeOldestNSandboxes(sandboxes, len(sandboxes)-1)
	}
}
```
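Note the else branch: when the pod is not eligible for full cleanup, removeOldestNSandboxes(sandboxes, len(sandboxes)-1) keeps the newest sandbox. Since this pod has only one NotReady sandbox, len(sandboxes)-1 is 0, so nothing is ever removed on that path either.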
The ShouldPodContentBeRemoved function is implemented by podWorkers. Since this pod's podWorker still exists and is stuck in the terminating state (deleted, but not yet terminated), ShouldPodContentBeRemoved returns false, and the sandbox is never reclaimed.
Code Reference: pkg/kubelet/pod_workers.go#L522-L533
```go
// If uid is in p.podSyncStatuses, return true if the pod is evicted, or if the pod is deleted and in the terminated state; otherwise return false.
// If uid is not in p.podSyncStatuses, return true if at least one pod worker has executed syncPod (i.e. a pod has been started via UpdatePod()); otherwise return false.
func (p *podWorkers) ShouldPodContentBeRemoved(uid types.UID) bool {
	p.podLock.Lock()
	defer p.podLock.Unlock()
	if status, ok := p.podSyncStatuses[uid]; ok {
		return status.IsEvicted() || (status.IsDeleted() && status.IsTerminated())
	}
	// A pod that hasn't been sent to the pod worker yet should have no content on disk once we have synced all content.
	return p.podsSynced
}
```
Upon identifying the issue, I examined the commit history of pkg/kubelet/pod_workers.go and found that the bug had already been fixed by Clayton Coleman in this commit; the fix landed in version 1.27. A Google search also turned up the related issue 107730, which I had previously overlooked.
To address the problem of terminationGracePeriodSeconds being set to 0, Clayton Coleman refactored the podWorker in version 1.22 with this commit. The podWorker is complex, with many edge cases, and that refactoring introduced this problem; the subsequent small refinements that fixed it introduced other minor issues in turn, leading to further bug fixes.
The root cause lies in (*Kubelet).HandlePodCleanups(), which runs during housekeeping. In this function the list of running pods is retrieved from the runtime cache, so a pod that has already stopped can still be reported as running; it is then incorrectly pushed through the "Clean up orphaned pod containers" logic, which recreates a podWorker for it.
The other contributing condition is that the call to the runtime to stop the container fails, leaving that new podWorker in the terminating state, which removeTerminatedWorker will never remove.
In essence, stale data read from the cache during housekeeping, combined with a runtime error while stopping the container, leaves the podWorker stuck in the terminating state, and this prevents the garbage collector from cleaning up the sandbox.
The code in this area has since been changed to fetch the running pod list directly from the runtime (by forcing the runtime cache to refresh first), so the "Clean up orphaned pod containers" logic no longer runs when it is not needed. With this change, a pod that is no longer running in the runtime never gets a podWorker, and its sandbox can be removed by the garbage collector.
pkg/kubelet/kubelet_pods.go in kubernetes 1.23.17
```go
// Obtain the list of running pods from the runtime cache
runningRuntimePods, err := kl.runtimeCache.GetPods()
if err != nil {
	klog.ErrorS(err, "Error listing containers")
	return err
}
for _, runningPod := range runningRuntimePods {
	switch workerState, ok := workingPods[runningPod.ID]; {
	case ok && workerState == SyncPod, ok && workerState == TerminatingPod:
		// If the pod worker is already in charge of this pod, we don't need to do anything
		continue
	default:
		// If the pod isn't in the set that should be running and isn't already terminating, terminate now.
		// This termination is aggressive because all known pods should already be in a known state.
		// This includes pods orphaned due to kubelet restart or bugs.
		// We can safely terminate unknown pods since housekeeping blocks other config changes,
		// and we know that another pod wasn't started in the background.
		if _, ok := allPodsByUID[runningPod.ID]; !ok {
			klog.V(3).InfoS("Clean up orphaned pod containers", "podUID", runningPod.ID)
			one := int64(1)
			kl.podWorkers.UpdatePod(UpdatePodOptions{
				UpdateType: kubetypes.SyncPodKill,
				RunningPod: runningPod,
				KillPodOptions: &KillPodOptions{
					PodTerminationGracePeriodSecondsOverride: &one,
				},
			})
		}
	}
}
```
The relevant code modification in release-1.27
```go
// Retrieve the list of running containers from the runtime to perform cleanup.
// We need the latest state to avoid delaying restarts of static pods that reuse
// a UID.
if err := kl.runtimeCache.ForceUpdateIfOlder(ctx, kl.clock.Now()); err != nil {
	klog.ErrorS(err, "Error listing containers")
	return err
}
runningRuntimePods, err := kl.runtimeCache.GetPods(ctx)
if err != nil {
	klog.ErrorS(err, "Error listing containers")
	return err
}
```
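Because ForceUpdateIfOlder is called with the current time, the runtime cache is refreshed from the runtime before GetPods runs, so a pod that has already stopped no longer shows up as running and no orphan-cleanup pod worker is created for it.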
Fixing termination and status pod reporting