A Deep Dive into the Static Pod Removal Flow in the kubelet
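The previous article walked through the removal flow of a mirror pod, which turned out to be quite interesting. This article looks at the removal flow of a static pod.
A static pod can come from a file or from an HTTP server, and it is visible only inside the kubelet. The mirror pod is the static pod's reflection in the apiserver, which lets external components observe the static pod's state.
As the previous article showed, deleting the mirror pod does not delete the static pod. To delete a static pod, either remove its manifest file from the --pod-manifest-path directory, or have the HTTP server behind --manifest-url stop returning the pod in its response body.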
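Articles in the pod removal series
- A deep look at the mirror Pod deletion process in Kubernetes and its effect on the static Pod
- Kubelet bug: leftover sandboxes - tracing why a sandbox cannot be cleaned up
- Why the kubelet log shows "an error occurred when try to find container"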
The rest of this article examines the removal of a static pod that comes from a file. The Kubernetes version is 1.23 and the kubelet log level is 4.
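1 Analyzing the kubelet logs
Reading the kubelet logs alongside the corresponding code reveals the static pod removal flow.
The complete log file is in: kubelet log and watch pod output
The kubelet detects that the static pod's manifest file has been removed. Here "SyncLoop REMOVE" means the pod has disappeared from its source, and an event of type SyncPodKill is sent to the podWorker.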
I1123 14:18:35.558172 315900 kubelet.go:2124] "SyncLoop REMOVE" source="file" pods=[default/nginx-static-pod-10.11.251.2]
I1123 14:18:35.558191 315900 kubelet.go:1969] "Pod has been deleted and must be killed" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6
I1123 14:18:35.558206 315900 pod_workers.go:638] "Pod is being removed by the kubelet, begin teardown" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6
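The "begin teardown" message is logged when the podWorker decides why a pod starts terminating. The Python sketch below is a simplified model of that decision, based on a reading of pod_workers.go in 1.23. The names TerminationFlags and classify_termination are hypothetical; only the order of the cases and the resulting flags are meant to mirror the kubelet, and the real code tracks more state than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TerminationFlags:
    """Hypothetical stand-in for the per-pod flags the podWorker keeps."""
    deleted: bool = False
    evicted: bool = False
    reason: str = ""


def classify_termination(
    is_runtime_pod: bool,
    has_deletion_timestamp: bool,
    phase: str,
    update_type: str,
    evict: bool = False,
) -> Optional[TerminationFlags]:
    """Return the flags for a pod that begins terminating, or None if it keeps running."""
    if is_runtime_pod:
        # Orphaned pod known only to the container runtime.
        return TerminationFlags(deleted=True, reason="orphaned")
    if has_deletion_timestamp:
        # Pod marked for graceful deletion through the apiserver.
        return TerminationFlags(deleted=True, reason="graceful deletion")
    if phase in ("Succeeded", "Failed"):
        return TerminationFlags(reason="terminal phase")
    if update_type == "SyncPodKill":
        if evict:
            return TerminationFlags(evicted=True, reason="evicted")
        # The case seen in the log above: neither deleted nor evicted is set.
        return TerminationFlags(reason="removed by the kubelet")
    return None


# A static pod whose manifest file was removed arrives as SyncPodKill without eviction.
flags = classify_termination(False, False, "Running", "SyncPodKill")
print(flags)
```

In this model a removed static pod ends up with neither deleted nor evicted set, which matters later when the kubelet decides whether exited containers may all be removed.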
This triggers the podWorker to run syncTerminatingPod
I1123 14:18:35.558234 315900 pod_workers.go:888] "Processing pod event" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6 updateType=1
I1123 14:18:35.558244 315900 pod_workers.go:1005] "Pod worker has observed request to terminate" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6
I1123 14:18:35.558259 315900 kubelet.go:1795] "syncTerminatingPod enter" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6
syncTerminatingPod stops the container and the sandbox
I1123 14:18:35.558456 315900 kubelet.go:1825] "Pod terminating with grace period" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6 gracePeriod=30
I1123 14:18:35.558519 315900 kuberuntime_container.go:719] "Killing container with a grace period override" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6 containerName="nginx-container" containerID="docker://b6ca55d329230c8f5776eb1160fe161d6fefa01f2b31e55dbb820add90aadccc" gracePeriod=30
I1123 14:18:35.558528 315900 kuberuntime_container.go:723] "Killing container with a grace period" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6 containerName="nginx-container"
syncTerminatingPod finishes, and the podWorker finishes processing this event
I1123 14:18:35.775202 315900 kubelet.go:1873] "Pod termination stopped all running containers" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6
I1123 14:18:35.775212 315900 kubelet.go:1875] "syncTerminatingPod exit" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6
I1123 14:18:35.775220 315900 pod_workers.go:1050] "Pod terminated all containers successfully" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6
I1123 14:18:35.775232 315900 pod_workers.go:988] "Processing pod event done" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6 updateType=1
The podWorker starts running syncTerminatedPod
I1123 14:18:35.775237 315900 pod_workers.go:888] "Processing pod event" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6 updateType=2
I1123 14:18:35.938222 315900 kubelet.go:1883] "syncTerminatedPod enter" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6
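The PLEG detects that the container and the sandbox have stopped. Because the pod is terminating through an event of type SyncPodKill, the podWorker has not set its deleted field to true, so the removeAll argument of kl.containerDeletor.deleteContainersInPod is false (that is, the cleanup policy keeps the one most recently exited container). The sandbox's PLEG event also triggers cleanUpContainersInPod, and since the sandbox ID is not among the pod's containers, the kubelet logs "Container not found in pod's containers".
And because there is only one exited container, no container removal is triggered here.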
I1123 14:18:35.938230 315900 kubelet.go:2156] "SyncLoop (PLEG): pod does not exist, ignore irrelevant event" event=&{ID:a8712c005851ee6b29cff91b9ab4b9c6 Type:ContainerDied Data:b6ca55d329230c8f5776eb1160fe161d6fefa01f2b31e55dbb820add90aadccc}
I1123 14:18:35.938232 315900 kubelet_pods.go:1441] "Generating pod status" pod="default/nginx-static-pod-10.11.251.2"
I1123 14:18:35.938244 315900 kubelet.go:2156] "SyncLoop (PLEG): pod does not exist, ignore irrelevant event" event=&{ID:a8712c005851ee6b29cff91b9ab4b9c6 Type:ContainerDied Data:398445f28f116ed45394c18d7697a64dceeef739379d5ac920bbf3fd6cc1bb78}
I1123 14:18:35.938251 315900 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="398445f28f116ed45394c18d7697a64dceeef739379d5ac920bbf3fd6cc1bb78"
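The retention behavior described above can be sketched in Python. This is a minimal model under stated assumptions, not the kubelet's code: it follows a reading of pod_container_deletor.go in 1.23, the function and class names are hypothetical, the default of keeping one container is taken from the description above, and the container IDs are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContainerStatus:
    id: str
    name: str
    state: str         # "running" or "exited"
    created_at: float  # creation time in seconds


def containers_to_delete(filter_id: str, statuses: List[ContainerStatus],
                         containers_to_keep: int) -> List[ContainerStatus]:
    """Pick exited containers that exceed the retention count."""
    matched = None
    if filter_id:
        matched = next((s for s in statuses if s.id == filter_id), None)
        if matched is None:
            # A sandbox ID lands here: it is not one of the pod's containers.
            print(f"Container not found in pod's containers: {filter_id}")
            return []
    candidates = [
        s for s in statuses
        if s.state == "exited" and (matched is None or s.name == matched.name)
    ]
    if len(candidates) <= containers_to_keep:
        return []
    # Newest first, so the most recently created containers are the ones kept.
    candidates.sort(key=lambda s: s.created_at, reverse=True)
    return candidates[containers_to_keep:]


def delete_containers_in_pod(filter_id: str, statuses: List[ContainerStatus],
                             remove_all: bool, containers_to_keep: int = 1) -> List[str]:
    """Return the IDs that would be queued for removal."""
    if remove_all:
        containers_to_keep = 0
        filter_id = ""
    return [s.id for s in containers_to_delete(filter_id, statuses, containers_to_keep)]


# Placeholder IDs: one exited container, plus a sandbox ID that is not a container.
statuses = [ContainerStatus("container-1", "nginx-container", "exited", 1.0)]
print(delete_containers_in_pod("container-1", statuses, remove_all=False))
print(delete_containers_in_pod("sandbox-1", statuses, remove_all=False))
print(delete_containers_in_pod("container-1", statuses, remove_all=True))
```

With removeAll false and a single exited container, nothing is selected for removal, which matches the log: the container is left behind for the garbageCollector.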
syncTerminatedPod finishes, and the podWorker finishes
I1123 14:18:35.941374 315900 kubelet.go:1924] "syncTerminatedPod exit" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6
I1123 14:18:35.941383 315900 pod_workers.go:1105] "Pod is complete and the worker can now stop" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6
I1123 14:18:35.941395 315900 pod_workers.go:959] "Processing pod event done" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6 updateType=2
The kubelet observes the mirror pod's status update
I1123 14:18:35.949126 315900 kubelet.go:2127] "SyncLoop RECONCILE" source="api" pods=[default/nginx-static-pod-10.11.251.2]
Housekeeping runs and deletes the mirror pod (GracePeriodSeconds is set to 0 here)
I1123 14:18:37.445960 315900 kubelet.go:2202] "SyncLoop (housekeeping)"
I1123 14:18:37.448122 315900 kubelet_pods.go:1082] "Clean up pod workers for terminated pods"
I1123 14:18:37.448136 315900 pod_workers.go:1258] "Pod has been terminated and is no longer known to the kubelet, remove all history" podUID=a8712c005851ee6b29cff91b9ab4b9c6
I1123 14:18:37.448143 315900 kubelet_pods.go:1111] "Clean up probes for terminated pods"
I1123 14:18:37.451130 315900 kubelet_pods.go:1148] "Clean up orphaned pod statuses"
I1123 14:18:37.453700 315900 kubelet_pods.go:1167] "Clean up orphaned pod directories"
I1123 14:18:37.453841 315900 kubelet_volumes.go:160] "Cleaned up orphaned pod volumes dir" podUID=a8712c005851ee6b29cff91b9ab4b9c6 path="/data/kubernetes/kubelet/pods/a8712c005851ee6b29cff91b9ab4b9c6/volumes"
I1123 14:18:37.453954 315900 kubelet_volumes.go:236] "Orphaned pod found, removing" podUID=a8712c005851ee6b29cff91b9ab4b9c6
I1123 14:18:37.453970 315900 kubelet_pods.go:1178] "Clean up orphaned mirror pods"
I1123 14:18:37.453983 315900 mirror_client.go:130] "Deleting a mirror pod" pod="default/nginx-static-pod-10.11.251.2" podUID=
I1123 14:18:37.463808 315900 config.go:278] "Setting pods for source" source="api"
I1123 14:18:37.466092 315900 config.go:278] "Setting pods for source" source="api"
I1123 14:18:37.466537 315900 kubelet_pods.go:1040] "Deleted pod" podName="nginx-static-pod-10.11.251.2_default"
I1123 14:18:37.466546 315900 kubelet_pods.go:1185] "Clean up orphaned pod cgroups"
I1123 14:18:37.466560 315900 kubelet.go:2210] "SyncLoop (housekeeping) end"
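The kubelet observes the mirror pod being deleted and then removed from the apiserver. Because the pod has already been removed from the podManager, this does not trigger the podWorker.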
I1123 14:18:37.466578 315900 kubelet.go:2130] "SyncLoop DELETE" source="api" pods=[default/nginx-static-pod-10.11.251.2]
I1123 14:18:37.466589 315900 kubelet.go:2124] "SyncLoop REMOVE" source="api" pods=[default/nginx-static-pod-10.11.251.2]
The garbageCollector removes the container and the sandbox, and removes the pod's log directory
I1123 14:19:21.542254 315900 kuberuntime_container.go:947] "Removing container" containerID="b6ca55d329230c8f5776eb1160fe161d6fefa01f2b31e55dbb820add90aadccc"
I1123 14:19:21.542265 315900 scope.go:110] "RemoveContainer" containerID="b6ca55d329230c8f5776eb1160fe161d6fefa01f2b31e55dbb820add90aadccc"
I1123 14:19:21.554998 315900 kuberuntime_gc.go:171] "Removing sandbox" sandboxID="398445f28f116ed45394c18d7697a64dceeef739379d5ac920bbf3fd6cc1bb78"
I1123 14:19:21.563426 315900 kuberuntime_gc.go:343] "Removing pod logs" podUID=a8712c005851ee6b29cff91b9ab4b9c6
I1123 14:19:21.566388 315900 kubelet.go:1333] "Container garbage collection succeeded"
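2 The static pod removal flow
- podConfig detects that the static pod's manifest file has been removed and triggers SyncLoop REMOVE
- The podWorker runs syncTerminatingPod (stops the container and the sandbox)
- The PLEG detects that the sandbox and the container have exited
- The podWorker runs syncTerminatedPod (removes the cgroup, updates the mirror pod's status, and waits for the pod's volumes to finish unmounting)
- The kubelet observes the mirror pod's status update
- Housekeeping runs and performs cleanup (removes the podWorker, deletes the mirror pod, removes the pod's volume directory)
- The kubelet observes the mirror pod being deleted and removed from the apiserver
- The garbageCollector cleans up the sandbox and the container, and removes the pod's log directory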
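The steps above can be lined up against the log by parsing the header of each line and computing how long after "SyncLoop REMOVE" each milestone happened. The sketch below is a minimal example, assuming the standard klog text header (severity, month and day, time, process ID, file:line) followed by a quoted message; the regular expression and the helper names are illustrative, and the sample lines are copied from the log in section 1.

```python
import re
from datetime import datetime

# Assumed klog header layout: Lmmdd hh:mm:ss.uuuuuu pid file:line] "message" ...
KLOG_LINE = re.compile(
    r'^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+'
    r'(?P<pid>\d+) (?P<source>[\w.]+:\d+)\] "(?P<message>[^"]*)"'
)


def parse_klog(line):
    """Return (timestamp, source, message) for a klog line, or None if it does not match."""
    m = KLOG_LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["time"], "%H:%M:%S.%f")
    return ts, m["source"], m["message"]


def timeline(lines):
    """Return (seconds since first event, source, message) for every parsable line."""
    events = [e for e in map(parse_klog, lines) if e is not None]
    start = events[0][0]
    return [((ts - start).total_seconds(), src, msg) for ts, src, msg in events]


LOG_LINES = [
    'I1123 14:18:35.558172 315900 kubelet.go:2124] "SyncLoop REMOVE" source="file" pods=[default/nginx-static-pod-10.11.251.2]',
    'I1123 14:18:35.775212 315900 kubelet.go:1875] "syncTerminatingPod exit" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6',
    'I1123 14:18:35.941374 315900 kubelet.go:1924] "syncTerminatedPod exit" pod="default/nginx-static-pod-10.11.251.2" podUID=a8712c005851ee6b29cff91b9ab4b9c6',
    'I1123 14:18:37.453983 315900 mirror_client.go:130] "Deleting a mirror pod" pod="default/nginx-static-pod-10.11.251.2" podUID=',
    'I1123 14:19:21.566388 315900 kubelet.go:1333] "Container garbage collection succeeded"',
]

result = timeline(LOG_LINES)
for offset, source, message in result:
    print(f"+{offset:10.6f}s  {source:<22} {message}")
```

For this log, the containers stop within a fraction of a second, the mirror pod is deleted at the next housekeeping run about two seconds in, and the exited container and sandbox stay around until garbage collection roughly 46 seconds after the manifest file was removed.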
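3 Summary
Removing a regular pod takes two DELETE operations before the pod is gone from the apiserver. A static pod is removed by deleting its manifest file or by dropping it from the HTTP server's response body. The static pod's mirror pod is deleted when housekeeping runs, and the exited container and sandbox are cleaned up by the garbageCollector.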