
Don't forget why you set off

Why Evicted Pods Are Not Deleted and How to Clean Them Up

Recently, I encountered an unexpected behavior: after scaling a Deployment's replicas down to 0, the evicted pods were not deleted. Normally, when a Deployment still has replicas, evicted pods are not deleted, which matches my expectations. This article discusses topics related to the removal of evicted pods.
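
As a reference point for the cleanup side of the topic (not taken from the article itself), here is a minimal client-go sketch that lists Failed pods and deletes the ones whose status reason is "Evicted". It assumes a kubeconfig at the default location and is only a sketch, not a hardened tool.

```go
// Minimal sketch: delete evicted pods across all namespaces.
// Assumes a kubeconfig at the default path (~/.kube/config).
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Evicted pods remain in the Failed phase with status reason "Evicted".
	pods, err := clientset.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(),
		metav1.ListOptions{FieldSelector: "status.phase=Failed"})
	if err != nil {
		panic(err)
	}
	for _, pod := range pods.Items {
		if pod.Status.Reason != "Evicted" {
			continue
		}
		fmt.Printf("deleting evicted pod %s/%s\n", pod.Namespace, pod.Name)
		if err := clientset.CoreV1().Pods(pod.Namespace).Delete(context.TODO(),
			pod.Name, metav1.DeleteOptions{}); err != nil {
			fmt.Println("delete failed:", err)
		}
	}
}
```

The same pods can be found with kubectl by filtering on the Failed phase, e.g. a field selector of status.phase=Failed, and then deleting the ones reported as Evicted.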

Summary 2023

Looking back at 2023, I feel I was continually struggling through adversity but gradually seeing a glimmer of hope, like climbing uphill with faltering steps and then looking up to see the hilltop. I feel I have accumulated some expertise in the cloud-native field, which lets me gradually share my knowledge and thoughts while building up my technical influence.

A Deep Dive into HighNodeUtilization and LowNodeUtilization Plugins with Descheduler

Recently, I have been researching the descheduler, primarily to address CPU hotspots on some nodes in Kubernetes clusters. The issue arises when CPU usage differs significantly between nodes even though pod requests are evenly distributed across them. As we know, kube-scheduler is responsible for scheduling pods to nodes, while the descheduler removes pods so that the workload controller recreates them. This in turn triggers the scheduling process again, allocating the pods to nodes anew and thereby achieving pod rescheduling and node balancing.

The descheduler project in the community aims to address the following scenarios:

  1. Some nodes are over-utilized, and node utilization needs to be balanced.
  2. After a pod has been scheduled, the node's labels or taints no longer satisfy the pod's pod/node affinity, so the pod needs to be moved to a node that does.
  3. New nodes join the cluster, necessitating the balancing of node utilization.
  4. Pods are in a failed state but have not been cleaned up.
  5. Pods of the same workload are concentrated on the same node.

Descheduler uses a plugin mechanism to extend its capabilities, with plugins categorized into Balance (node balancing) and Deschedule (pod rescheduling) types.
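
To make the two categories concrete, here is a simplified, conceptual sketch of the split between the two plugin types. It is not the actual descheduler framework API, only an illustration of the idea.

```go
// Conceptual sketch of the two descheduler plugin categories.
// The real framework defines its own interfaces; this only illustrates the split.
package descheduler

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// A Deschedule-style plugin evaluates pods individually and evicts those that
// no longer belong on their node (e.g. violated affinity, failed state).
type DeschedulePlugin interface {
	Deschedule(ctx context.Context, nodes []*v1.Node) error
}

// A Balance-style plugin looks at the pod distribution across nodes and evicts
// pods from over-utilized nodes so the scheduler can place them elsewhere.
type BalancePlugin interface {
	Balance(ctx context.Context, nodes []*v1.Node) error
}
```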

Analysis of the Static Pod Removal Process in kubelet

The previous article discussed the interesting removal process of the mirror pod. This article will explore the removal process of static pods.

Static pods can originate from files or from an HTTP service, and they are visible only to the kubelet itself. The mirror pod is a reflection of the static pod that allows external components to observe the static pod's state.

The previous article explained that removing the mirror pod does not delete the static pod. To delete a static pod, you need to either delete the files under the --pod-manifest-path directory or remove the pod by making the HTTP server specified in --manifest-url return a response body that excludes this pod.
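For the file-based case, deleting the manifest file is all there is to it. The sketch below only illustrates that idea; the directory is the kubeadm default for --pod-manifest-path and the manifest file name is hypothetical.

```go
// Minimal sketch: remove a file-based static pod by deleting its manifest
// from the directory the kubelet watches (--pod-manifest-path).
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	manifestDir := "/etc/kubernetes/manifests" // assumed --pod-manifest-path (kubeadm default)
	name := "my-static-pod.yaml"               // hypothetical manifest file name

	target := filepath.Join(manifestDir, name)
	if err := os.Remove(target); err != nil {
		fmt.Println("failed to remove manifest:", err)
		return
	}
	// The kubelet notices the file is gone, stops the static pod, and the
	// corresponding mirror pod disappears from the apiserver.
	fmt.Println("removed", target)
}
```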

Exploring Mirror Pod Deletion in Kubernetes: Understanding Its Impact on Static Pods

This is another article on the pod removal process, this time focusing on the removal of mirror pods. The term "mirror pod" may sound unfamiliar; it is a type of pod within Kubernetes.

Let’s first introduce how pods are classified. Pods come from three sources: file, HTTP, and the apiserver. Pods from the apiserver are called ordinary pods, while pods from the other sources are called static pods (the control plane installed by kubeadm runs as static pods). To make static pods easier to manage, the kubelet creates a corresponding pod on the apiserver for each static pod. These pods are called mirror pods; they essentially mirror the static pod (almost identical, except for a different UID and the addition of the “kubernetes.io/config.mirror” annotation).
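
Since mirror pods are regular apiserver objects that differ mainly by the annotation mentioned above, they are easy to spot. Here is a minimal client-go sketch (not from the article) that lists all pods and prints the ones carrying the "kubernetes.io/config.mirror" annotation; it assumes a kubeconfig at the default location.

```go
// Minimal sketch: find mirror pods by the config.mirror annotation.
// Assumes a kubeconfig at the default path (~/.kube/config).
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	pods, err := clientset.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, pod := range pods.Items {
		// Mirror pods carry the kubernetes.io/config.mirror annotation,
		// which the kubelet adds when it creates them.
		if _, ok := pod.Annotations["kubernetes.io/config.mirror"]; ok {
			fmt.Printf("mirror pod: %s/%s (node %s)\n", pod.Namespace, pod.Name, pod.Spec.NodeName)
		}
	}
}
```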

So, what happens when you delete a mirror pod? Will it remove static pods on the node?

Kubelet Bug: Sandbox Not Cleaned Up - Unraveling Retention Issues

In the process of reviewing the kubelet’s removal of pods, I encountered an issue where the sandbox of a “removed” pod was not being cleaned up.

In the previous article, “Analysis of ‘an error occurred when try to find container’ Errors in Kubernetes Pod Removal Processes,” we analyzed the pod removal process. The garbage collector in the kubelet is responsible for cleaning up the sandboxes of exited pods, but in practice this sandbox was not being removed.

Why wasn’t this sandbox removed in the kubelet’s pod removal process? In other words, why didn’t the garbage collector clean up the exited sandbox?

This article analyzes kubelet logs and combines them with code logic to outline the entire process of pod removal, identifying the root cause of this phenomenon.

Analysis of 'an error occurred when try to find container' Errors in Kubernetes Pod Removal Processes

About 4 months ago, while troubleshooting a CNI plugin bug that caused pod removal failures, I examined the pod deletion process in Kubernetes 1.23. In the kubelet logs, I frequently encountered the error “an error occurred when try to find container,” which I had previously ignored. This time, I decided to analyze its root cause.

This article will analyze the following aspects:

  1. Introduction to the core components of pod lifecycle management in kubelet.
  2. Analysis of the actual pod removal process based on the logs output during the pod removal process in kubelet.

Before we begin, if you ask me how serious this error is and what impact it has, my answer is that it doesn’t really matter. The issue is caused by inconsistencies between asynchronous and cached information, and it does not affect the pod deletion and cleanup process. If you want to know the reason, keep reading; if not, you can close this article now, since it is lengthy and not intended as a troubleshooting guide.