Why Evicted Pods Are Not Deleted and How to Clean Them Up
Recently, I ran into something unexpected: after scaling a deployment down to 0 replicas, its evicted pods were not deleted. That evicted pods linger while a deployment still has replicas matched my expectations, but I did not expect them to remain once the replica count reached 0. This article looks at when and how evicted pods are removed.
The Kubernetes version used in this article is 1.23.
Here, I will reproduce the scenario where the replicas of a deployment are set to 0, but the evicted pods are not deleted.
- Define a deployment with an ephemeral-storage limit of 200M:
- Write a 300M file inside the container of the pod and wait for the pod to be evicted:
- Check the pod:
- Scale the deployment down to 0:
- Check the pod and deployment:
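The steps above can be sketched as a handful of shell functions. This is an illustrative reconstruction rather than the original manifest and commands: the deployment name evict-demo, the busybox image, the app label, and the path /tmp/fill are all assumptions.

```shell
# Sketch of the reproduction steps. All names below (evict-demo, busybox,
# /tmp/fill) are illustrative assumptions, not taken from the original setup.
DEPLOY=evict-demo

# Step 1: a one-replica deployment whose container may use at most 200M
# of ephemeral storage.
create_deployment() {
  kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${DEPLOY}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${DEPLOY}
  template:
    metadata:
      labels:
        app: ${DEPLOY}
    spec:
      containers:
      - name: main
        image: busybox
        command: ["sleep", "3600"]
        resources:
          limits:
            ephemeral-storage: 200M
EOF
}

# Step 2: write a 300M file into the container's writable layer, which
# pushes the container past its ephemeral-storage limit.
fill_disk() {
  kubectl exec "deploy/${DEPLOY}" -- dd if=/dev/zero of=/tmp/fill bs=1M count=300
}

# Steps 3 and 5: show the pods and the deployment.
show_state() {
  kubectl get pods -l "app=${DEPLOY}" -o wide
  kubectl get deployment "${DEPLOY}"
}

# Step 4: scale the deployment down to 0.
scale_to_zero() {
  kubectl scale deployment "${DEPLOY}" --replicas=0
}
```

Running create_deployment, fill_disk, show_state, scale_to_zero, and show_state in that order walks through the scenario. The eviction is not instantaneous, so show_state may need to be repeated after fill_disk before the pod shows up as Evicted.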
A pod with status.phase set to Failed and status.reason set to Evicted is referred to as an evicted pod. Its IP has been released but still appears in status.podIP, so multiple pods may appear to share the same IP. Such pods are evicted by kubelet itself rather than through an API server action.
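One way to look at these three fields on a specific pod is a jsonpath query. A minimal sketch; the function name pod_eviction_status is hypothetical, and the namespace and pod name are supplied by the caller.

```shell
# Print status.phase, status.reason and status.podIP of one pod on one line.
# Usage: pod_eviction_status <namespace> <pod-name>
pod_eviction_status() {
  kubectl get pod -n "$1" "$2" \
    -o jsonpath='{.status.phase}{" "}{.status.reason}{" "}{.status.podIP}{"\n"}'
}
```

For an evicted pod, the first two fields should read Failed and Evicted.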
There are two situations in which pods are evicted by kubelet:
- The pod exceeds the specified resource limits (e.g., the container’s disk usage surpasses the
- If the remaining resources on the node fall below the values set by --eviction-soft, kubelet will evict pods on that node.
The most direct method is to remove them with kubectl delete, for example by deleting all evicted pods in the cluster.
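A cluster-wide cleanup could look roughly like the sketch below. It assumes kubectl's jsonpath filter syntax for selecting pods by status.reason; the function names are hypothetical. Any extra arguments are passed through to kubectl delete, so --dry-run=client can be used to preview the result.

```shell
# List every evicted pod in the cluster as "<namespace> <name>" lines.
# The field selector narrows the query to Failed pods on the server side;
# the jsonpath filter then keeps only those whose reason is Evicted.
list_evicted_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Failed \
    -o jsonpath='{range .items[?(@.status.reason=="Evicted")]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}'
}

# Delete each listed pod. Extra arguments are forwarded to "kubectl delete".
delete_evicted_pods() {
  list_evicted_pods | while read -r ns name; do
    # </dev/null keeps kubectl from consuming the remaining list on stdin.
    kubectl delete pod -n "$ns" "$name" "$@" </dev/null
  done
}
```

For example, delete_evicted_pods --dry-run=client prints what would be deleted without removing anything.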
Are there any other ways to delete evicted pods? Does a rolling update of the deployment remove them? Does deleting the replicaset that owns them remove them?
With these questions in mind, let’s find answers through practical experiments.
Yes, deleting the replicaset removes its evicted pods. The --cascade option of kubectl delete defaults to background, meaning that kubectl first deletes the replicaset; the generic garbage collector in kube-controller-manager then removes every pod whose ownerReference points to the deleted replicaset.
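To illustrate, the owning replicaset can be read from the pod's ownerReferences and then deleted. This is a sketch with hypothetical function names; the namespace and pod name come from the caller.

```shell
# Print the name of the ReplicaSet that owns a pod, if any.
# Usage: owner_replicaset <namespace> <pod-name>
owner_replicaset() {
  kubectl get pod -n "$1" "$2" \
    -o jsonpath='{.metadata.ownerReferences[?(@.kind=="ReplicaSet")].name}'
}

# Delete that ReplicaSet; the garbage collector then removes its pods.
# --cascade=background is already the default and is spelled out for clarity.
# Usage: delete_owner_replicaset <namespace> <pod-name>
delete_owner_replicaset() {
  rs=$(owner_replicaset "$1" "$2")
  if [ -z "$rs" ]; then
    echo "no owning ReplicaSet found for $1/$2" >&2
    return 1
  fi
  kubectl delete replicaset -n "$1" "$rs" --cascade=background
}
```

Note that if the replicaset belongs to the current revision of a deployment with a non-zero replica count, the deployment controller recreates it together with new pods.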
Yes, a rolling update can also remove them, but you may need to trigger it between 1 and spec.revisionHistoryLimit times, until the replicaset that owns the evicted pod is deleted. spec.revisionHistoryLimit determines how many old replicasets are retained. When the number of replicasets under a deployment, excluding the current version, exceeds this limit, the oldest replicaset is deleted. Therefore, once the replicaset that owns the evicted pod becomes the oldest one and is deleted, the evicted pod is deleted along with it.
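Repeating the rollout until the replicaset disappears can be scripted. The sketch below uses kubectl rollout restart as one way to force a new revision, which is an assumption about how the rollout is triggered (any pod template change would do). The function name is hypothetical, and the caller chooses the upper bound on attempts.

```shell
# Trigger rollouts until the given ReplicaSet is gone, up to max_tries times.
# Usage: roll_until_rs_gone <namespace> <deployment> <replicaset> <max_tries>
roll_until_rs_gone() {
  ns=$1 deploy=$2 rs=$3 max=$4
  i=0
  # Loop while the ReplicaSet still exists.
  while kubectl get replicaset -n "$ns" "$rs" >/dev/null 2>&1; do
    [ "$i" -lt "$max" ] || return 1
    # "rollout restart" changes a pod template annotation, creating a new revision.
    kubectl rollout restart -n "$ns" "deployment/$deploy"
    # Wait for the new revision to finish rolling out before checking again.
    kubectl rollout status -n "$ns" "deployment/$deploy"
    i=$((i + 1))
  done
  echo "replicaset $rs gone after $i rollout(s)"
}
```

The deployment controller may remove the old replicaset slightly after the rollout completes, so a short pause before each check might be needed in practice.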
Evicted pods are generally not deleted immediately. They persist until the number of terminated pods in the cluster, that is, pods with the phase Failed or Succeeded, exceeds --terminated-pod-gc-threshold (default value is 12500). Only then does the pod-garbage-collector controller in kube-controller-manager delete them.
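To get a feel for how far a cluster is from that threshold, the terminated pods can be counted and compared against it. A rough sketch with hypothetical function names; the threshold cannot be read through kubectl here, so it is passed in by the caller and falls back to the default of 12500.

```shell
# Count pods in a given phase across all namespaces.
# Usage: count_pods_in_phase <phase>
count_pods_in_phase() {
  kubectl get pods --all-namespaces --no-headers \
    --field-selector="status.phase=$1" 2>/dev/null | wc -l
}

# Compare the number of terminated pods (Failed + Succeeded) to a threshold.
# Returns success when the count exceeds the threshold.
# Usage: terminated_pods_vs_threshold [threshold]
terminated_pods_vs_threshold() {
  threshold=${1:-12500}
  total=$(( $(count_pods_in_phase Failed) + $(count_pods_in_phase Succeeded) ))
  echo "terminated=$total threshold=$threshold"
  [ "$total" -gt "$threshold" ]
}
```

The threshold itself is set on kube-controller-manager with --terminated-pod-gc-threshold=<n>.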
Setting the replicas of a deployment to 0 merely adjusts the replicas of the current version of the replicaset to 0 without deleting the replicaset. Therefore, evicted pods are not deleted in this scenario.
status.availableReplicas of a ReplicaSet does not include deleted pods or pods with a phase of Failed or Succeeded. Since the phase of evicted pods is Failed, they are ignored: a ReplicaSet counts only the active pods it controls.
filteredPods represents the list of pods controlled by the ReplicaSet. The controller.FilterActivePods function filters out all inactive pods (those being deleted or with a phase of Failed or Succeeded).
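The filtering rule can be approximated in shell to see which pods a ReplicaSet would count. This is not the controller's Go code, only an emulation of the rule described above. It assumes input lines of the form "<name> <phase> <deletionTimestamp>", where the third field is empty for pods that are not being deleted; the function name is hypothetical.

```shell
# Keep only "active" pods from lines of "<name> <phase> <deletionTimestamp>".
# A pod is active when its phase is neither Failed nor Succeeded and it has
# no deletion timestamp (third field empty).
filter_active_pods() {
  awk '$2 != "Failed" && $2 != "Succeeded" && $3 == "" { print $1 }'
}

# Example input could come from kubectl (the label selector is an assumption):
#   kubectl get pods -l app=evict-demo -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{" "}{.metadata.deletionTimestamp}{"\n"}{end}' \
#     | filter_active_pods
```

An evicted pod has the phase Failed, so it never appears in the output, which mirrors why the ReplicaSet does not count it.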
Methods to delete evicted pods:
- Directly delete the evicted pod.
- For a deployment, you can delete the replicaset corresponding to the pod or directly delete the deployment (not recommended unless replicas are set to 0).
- For a deployment, trigger multiple rolling updates so that the deployment controller deletes the replicaset corresponding to the evicted pod.
- Set --terminated-pod-gc-threshold of kube-controller-manager to a smaller value so that the pod-garbage-collector controller is triggered more easily to delete pods with a phase of Failed or Succeeded.