Best Practices for Cost Optimization in Kubernetes

When it comes to reducing costs and increasing efficiency, FinOps is a methodology that fits the demand well. FinOps is a best-practice methodology that combines financial, technical, and business perspectives to optimize the cost, performance, and value of cloud computing resources. Its goal is to enable organizations to better understand, control, and optimize cloud spending through prudent resource management and sound financial decision-making.

The stages of FinOps are divided into Cost Observation (Inform), Cost Analysis (Recommend), and Cost Optimization (Operate).

Typically, an enterprise’s internal cost platform covers cost observation and cost analysis, breaking down IT costs (cloud provider bills) by service type and business department.

Cost Optimization (Operate) is divided into three stages, ordered from easiest to hardest:

  1. Cleaning up idle machines and services, and making informed choices about services and resources, including appropriate instance types, pricing models, reserved capacity, and bundle discounts.
  2. Downgrading services and reducing redundancy (e.g., moving from triple-active to dual-active, dual-active to cold standby, or dual-active to single-active), along with personnel optimization.
  3. Technical optimization (improving resource utilization).

This article focuses on the technical optimization stage of cost reduction, specifically in the context of cloud-native cost reduction strategies under Kubernetes.

Summary 2023

Looking back at 2023, I feel I was constantly struggling through adversity while gradually seeing a glimmer of hope, like trudging uphill with faltering steps and finally catching sight of the summit. I have accumulated some expertise in the cloud-native field, which allows me to gradually share my knowledge and thoughts while building my technical influence.

A Deep Dive into HighNodeUtilization and LowNodeUtilization Plugins with Descheduler

Recently, I have been researching descheduler, primarily to address CPU hotspots on some nodes in Kubernetes clusters. This issue arises when CPU usage differs significantly among nodes even though pod requests are evenly distributed across them. As we know, kube-scheduler is responsible for scheduling pods to nodes, while descheduler evicts pods so that the workload controller regenerates them. This triggers the scheduling process again, allocating the pods to nodes anew and thereby achieving pod rescheduling and node balancing.

The descheduler project in the community aims to address the following scenarios:

  1. Some nodes are over-utilized, and node utilization needs to be balanced.
  2. After a pod is scheduled, the node’s labels or taints no longer satisfy the pod’s pod/node affinity, requiring relocation to a compliant node.
  3. New nodes join the cluster, necessitating the balancing of node utilization.
  4. Pods are in a failed state but have not been cleaned up.
  5. Pods of the same workload are concentrated on the same node.

Descheduler uses a plugin mechanism to extend its capabilities, with plugins categorized into Balance (node balancing) and Deschedule (pod rescheduling) types.
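
As an illustration, a minimal descheduler policy that enables the LowNodeUtilization Balance plugin might look like the sketch below (the v1alpha2 policy format is assumed here, and the threshold percentages are placeholder values, not recommendations):

    apiVersion: "descheduler/v1alpha2"
    kind: "DeschedulerPolicy"
    profiles:
      - name: balance-profile            # hypothetical profile name
        pluginConfig:
          - name: "LowNodeUtilization"
            args:
              # Nodes below ALL of these usage percentages are considered underutilized.
              thresholds:
                cpu: 20
                memory: 20
                pods: 20
              # Nodes above ANY of these are considered overutilized; pods are
              # evicted from them so the scheduler can place them on underutilized nodes.
              targetThresholds:
                cpu: 50
                memory: 50
                pods: 50
        plugins:
          balance:
            enabled:
              - "LowNodeUtilization"

HighNodeUtilization works in the opposite direction: it evicts pods from underutilized nodes so that workloads can be compacted onto fewer nodes.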

Analyzing the Static Pod Removal Process in kubelet

The previous article discussed the interesting removal process of the mirror pod. This article will explore the removal process of static pods.

Static pods can originate from files or an HTTP service, and they are visible only internally to the kubelet. The mirror pod is an image of the static pod on the apiserver that allows external components to observe the static pod’s state.

The previous article explained that removing the mirror pod does not delete the static pod. To delete a static pod, you need to either delete its file under the --pod-manifest-path directory or have the HTTP server specified by --manifest-url return a response body that no longer includes the pod.
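
For instance, a static pod is defined as a manifest file in the directory the kubelet watches via --pod-manifest-path (commonly /etc/kubernetes/manifests); deleting that file deletes the static pod, and the kubelet then removes its mirror pod from the apiserver. A minimal sketch of such a manifest (the file path, pod name, and image are illustrative):

    # /etc/kubernetes/manifests/static-web.yaml  (illustrative path)
    # The kubelet watches this directory; deleting this file removes the static pod.
    apiVersion: v1
    kind: Pod
    metadata:
      name: static-web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80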

Exploring Mirror Pod Deletion in Kubernetes: Understanding Its Impact on Static Pods

This is another article researching the pod removal process, this time focusing on the removal of mirror pods. The term “mirror pod” may sound unfamiliar; it is a special type of pod within Kubernetes.

Let’s first introduce the classification of pods. Pods come from three sources: files, HTTP, and the apiserver. Pods from the apiserver are called ordinary pods, while pods from the other sources are called static pods (a control plane installed with kubeadm runs as static pods). To make static pods easier to manage, the kubelet creates a corresponding pod on the apiserver for each static pod. These pods are called mirror pods; each essentially mirrors its static pod (almost identical, except for a different UID and the added “kubernetes.io/config.mirror” annotation).
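
To make this concrete, the metadata of a mirror pod as seen on the apiserver might look roughly like the sketch below (the name, UID, and hash values are illustrative):

    # Mirror pod metadata as returned by the apiserver (values illustrative).
    apiVersion: v1
    kind: Pod
    metadata:
      # On the apiserver, a static pod's name is suffixed with its node name.
      name: static-web-node-1
      namespace: default
      uid: 0f1d6ab1-...            # differs from the static pod's internal UID
      annotations:
        # Records where the pod config came from: file, http, or api.
        kubernetes.io/config.source: file
        # Presence of this annotation identifies a mirror pod; its value is
        # the static pod's configuration hash.
        kubernetes.io/config.mirror: "7d5a1c..."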

So, what happens when you delete a mirror pod? Will it remove the static pod on the node?

Kubelet Bug: Sandbox Not Cleaned Up - Unraveling Retention Issues

In the process of reviewing the kubelet’s removal of pods, I encountered an issue where the sandbox of a “removed” pod was not being cleaned up.

In the previous article, “Analysis of ‘an error occurred when try to find container’ Errors in Kubernetes Pod Removal Processes,” we analyzed the pod removal process. The garbage collector in kubelet is responsible for cleaning up the sandboxes of exited pods, but in practice this sandbox was not removed.

Why wasn’t this sandbox removed in the kubelet’s pod removal process? In other words, why didn’t the garbage collector clean up the exited sandbox?

This article analyzes kubelet logs together with the code logic to outline the entire pod removal process and identify the root cause of this behavior.