
Pods are always scheduled to the same node

I encountered a strange phenomenon: pods generated by Spark jobs are consistently scheduled to the same node, even pods belonging to different jobs. This results in an uneven pod distribution, even though the nodes have no taints and similar available resources, and the jobs set no nodeSelector, nodeAffinity, nodeName, or PodTopologySpread constraints.

Resource Recommendation Algorithms for Crane and VPA

Introduction to VPA

VPA, short for Vertical Pod Autoscaler, is an open-source implementation based on the Google paper Autopilot: Workload Autoscaling at Google Scale. It recommends container resource requests based on historical monitoring data from the containers within pods. In other words, VPA scales by directly modifying the resource requests (and limits, if configured in VPA resources) within the pod.
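As a rough illustration (a sketch of the idea from the Autopilot paper, not VPA's actual code), the recommendation can be modeled as a weighted percentile over historical usage samples, where each sample's weight decays exponentially with age so that recent usage counts more. The 24-hour half-life and the percentile values below are illustrative assumptions.

```python
# Sketch of a decaying-weight percentile recommender (illustrative, not VPA source).

HALF_LIFE_HOURS = 24.0  # assumed decay half-life; VPA uses a configurable half-life


def decayed_weight(sample_age_hours: float) -> float:
    # Exponential decay: a sample's weight halves every HALF_LIFE_HOURS.
    return 0.5 ** (sample_age_hours / HALF_LIFE_HOURS)


def weighted_percentile(samples, pct):
    """samples: list of (usage, age_hours) tuples; pct in [0, 100].

    Returns the smallest usage value at which the cumulative decayed
    weight reaches pct% of the total weight.
    """
    weighted = sorted((usage, decayed_weight(age)) for usage, age in samples)
    total = sum(w for _, w in weighted)
    threshold = total * pct / 100.0
    acc = 0.0
    for usage, w in weighted:
        acc += w
        if acc >= threshold:
            return usage
    return weighted[-1][0]
```

With equal-age samples this degenerates to a plain percentile; with a mix of old and new samples, old usage spikes fade out of the recommendation over time, which is why VPA-style recommenders adapt to workload changes without overreacting to ancient history.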

Key Benefits:

  1. Increases node resource utilization.
  2. Suitable for long-running, homogeneous applications.

Limitations:

Modify the contents of /etc/resolv.conf while the pod is running

Kubernetes provides a method to modify the configuration of the /etc/resolv.conf file for pods using the spec.dnsConfig and spec.dnsPolicy fields. You can find specific information on this in the Customizing DNS Service documentation. However, this approach leads to the recreation of pods.
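For pods that have not yet been created, a minimal example of `spec.dnsPolicy` and `spec.dnsConfig` follows the Customizing DNS Service documentation (the image and nameserver address below are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-example
spec:
  containers:
    - name: app
      image: nginx          # placeholder image
  dnsPolicy: "None"         # ignore cluster DNS; use dnsConfig exclusively
  dnsConfig:
    nameservers:
      - 169.254.20.10       # hypothetical node-local DNS address
    searches:
      - default.svc.cluster.local
    options:
      - name: ndots
        value: "5"
```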

In our specific business scenario, we need pods to use a node-local DNS server instead of the centralized CoreDNS, including pods created before the cluster DNS configuration changed: the nameserver in these existing pods must be updated to point to the local DNS server. However, we cannot delete pods or restart containers, because the business application does not support graceful termination. Mutating a running container this way is not ideal container practice, but it is the constraint we have to work within.

Many pod probes are failing after upgrading Kubernetes

Background

After upgrading the Kubernetes cluster from version 1.18 to 1.23, many previously running pods are experiencing restarts due to liveness probe failures, and some are failing readiness probes as well.

Initially, the upgrade itself was not suspected as the cause of the restarts: the container hash algorithm is unchanged between versions 1.18 and 1.23, so upgrading kubelet should not cause already-running containers to be recreated. Further investigation confirmed that although the restarts began after the Kubernetes upgrade, they were not directly triggered by it, so the kubelet upgrade could be ruled out as the cause.

Pods cannot access the Service LoadBalancer IP

Recently, while developing a CNI network plugin, I encountered an issue where containers couldn’t access the LoadBalancer IP of a service. However, access to pod, service, and node addresses was functioning correctly.

  1. The network plugin is based on the cloud provider’s Elastic Network Interface (ENI), with multiple pods sharing a single ENI in a policy routing mode. This setup directs pod traffic in and out through the ENI.
  2. The primary network interface is bound to an IP, while the secondary network interfaces (ENIs) do not have IP bindings. The IPs of the pods are assigned to the secondary network interfaces.
  3. kube-proxy binds the LoadBalancer IP to the kube-ipvs0 network interface, adds IPVS rules for LoadBalancer IP forwarding, and sets up SNAT rules in iptables to allow pod IPs to access the LoadBalancer IP.

When attempting to access the LoadBalancer address 10.12.115.101:80 from within a container, the connection fails.

Troubleshooting missing kubelet container metrics

Background: Two Kubernetes clusters, versions 1.18.20 and 1.21.8, run kubelet under systemd with identical startup parameters on CentOS 7. The v1.21 cluster exposes no container-level monitoring data, while the v1.18 cluster does.

Observation: In the v1.21 cluster, the Grafana monitoring graphs for pods do not display any data related to containers, such as CPU and memory usage.

Exploring Path-Based and Header-Based Routing Solutions in Knative

Currently, in Knative version 1.1.0, traffic routing is primarily based on domain names.

However, in many use cases:

  1. Services often have fixed external domain names, and there may be multiple of them.
  2. Services are typically organized under specific paths within a domain, meaning one domain consists of multiple services.
  3. Grey releases are based on multiple header values combined with AND/OR relationships.

Let’s discuss how to address these requirements.

Apart from its default domain, Knative provides DomainMapping to bind additional domains to a service.
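For example, a DomainMapping resource binds an extra domain to an existing Knative Service (names below are placeholders; depending on the Knative release, the API version may still be v1alpha1):

```yaml
apiVersion: serving.knative.dev/v1beta1
kind: DomainMapping
metadata:
  name: hello.example.com      # the additional domain to bind
  namespace: default
spec:
  ref:
    name: hello                # existing Knative Service (placeholder)
    kind: Service
    apiVersion: serving.knative.dev/v1
```

DomainMapping only maps whole domains to services, so by itself it does not solve path-based or header-based routing, which is why the requirements above need further discussion.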