Scaling Down to Zero in Kubernetes: Exploring HPA and Ecosystem Solutions

In the previous article “Why Does HPA Scale Slowly”, we analyzed the principles and algorithms behind Horizontal Pod Autoscaler (HPA) scaling and looked at some solutions. A related question is “Can HPA scale down to 0?” The requirement is driven by cost reduction and is common in serverless environments.

The answer is yes. Kubernetes 1.16 introduced the HPAScaleToZero feature gate, which is currently in alpha and only supports HPAs that use object or external metrics. With the gate enabled, spec.minReplicas can be set to 0 for an HPA that specifies at least one object or external metric.

Setting spec.minReplicas to 0 for other types of HPA will result in an error:

The HorizontalPodAutoscaler "nginx-deployment" is invalid: 
* spec.minReplicas: Invalid value: 0: must be greater than or equal to 1
* spec.metrics: Forbidden: must specify at least one Object or External metric to support scaling to zero replicas

To use the feature, enable the HPAScaleToZero feature gate on kube-apiserver.
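The two error messages above can be read as two separate checks. The following is a minimal sketch of that logic in Python, not the actual kube-apiserver code: the function name `validate_min_replicas`, the dict layout, and the `scale_to_zero_enabled` flag are assumptions made for illustration.

```python
def validate_min_replicas(hpa_spec, scale_to_zero_enabled):
    """Return a list of error strings for spec.minReplicas (empty means valid).

    Toy model of the checks implied by the error output; the real
    validation lives in the API server and may differ in detail.
    """
    errors = []
    min_replicas = hpa_spec.get("minReplicas", 1)

    # Assumption: the lower bound is 1 without the gate and 0 with it.
    lower_bound = 0 if scale_to_zero_enabled else 1
    if min_replicas < lower_bound:
        errors.append(
            f"spec.minReplicas: Invalid value: {min_replicas}: "
            f"must be greater than or equal to {lower_bound}"
        )

    # Scaling to zero needs a metric that exists without any pods running.
    if min_replicas == 0:
        metric_types = {m.get("type") for m in hpa_spec.get("metrics", [])}
        if not metric_types & {"Object", "External"}:
            errors.append(
                "spec.metrics: Forbidden: must specify at least one Object "
                "or External metric to support scaling to zero replicas"
            )
    return errors


cpu_hpa = {"minReplicas": 0, "metrics": [{"type": "Resource"}]}
ext_hpa = {"minReplicas": 0, "metrics": [{"type": "External"}]}

print(validate_min_replicas(cpu_hpa, scale_to_zero_enabled=False))
print(validate_min_replicas(ext_hpa, scale_to_zero_enabled=True))
```

In this sketch, a resource-only HPA with the gate disabled fails both checks, which matches the two-line error shown above, while an HPA with an external metric passes once the gate is enabled.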


This limitation exists because the other HPA types (pods, resource, and containerResource) use a model in which each pod contributes its own metric value. That model breaks down in two ways when there are no pods.

When the current replica count is 0, the HPA controller runs into division by zero when calculating replicas for the pods, resource, and containerResource types.

Even setting the division by zero aside, when the current replica count is 0 the metric value is also 0, so the calculated replica count is 0. The HPA could therefore never scale up from 0.
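Both problems can be reproduced with the basic HPA formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below is a simplified model, not the controller's code: the function names are hypothetical, and the external-metric case assumes a target expressed as a total value that does not depend on the number of pods.

```python
import math


def desired_from_pod_average(current_replicas, total_metric, target_per_pod):
    # Per-pod model: average the metric over the pods, then scale the
    # current replica count by the ratio of average to target.
    average = total_metric / current_replicas  # ZeroDivisionError with 0 pods
    return math.ceil(current_replicas * average / target_per_pod)


def desired_from_ratio_only(current_replicas, metric_per_pod, target_per_pod):
    # Second problem in isolation: pretend the per-pod value is known.
    # Multiplying by a current count of 0 always yields 0.
    return math.ceil(current_replicas * metric_per_pod / target_per_pod)


def desired_from_external(metric_value, target_value):
    # Object/external model (assumed): the metric exists without pods,
    # so the ratio alone decides the replica count when starting from 0.
    return math.ceil(metric_value / target_value)


try:
    desired_from_pod_average(0, 0, 10)
except ZeroDivisionError:
    print("per-pod average is undefined with 0 replicas")

print(desired_from_ratio_only(0, 0, 10))   # stuck at 0
print(desired_from_external(50, 10))       # e.g. 50 queued messages, target 10
```

The last call shows why object and external metrics are the exception: a queue length or similar signal can be non-zero while no pod is running, so the calculation has something to scale up from.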

This feature gate is a poor fit for services that receive incoming traffic. While scaling up from 0, there is a window in which no pod can receive or process requests, which may affect the business.

The following discussion is from the community:

For scaling up from 0 pods, we first have to change the service controller to gracefully handle requests when no pod is available. The service controller syncs and updates the service status, the load balancer status, and its associated hosts. Traffic is directed to backend pods via the load balancer, so the behavior depends on how the load balancer works. The service controller has no provision for checking whether Pods exist.

1. To scale up from 0, we need to monitor requests to the Service.
2. The first request to the Service will trigger a CreatePod event.
3. Create a buffer until the kube-proxy receives endpoint information for the requested service.
4. The controller resolves the Service to some replication controller.
5. The replication controller manager schedules a new pod.
6. The endpoints controller determines that the service has a new endpoint and updates the Service’s endpoints.
7. The kube-proxy receives a watch event with the new endpoint information and updates its routing table.
8. The kube-proxy services the request to the new endpoint.

(For detailed steps, refer to the provided link.)
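The core of the proposal quoted above is to buffer requests until an endpoint exists. The following toy model illustrates that idea only; it is not how kube-proxy or any Kubernetes controller is implemented, and the class `ScaleFromZeroBuffer` and its `scale_up` callback are hypothetical names.

```python
from collections import deque


class ScaleFromZeroBuffer:
    """Hold requests while no endpoint exists, then flush them in order."""

    def __init__(self, scale_up):
        self.scale_up = scale_up       # hypothetical callback asking for a pod
        self.endpoints = []            # ready backends
        self.pending = deque()         # requests waiting for a backend
        self.scale_requested = False

    def handle(self, request):
        if self.endpoints:
            return self._forward(request)
        # No backend yet: buffer the request and trigger scale-up once.
        self.pending.append(request)
        if not self.scale_requested:
            self.scale_requested = True
            self.scale_up()
        return None

    def on_endpoint_ready(self, endpoint):
        # Called when a new endpoint is reported; drain the buffer.
        self.endpoints.append(endpoint)
        served = []
        while self.pending:
            served.append(self._forward(self.pending.popleft()))
        return served

    def _forward(self, request):
        return (self.endpoints[0], request)


calls = []
buf = ScaleFromZeroBuffer(lambda: calls.append("scale-up"))
buf.handle("req-1")
buf.handle("req-2")
print(calls)                           # scale-up is requested only once
print(buf.on_endpoint_ready("pod-a"))  # buffered requests are forwarded
```

The time a request spends in `pending` corresponds to the window in which no pod can serve traffic, so the buffer trades dropped requests for added latency on the first requests.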

This feature is well suited to applications that only process data or execute tasks and do not serve external traffic.

Examples include big data tasks, image generation, code compilation, report generation, and other classic serverless scenarios.

Knative’s KPA (Knative Pod Autoscaler) component supports scaling down to 0, including for services that handle external traffic. When scaling up from 0, the activator component holds incoming connections from pod startup until pod readiness and then forwards the requests to the pod.

KEDA also supports scaling down to 0, but not for every scaler type; the CPU and Memory scalers, for example, cannot scale to 0 on their own.

  • This scaler can scale to 0 only when the user defines at least one additional scaler that is not CPU or Memory (e.g., Kafka + Memory, or Prometheus + Memory) and minReplicaCount is 0.
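The rule quoted above can be written as a small predicate. This is a sketch of the stated condition only, not KEDA's implementation; the function name `can_scale_to_zero` and the plain list of trigger type strings are assumptions for illustration.

```python
def can_scale_to_zero(trigger_types, min_replica_count):
    """Apply the quoted rule: scaling to 0 needs minReplicaCount == 0 and
    at least one trigger that is neither CPU nor Memory."""
    resource_only = {"cpu", "memory"}
    other_triggers = [t for t in trigger_types if t.lower() not in resource_only]
    return min_replica_count == 0 and len(other_triggers) > 0


print(can_scale_to_zero(["memory"], 0))           # False: Memory alone
print(can_scale_to_zero(["kafka", "memory"], 0))  # True: Kafka + Memory
print(can_scale_to_zero(["prometheus", "cpu"], 1))  # False: minReplicaCount is 1
```

This mirrors the HPA restriction described earlier: CPU and Memory are per-pod signals, so something independent of running pods has to decide when to leave zero.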

Other community solutions: kube-hpa-scale-to-zero

Allow HPA to scale to 0

Scale to Zero With Kubernetes

In Kubernetes, how can I scale a Deployment to zero when idle
