Auto Scaling and Replicas
Auto Scaling
Auto Scaling in CloudOps for Kubernetes is handled by two technologies:
Horizontal Pod Autoscalers (HPAs) are provided by Kubernetes as a way to scale the number of Pods (or replicas) in a Deployment based on its resource utilization, measured against the metrics specified in its HorizontalPodAutoscaler definition. Every few seconds, the Kubernetes control plane determines how many replicas are required by comparing the metrics taken from the metrics API across all targeted Pods with their target values, producing a ratio that is used to scale the number of replicas. For information about the algorithm, see Horizontal Pod Autoscaler Algorithm Details.
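As a rough illustration of that calculation (the authoritative description is in the Kubernetes documentation referenced above), the control plane computes:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

For example, if a Deployment is running 4 replicas with an average CPU utilization of 200m against a 100m target, the ratio is 2.0 and the HPA scales the Deployment to 8 replicas; if utilization drops to 50m, the ratio is 0.5 and the HPA scales it down to 2 replicas.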
With the HorizontalPodAutoscaler v2 API, you can configure independent scale-up and scale-down behaviours for your target resources. Unless you configure otherwise, CloudOps for Kubernetes uses the default scaling behaviour.
HPAs are configured using Terraform configuration files included in the CloudOps for Kubernetes Git repository. CloudOps for Kubernetes uses these files to create HPAs when deploying Self Managed Commerce. You can enable HPAs when running the deploy-or-delete-commerce-stack Jenkins job.
For information on how to configure the scaling behaviour, see HorizontalPodAutoscaler.
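For illustration, a v2 HorizontalPodAutoscaler manifest that overrides the default scale-down behaviour might look like the following. This is a minimal sketch, not the manifest that CloudOps for Kubernetes generates; the names example-hpa and example-deployment are placeholders.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa               # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment      # placeholder target Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale when average CPU utilization crosses 70%
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes of low utilization before scaling down
      policies:
        - type: Pods
          value: 1                # remove at most one Pod
          periodSeconds: 60       # per minute
```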
Cluster Autoscaler is responsible for scaling the number of nodes in the Kubernetes cluster up and down. In AWS, these nodes are EC2 instances. If an HPA scales up the replicas in a Deployment but the nodes in the Kubernetes cluster don't have enough resources (CPU or RAM) to run the new Pods, those Pods become unschedulable. Cluster Autoscaler detects the unschedulable Pods and asks the cloud in which it is running to scale up the number of nodes in the Kubernetes cluster. If Cluster Autoscaler finds that a node has no Pods running on it, it asks the cloud to scale down the number of nodes in the cluster.
Cluster Autoscaler is deployed and configured as part of the bootstrap process, using the Terraform configuration included in the CloudOps for Kubernetes Git repository.
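For reference, cluster-autoscaler on AWS is typically configured through command-line flags on its Deployment. The fragment below is illustrative only; the actual flags and values used by CloudOps for Kubernetes are defined in its Terraform configuration, and the image tag and cluster name shown here are placeholders.

```yaml
# Fragment of a cluster-autoscaler Deployment Pod spec (illustrative)
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.3  # placeholder tag
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # Discover EC2 Auto Scaling groups by tag rather than listing them explicitly
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
      # Only remove a node after it has been unneeded for ten minutes
      - --scale-down-unneeded-time=10m
```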
Replication Controllers
If a node in a Kubernetes cluster becomes unavailable, Replication Controllers in the Kubernetes control plane detect that fewer Pods are running than the requested number of replicas. As a result, more Pods are scheduled to replace those lost when the node became unavailable. If the Kubernetes control plane cannot schedule a Pod onto a node due to insufficient resources, Cluster Autoscaler detects this and scales up the number of nodes in the Kubernetes cluster.
All Kubernetes Deployments in CloudOps for Kubernetes start with one replica. If required, you can change this by editing the Terraform configuration that you use to deploy the Elastic Path applications, as illustrated in the sketch below.
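As a point of reference, in a plain Kubernetes manifest the replica count is the replicas field of the Deployment spec. The manifest below is hypothetical and only shows where the value lives; in CloudOps for Kubernetes the equivalent value is set in the Terraform configuration rather than edited in a manifest directly.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app        # placeholder Deployment name
spec:
  replicas: 3              # raised from the default of 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: example/app:latest  # placeholder image
```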
Each Deployment has an associated Replication Controller.
For more information about Replication Controllers, see the official Kubernetes documentation on ReplicationController.
(Optional) Tuning Instance Overprovisioning
By default, CloudOps for Kubernetes installs the Kubernetes cluster-autoscaler and two additional Deployments. The cluster-autoscaler ensures that there are sufficient CPU cores and memory available for the existing Pods. When overprovisioning is enabled, Kubernetes keeps extra CPU and memory capacity in reserve so that new Pods do not need to wait for a new node to be added to the cluster. Overprovisioning accomplishes this by creating extra placeholder Pods that have PodPriority set to -1, and by configuring cluster-autoscaler to create one or more new nodes for these extra Pods.
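A common way to implement this pattern, shown here as a sketch rather than the exact manifests that CloudOps for Kubernetes deploys, is a negative-priority PriorityClass plus a Deployment of placeholder Pods that reserve capacity:

```yaml
# A PriorityClass below the default of 0, so placeholder Pods are preempted first
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Priority class for overprovisioning placeholder Pods."
---
# Placeholder Pods that reserve CPU and memory. Real workloads preempt them,
# and cluster-autoscaler adds nodes so the placeholders can be rescheduled.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2                      # placeholder value; size to your headroom needs
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: reserve-resources
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"             # placeholder request
              memory: 1Gi          # placeholder request
```

When a real Pod needs the reserved capacity, the scheduler preempts the pause Pods; the preempted placeholders then become unschedulable, which triggers cluster-autoscaler to add a node.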
The following resources are available to learn more about overprovisioning: