CloudOps for Kubernetes uses the eksctl tool to bootstrap an EKS cluster on AWS. This tool is easier and less error-prone than defining an EKS cluster in Terraform.
If you use this process, you cannot use the recommended process for updating your cluster to pick up changes related to how EKS clusters are configured in newer versions of CloudOps for Kubernetes.
This process outlines how to update the Kubernetes version of your EKS cluster, using a rollout restart so that Elastic Path Commerce is updated with zero downtime.
Amazon EKS runs a highly available control plane. You can only update your cluster by one minor version at a time.
These steps were used to update an EKS cluster from v1.14 to v1.15. There may be significant changes between Kubernetes versions. We recommend that you test the behavior of your applications against new Kubernetes versions before updating your clusters.
To update your EKS cluster version:
Part 1: Create the New Nodes and Nodegroups
Edit the entrypoint.sh script in your local repository and set the Kubernetes version to the next minor release. This change does not need to be committed to the repository.
In the docker-compose.override.yaml file, update the TF_VAR_bootstrap_mode parameter with the value create-eksctl-config. Ensure that the cluster name is the name of the Kubernetes cluster that you want to update.
Regenerate the eksctl.yaml control file by running the following commands:
rm bootstrap/eksctl.yaml
docker-compose up --build
Edit the newly generated eksctl.yaml file to name new nodegroups with your preferred naming convention.
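For reference, a nodegroup entry in eksctl.yaml generally looks like the fragment written out below. The nodegroup name (ng-v1-15), instance type, and capacity are illustrative assumptions, not values from your generated file; only the name field needs to follow your naming convention.

```shell
# Write an illustrative eksctl.yaml nodegroup fragment to a scratch file
# so the renamed entry can be compared with the generated one.
# All values here are placeholders, not output from your environment.
cat > eksctl-nodegroup-example.yaml <<'EOF'
nodeGroups:
  - name: ng-v1-15          # new name reflecting the target Kubernetes version
    instanceType: m5.large
    desiredCapacity: 3
EOF
```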
Create the new nodes and nodegroups by running the following command:
eksctl create nodegroup --config-file=bootstrap/eksctl.yaml
Part 2: Update the Deployments
Review all of the deployments on the old nodes by running the following commands:
kubectl get nodes
kubectl get deployments -o wide -A
Update the efs-provisioner configurations for the new nodegroups.
For primary EKS clusters created with the docker-compose process:
- Run the docker-compose process in setup mode.
For secondary clusters:
- Rebuild the Jenkins job create-additional-kubernetes-cluster with the job parameter buildBootstrap set to false.
Taint all old nodes by running the following command for each node in the nodegroup:
kubectl taint node <NODE_NAME> oldnode:NoSchedule
Where <NODE_NAME> is the name of a node listed in step 1 by the kubectl get nodes command.
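The taint step can be scripted. This is a minimal sketch that only prints one taint command per node for review, rather than executing anything; the node names are placeholders for your actual kubectl get nodes output.

```shell
# Placeholder node names; in practice, fill old-nodes.txt from the
# `kubectl get nodes` output gathered in step 1.
printf '%s\n' \
  ip-10-0-1-10.ec2.internal \
  ip-10-0-2-20.ec2.internal > old-nodes.txt

# Print (do not run) one taint command per old node for review.
while read -r node; do
  echo "kubectl taint node $node oldnode:NoSchedule"
done < old-nodes.txt > taint-commands.txt

cat taint-commands.txt   # review, then run each line
```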
Start the rollout restart of all deployments by running the following command for each deployment:
kubectl rollout restart deployment <DEPLOYMENT_NAME>
Where <DEPLOYMENT_NAME> is the name of a deployment listed in step 1 by the kubectl get deployments -o wide -A command.
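Because kubectl get deployments -o wide -A lists deployments across all namespaces, a restart loop needs both the namespace and the deployment name. The sketch below only prints the commands for review; the entries in deployments.txt are placeholders for your cluster's actual output.

```shell
# Placeholder "namespace deployment" pairs; in practice, fill this file
# from the `kubectl get deployments -o wide -A` output in step 1.
printf '%s\n' \
  'default my-app' \
  'efs-provisioner efs-provisioner' > deployments.txt

# Print (do not run) one rollout restart command per deployment.
while read -r ns deploy; do
  echo "kubectl rollout restart deployment $deploy -n $ns"
done < deployments.txt > restart-commands.txt
```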
Part 3: Update the Persistent Volumes (for non-production environments only)
This part is not needed for a production environment because ActiveMQ HA supports multi-mount disks.
In the AWS web console, delete the old security groups from the ingress and egress rules in the efs-provisioner security group for the cluster you are upgrading.
Review all of the pods on the old nodes by running the following command:
kubectl get pods -o wide
Delete the MySQL and ActiveMQ pods as their persistent volumes do not support multi-mount. Run the following command for each ActiveMQ and MySQL pod:
kubectl delete pod <POD_NAME>
Where <POD_NAME> is the name of the MySQL and ActiveMQ pods given in step 2.
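One way to pick out only the MySQL and ActiveMQ pods is to filter the pod listing by name prefix. This sketch prints the delete commands rather than running them; the pod names are placeholders, and the mysql/activemq prefixes are an assumption about your pod naming.

```shell
# Placeholder pod names; in practice, fill pods.txt from the
# `kubectl get pods -o wide` output in step 2.
printf '%s\n' mysql-0 activemq-0 nginx-abc12 > pods.txt

# Keep only pods whose names start with mysql or activemq, and
# print (do not run) a delete command for each.
grep -E '^(mysql|activemq)' pods.txt | while read -r pod; do
  echo "kubectl delete pod $pod"
done > delete-commands.txt
```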
Start a rollout restart of the efs-provisioner deployment, then force delete the old efs-provisioner pod, by running the following commands:
kubectl rollout restart deployment efs-provisioner -n efs-provisioner
kubectl delete pod --force <EFS_PROVISIONER_POD> -n efs-provisioner
Where <EFS_PROVISIONER_POD> is the name of the old efs-provisioner pod.
Part 4: Verify and Clean Up Old Nodegroups
Verify that all pods are restarted and running. Run the following commands:
kubectl get pods -A
kubectl get deployments -A
Delete the old nodegroups by running the following commands:
eksctl get nodegroups --cluster <CLUSTER_NAME>
eksctl delete nodegroup --cluster <CLUSTER_NAME> <CLUSTER_NODEGROUP>
Run eksctl delete nodegroup for each nodegroup that you want to delete in the cluster.
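The per-nodegroup deletion can also be scripted. This sketch prints one delete command per old nodegroup for review; the cluster name and nodegroup names are placeholders for your eksctl get nodegroups output.

```shell
# Placeholder cluster and nodegroup names; in practice, take these from
# the `eksctl get nodegroups --cluster <CLUSTER_NAME>` output.
CLUSTER_NAME=my-cluster
printf '%s\n' ng-v1-14-a ng-v1-14-b > old-nodegroups.txt

# Print (do not run) one delete command per old nodegroup.
while read -r ng; do
  echo "eksctl delete nodegroup --cluster $CLUSTER_NAME $ng"
done < old-nodegroups.txt > delete-nodegroup-commands.txt
```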
In the AWS web console, update the EKS cluster to the minor version that you set in the entrypoint.sh script in Part 1.
Re-validate that all pods are scheduled and running and that there are no remaining nodes from the old minor version by running the following commands:
kubectl version
kubectl get nodes -o wide
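A quick way to confirm nothing is left on the old minor version is to search the node listing for the old version string. The listing below is a hand-written placeholder for real kubectl get nodes -o wide output, using v1.14 as the old version from this example update.

```shell
# Placeholder node listing; in practice, save the real output with
# `kubectl get nodes -o wide > nodes.txt`.
cat > nodes.txt <<'EOF'
ip-10-0-1-10.ec2.internal   Ready   <none>   92d   v1.14.9-eks-1234
ip-10-0-3-30.ec2.internal   Ready   <none>   1h    v1.15.11-eks-5678
EOF

# Flag any node still reporting the old minor version.
if grep -q 'v1\.14\.' nodes.txt; then
  echo "old v1.14 nodes remain"   # prints this for the sample listing above
else
  echo "no old nodes"
fi
```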