Upgrading CloudOps for Kubernetes
Introduction
You must regularly upgrade your CloudOps for Kubernetes environment to maintain compatibility with your cloud service provider and third-party components. Regular upgrades are also required to ensure that your Self Managed Commerce environment remains supported and supportable.
For information about the CloudOps for Kubernetes end of support dates, see Support Lifecycle.
Selecting CloudOps for Kubernetes Updates
When planning to update CloudOps for Kubernetes, the first step is selecting the level you will update to. To select your CloudOps for Kubernetes update:
- Confirm which versions of Self Managed Commerce and CloudOps for Kubernetes you are using today.
- Review the Compatibility of CloudOps for Kubernetes documentation.
- Confirm which Elastic Path Docker releases are compatible with which CloudOps for Kubernetes releases.
- Confirm which CloudOps for Kubernetes releases are compatible with your version of Self Managed Commerce.
- Identify the versions of Elastic Path Docker and CloudOps for Kubernetes that you will upgrade to.
- Determine which updates you must apply to get to the target versions.
important
You can only update CloudOps for Kubernetes one version at a time.
- Apply the latest patch for your current CloudOps for Kubernetes version first.
- Update Elastic Path Docker and CloudOps for Kubernetes one version at a time.
For example, if you are using CloudOps for Kubernetes v2.14.x and want to update to v3.1.x, you must perform the following updates, in the order specified below:
- Update to the latest 2.14.x patch.
- Update to the latest 3.0.x version.
- Update to the latest 3.1.x version.
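The one-version-at-a-time rule can be sketched as a small shell helper that lists the updates to apply, in order. This is an illustrative sketch, not part of CloudOps for Kubernetes: the upgrade_path function and the RELEASE_LINES list are hypothetical, and the list contains only the release lines named in this guide, so check it against the Compatibility of CloudOps for Kubernetes documentation before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: list the updates needed to move between release lines.
# Assumption: RELEASE_LINES holds every release line, oldest first. The values
# below are only the ones named in this guide.
RELEASE_LINES="2.14 3.0 3.1 3.2 3.3"

# Usage: upgrade_path CURRENT TARGET   (for example: upgrade_path 2.14 3.1)
upgrade_path() {
  current="$1"
  target="$2"
  started=0
  steps=""
  for line in $RELEASE_LINES; do
    if [ "$line" = "$current" ]; then
      started=1
    fi
    if [ "$started" -eq 1 ]; then
      # The first entry is the latest patch of the current release line.
      steps="$steps $line"
      if [ "$line" = "$target" ]; then
        for step in $steps; do
          echo "Update to the latest ${step}.x"
        done
        return 0
      fi
    fi
  done
  # Reached only when the target does not follow the current release line.
  echo "No upgrade path from $current to $target" >&2
  return 1
}
```

Calling upgrade_path 2.14 3.1 lists the same three updates as the example above, in the same order. A target that does not come after the current release line makes the function return a non-zero status.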
Approach to Applying an Update
CloudOps for Kubernetes is often deployed in multiple AWS accounts, with production systems in one AWS account and non-production systems in one or more other AWS accounts. These accounts commonly share the same Elastic Path Docker and CloudOps for Kubernetes infrastructure code Git repositories. You must consider how to apply and test CloudOps for Kubernetes upgrades in non-production without the changes prematurely affecting production. The following high-level approach describes how to prepare for and roll out a CloudOps for Kubernetes update across your CloudOps for Kubernetes accounts:
- Determine the Elastic Path Docker and CloudOps for Kubernetes updates that you will apply.
- Review the changes in the release and how they may impact you. Identify and make note of any required work.
- Review and consider Git branching strategies for your Elastic Path Docker and CloudOps for Kubernetes Git repositories. This is to ensure that your CloudOps for Kubernetes production account is not impacted until all updates are tested and ready for production.
- Get the source code from the selected versions of Elastic Path Docker and CloudOps for Kubernetes.
- Identify any customizations you may have made to your Elastic Path Docker and CloudOps for Kubernetes Git repositories. Merge the customizations with the updated source code.
- Take a backup of the Terraform state of your CloudOps for Kubernetes deployment.
- Apply and validate the updates in your non-production environments. Make any required changes to address any issues.
- Create and document a plan for applying the updates to your production environment.
- Follow your plan to apply the updates in your production environment.
Preparation
Review the Compatibility of CloudOps for Kubernetes documentation to confirm that your versions of Self Managed Commerce and CloudOps for Kubernetes are known to be compatible.
Review the CloudOps for Kubernetes EKS Matrix to determine if your Kubernetes version will change when you apply the update.
Review Deprecations and Removals and consider whether and when your team will be impacted.
Review the release-specific documentation and determine whether additional upgrade steps or tasks are required.
- If you are updating from release 3.2.x to 3.3.x, see Update to Version 3.3.
- If you are updating from release 3.1.x to 3.2.x, see Update to Version 3.2.
- If you are updating from release 3.0.x to 3.1.x, see Update to Version 3.1.
Review the Release Notes to determine if any changes will impact your team or your environments.
Determine if your Kubernetes nodes use a custom Amazon Machine Image (AMI). This will only be the case if you previously set aws_eks_ami to a custom AMI ID in docker-compose.override.yml. If you are using a custom AMI, ensure that your AMI is updated to be compatible with the Kubernetes version. If you update the AMI, ensure you set aws_eks_ami to your new AMI ID and set TF_VAR_rebuild_nodegroups to true in docker-compose.override.yml, to replace your existing nodes with nodes using the new AMI.
important
The Kubernetes version of the nodes is determined by the contents of the AMI running on the nodes. If you use a custom AMI for your Kubernetes nodes, review the AMI to ensure that it is compatible with the Kubernetes version and the tools in CloudOps for Kubernetes. AWS and the Kubernetes project recommend using the same Kubernetes version on the control plane and nodes to ensure smooth operation.
Get the source code from the proper versions of Elastic Path Docker and CloudOps for Kubernetes.
Locate and obtain the docker-compose.override.yml file that you saved after completing the initial setup of the cluster, or when last updating the cluster.
Take a backup of the Terraform state before proceeding with the upgrade. For more information on how to take a backup of the Terraform state, see Backing up the Terraform Remote State.
Update the Base Infrastructure
The first step when applying an update is to update the base infrastructure, which includes the primary Kubernetes cluster, networking-related services, Jenkins and Nexus. To apply updates to the base infrastructure and Kubernetes cluster, continue with the following steps:
note
Perform the following steps on the operations workstation, ideally the workstation you used for the initial setup. For more information, see the Operations Workstation documentation.
Identify any customizations you may have made to your Elastic Path Docker and CloudOps for Kubernetes Git repositories. Merge the customizations with the updated source code.
Ensure that the up-to-date docker-compose.override.yml file is in the root of the cloud-ops-kubernetes Git folder on your operations workstation.
Review the new docker-compose.yml file and identify new or changed parameters. Update your configuration in docker-compose.override.yml as required.
tip
Compare the previous docker-compose.yml file with the new docker-compose.yml file to see if any default values have changed. Comparing the files detects any deleted or new parameters.
warning
Some Docker Compose configuration parameters cannot be updated. Ensure that you leave these parameters unchanged. Changing these parameters may have unintended consequences. For more information on which parameters cannot be updated, see the documentation in the comments of the docker-compose.yml file.
Set the Docker Compose parameter TF_VAR_bootstrap_mode in the docker-compose.override.yml file to setup.
If your Kubernetes nodes use a custom Amazon Machine Image (AMI) and you are updating the image, set aws_eks_ami to your new AMI ID.
If the Kubernetes version is changing, be sure to set the Docker Compose parameter TF_VAR_rebuild_nodegroups in the docker-compose.override.yml file to true. If the Kubernetes version is not changing, this is still recommended because it ensures that your nodes are using the most up-to-date Amazon Machine Image (AMI).
warning
There will likely be a Self Managed Commerce service outage while the EKS node groups are rebuilt. A Self Managed Commerce service outage duration of up to 15 minutes has been observed in our testing.
The Jenkins server will be unavailable for a period of time while the EKS node groups are rebuilt. Any Jenkins jobs running during this time will fail.
Perform the upgrade during a low-traffic time to minimize the impact.
important
AWS and the Kubernetes project recommend running the same Kubernetes version on both your control plane and your nodes (data plane) to ensure smooth operation. Setting TF_VAR_rebuild_nodegroups to true replaces your old nodes with nodes that use the correct Kubernetes version.
For more information about why this is important, refer to Amazon EKS version support in the Amazon EKS documentation and Best Practices for Cluster Upgrades in the EKS Best Practices Guides.
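The configuration review in the steps above can be partly scripted. The following is a minimal sketch, assuming the parameters appear in docker-compose.override.yml on a single line as NAME: value or NAME=value. The compare_compose, has_setting and check_override helper names are hypothetical and are not part of CloudOps for Kubernetes.

```shell
#!/bin/sh
# Show what changed between the previous and the new docker-compose.yml.
# diff exits with status 1 when the files differ, which is expected here,
# so only a status of 2 or more (a real error) is treated as a failure.
compare_compose() {
  diff -u "$1" "$2" || [ "$?" -eq 1 ]
}

# Succeed if FILE sets parameter NAME to VALUE.
# Assumption: the setting is written on one line as "NAME: value" or
# "NAME=value", optionally quoted and optionally preceded by "- ".
has_setting() {
  file="$1"
  name="$2"
  value="$3"
  grep -Eq "^[[:space:]-]*${name}[[:space:]]*[:=][[:space:]]*[\"']?${value}[\"']?[[:space:]]*\$" "$file"
}

# Pre-flight check to run before applying the update.
check_override() {
  file="${1:-docker-compose.override.yml}"
  has_setting "$file" TF_VAR_bootstrap_mode setup ||
    { echo "TF_VAR_bootstrap_mode is not set to setup" >&2; return 1; }
  # Rebuilding node groups is recommended, so a missing setting is only a note.
  has_setting "$file" TF_VAR_rebuild_nodegroups true ||
    echo "Note: TF_VAR_rebuild_nodegroups is not true; nodes will not be rebuilt" >&2
  return 0
}
```

For example, compare_compose previous/docker-compose.yml docker-compose.yml prints a unified diff of the two files, and check_override fails when TF_VAR_bootstrap_mode is not set to setup. The file paths in this example are placeholders.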
Run the Docker Compose command build to build the Docker image, with the --no-cache option to ensure that all dependencies are updated:
docker-compose build --no-cache
Run the Docker Compose command up to update the CloudOps for Kubernetes cluster:
docker-compose up
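The two commands can be chained so that docker-compose up only runs when the image builds successfully. This is a minimal sketch: the run_update wrapper and the COMPOSE variable are hypothetical conveniences, and only the two docker-compose commands come from the steps above.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands from the steps above.
# COMPOSE can be overridden, for example to point at a different binary path.
COMPOSE="${COMPOSE:-docker-compose}"

run_update() {
  # The override file is expected in the current directory, which should be
  # the root of the cloud-ops-kubernetes Git folder.
  if [ ! -f docker-compose.override.yml ]; then
    echo "docker-compose.override.yml not found in $(pwd)" >&2
    return 1
  fi
  # Rebuild the image without cached layers, then apply the update.
  # "&&" stops before "up" if the build fails.
  "$COMPOSE" build --no-cache && "$COMPOSE" up
}
```

Run run_update from the root of the cloud-ops-kubernetes Git folder. The function stops with an error if the override file is missing, which avoids applying an update with default settings by mistake.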
Save the updated docker-compose.override.yml file and any dependencies, such as TLS keys, in a safe place.
important
You will need the docker-compose.override.yml file and any dependencies in the future, to perform the following:
- Update the cluster
- Show the current state of the cluster
- Create local Terraform configuration
- Clean up the cluster
warning
The docker-compose.override.yml file contains secrets. Make sure to securely store the file so it is only accessible to those with the necessary business need.
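One way to keep a dated copy of the file with restricted permissions is sketched below. This is an illustration under stated assumptions, not a substitute for your organization's secret storage: the save_override helper and the backup directory are hypothetical, the helper copies only the override file and not dependencies such as TLS keys, and file permissions only protect the copy on that workstation.

```shell
#!/bin/sh
# Hypothetical helper: copy the override file into a private backup directory.
# Usage: save_override SOURCE_FILE BACKUP_DIR
save_override() {
  src="$1"
  dest_dir="$2"
  # umask 077 makes newly created files and directories accessible to the
  # owner only. The subshell keeps the umask change away from the caller.
  # Note: an existing BACKUP_DIR keeps whatever permissions it already has.
  (
    umask 077
    mkdir -p "$dest_dir" &&
      cp "$src" "$dest_dir/docker-compose.override.$(date +%Y%m%d-%H%M%S).yml"
  )
}
```

For example, save_override docker-compose.override.yml "$HOME/cloudops-backups" writes a timestamped copy that only the current user can read. The directory name here is a placeholder.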
Complete Additional Steps
note
Review the release-specific documentation to determine if additional steps are required:
- If you are updating from release 3.2.x to 3.3.x, see Update to Version 3.3.
- If you are updating from release 3.1.x to 3.2.x, see Update to Version 3.2.
- If you are updating from release 3.0.x to 3.1.x, see Update to Version 3.1.
Review Jenkins Plugin Status
Issues with Jenkins plugin updates have occasionally been reported, where the upgrade process attempts to upgrade Jenkins plugins but encounters issues applying the changes. As a precaution, review the Jenkins plugin status to identify and resolve any issues.
- Navigate to the 'Manage Jenkins' page in the provided Jenkins instance.
- On the 'Manage Jenkins' page, note whether any plugin issues are highlighted. If there are issues, they are mentioned in a section near the top of the page, with an explanation like "Some plugins could not be loaded due to unsatisfied dependencies".
- If no issues are identified, there is no action to take.
- If an issue is identified, proceed to resolve it. Jenkins may provide a button labeled 'Correct' to resolve the issue.
Ensure that the Jenkins Agents have been Rebuilt
The Jenkins agents in the provided Jenkins instance should be rebuilt before you run any Jenkins jobs. Check the status of the build-jenkins-agents Jenkins job to ensure that the job has run successfully at least once since completing the upgrade. If not, run the job and ensure it completes successfully.
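The status of the build-jenkins-agents job can also be checked from the command line through the Jenkins remote access API. The sketch below makes several assumptions: JENKINS_URL, JENKINS_USER and JENKINS_TOKEN are hypothetical variables that you must set for your own instance, the job is assumed to sit at the top level rather than inside a folder, and curl must be installed. The helper names are also hypothetical.

```shell
#!/bin/sh
# Build the API URL for the most recent build of a Jenkins job.
# Usage: last_build_url BASE_URL JOB_NAME
last_build_url() {
  # ${1%/} strips one trailing slash so the URL has no double slash.
  echo "${1%/}/job/$2/lastBuild/api/json?tree=result,timestamp"
}

# Print the result and timestamp of the last build-jenkins-agents build as JSON.
# "result" is SUCCESS for a successful build; "timestamp" is the build
# time in milliseconds since the epoch.
agents_job_status() {
  curl -fsS -u "$JENKINS_USER:$JENKINS_TOKEN" \
    "$(last_build_url "$JENKINS_URL" build-jenkins-agents)"
}
```

Compare the timestamp with the time you completed the upgrade. If the last successful build is older than the upgrade, run the job as described above.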
Update the Jenkins Jobs
The Jenkins jobs in the provided Jenkins instance need to be updated. To update the jobs, log in to the Jenkins instance and run the bootstrap Jenkins job.