Common Issues
Services start failing to resolve
If the CloudOps for Kubernetes Jenkins and Nexus DNS hostnames stop resolving, this might be because the kube-proxy DaemonSet is unhealthy.
To check the status of the DaemonSet through the kubectl command:
kubectl --namespace kube-system describe daemonsets kube-proxy
You can also check the status of the kube-proxy DaemonSet in the kube-system namespace through the Kubernetes Dashboard.
- To access the Kubernetes Dashboard, see Accessing Kubernetes Dashboard.
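A quick way to confirm whether kube-proxy is unhealthy is to compare the desired and ready pod counts of the DaemonSet. This is a sketch using standard DaemonSet status fields:

```shell
# Print desired vs. ready kube-proxy pods; the two numbers should match
kubectl --namespace kube-system get daemonset kube-proxy \
  -o jsonpath='desired={.status.desiredNumberScheduled} ready={.status.numberReady}{"\n"}'
```

If the numbers differ, the describe command above shows the events explaining why pods are not ready.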
DaemonSets in the Kubernetes cluster are failing
If DaemonSets are failing in the Kubernetes cluster and the pod logs report errors, check the resource reservation in the Kubelet configuration. The failures might occur because the kubelet has run out of memory and started reclaiming it from the DaemonSet pods.
To see the Kubelet configuration on a node, do the following steps:
1. Choose a node to inspect. In this example, the name of this node is referred to as NODE_NAME.
2. In one tab in the command terminal, start the kubectl proxy:
kubectl proxy --port=8001
3. Run the following command to get the configuration from the configuration endpoint:
note
The resources reserved for the Kubelet are in the kubeReserved JSON object.

NODE_NAME="the-name-of-the-node-you-are-inspecting"; curl -sSL "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/configz" | jq '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"'
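For example, to pull out just the kubeReserved object, you can narrow the jq filter. The JSON below is a hypothetical sample of the configz response, used only to illustrate the filter; a real response comes from the kubectl proxy endpoint:

```shell
# Hypothetical configz payload standing in for the real endpoint response
echo '{"kubeletconfig":{"kubeReserved":{"cpu":"100m","memory":"1638Mi"}}}' \
  | jq '.kubeletconfig.kubeReserved'
```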
To compare the resources used by the DaemonSets on the node against the resources reserved in the Kubelet configuration, run the following command with the same node used in the previous steps:
kubectl describe node ${NODE_NAME}
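The describe output includes Allocatable and Allocated resources sections; a sketch to surface just those (the grep context length is a rough guess and may need adjusting for your output):

```shell
# Show the node's allocatable capacity and current resource requests/limits
kubectl describe node "${NODE_NAME}" | grep -A 8 -E "^(Allocatable|Allocated resources):"
```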
Running Out of Ephemeral Storage
Pods can run out of ephemeral storage when building large Self Managed Commerce deployment packages, depending on the size of the source code.
A typical message relating to this error in the build-deployment-package Jenkins job would appear in the job logs as:
default/jenkins-worker-51db7930-910b-4888-b88e-eeb68b987803-9wjfl-4k9ms Pod just failed (Reason: Evicted, Message: Pod ephemeral local storage usage exceeds the total limit of containers 13322Mi. )
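To see whether other Jenkins worker pods were evicted for the same reason, you can list failed pods; this sketch assumes the workers run in the default namespace, as in the log line above:

```shell
# List failed (including Evicted) pods in the default namespace
kubectl --namespace default get pods --field-selector=status.phase=Failed
```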
When you experience this error, review the podYaml definition in the Jenkinsfile for the Jenkins job that is failing:
def podYamlFromFile = new File("${env.JENKINS_HOME}/workspace/${env.JOB_NAME}@script/cloudops-for-kubernetes/jenkins/agents/kubernetes/maven-5gb-2core-1container.yaml").text.trim();
String podYaml = podYamlFromFile.replace('${dockerRegistryAddress}', "${dockerRegistryAddress}").replace('${jenkinsAgentImageTag}', "${jenkinsAgentImageTag}")
For this error, edit the maven-5gb-2core-1container.yaml file. Open the YAML file in any text editor and change the ephemeral storage resource definitions:
resources:
  requests:
    memory: "5632Mi"
    cpu: "2"
    ephemeral-storage: "13Gi"
  limits:
    memory: "5632Mi"
    cpu: "2"
    ephemeral-storage: "13Gi"
Add as much ephemeral storage as you require. For example, to set the requests and limits to 20Gi:
resources:
  requests:
    memory: "5632Mi"
    cpu: "2"
    ephemeral-storage: "20Gi"
  limits:
    memory: "5632Mi"
    cpu: "2"
    ephemeral-storage: "20Gi"
Use Git to commit and push your changes to your CloudOps for Kubernetes repository. You can now use the changes that you made in your repository in the failing Jenkins job. Continue this process until you find the right limits for the job that is failing.
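The commit-and-push step might look like the following; the branch name is a placeholder, and the file path assumes the repository layout shown in the Jenkinsfile snippet above:

```shell
# Stage, commit, and push the edited pod template
git add jenkins/agents/kubernetes/maven-5gb-2core-1container.yaml
git commit -m "Increase ephemeral storage for build-deployment-package agents"
git push origin your-branch-name   # replace with the branch your cluster tracks
```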
Jenkins will not start
If the CloudOps for Kubernetes Jenkins pod will not start, it could be due to incompatible Jenkins plugins. To reset the Jenkins plugins to their default state, perform the following steps:
note
This procedure will remove any custom Jenkins plugins and related configurations that may be installed in your environment, and only Jenkins plugins shipped with CloudOps for Kubernetes will remain.
1. Edit the docker-compose.override.yml file that you used when you initially set up the cluster.
2. Set the value of TF_VAR_jenkins_overwrite_plugins to true.
note
This variable must be added to the docker-compose.override.yml file if it is not already present.
3. Read through the docker-compose.override.yml file and confirm all other configurations are correct for your environment.
4. Run the Docker Compose build command to build the Docker image and update the CloudOps for Kubernetes cluster.
5. Edit docker-compose.override.yml and reset the value of TF_VAR_jenkins_overwrite_plugins to false.
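The build step might look like the following; the exact compose file names depend on how you launched the bootstrap container in your environment:

```shell
# Rebuild the bootstrap image with the override file applied
docker-compose -f docker-compose.yml -f docker-compose.override.yml build
```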