Observability
Observability tools can provide valuable insight into the behavior of the cluster and Elastic Path services, are a powerful aid when investigating issues, and can assist with monitoring and alerting.
Prometheus and Grafana
CloudOps for Kubernetes can optionally deploy the free Prometheus and Grafana tools in your Kubernetes cluster. Prometheus is a monitoring tool and a time-series database for various metrics, and Grafana is a visualization tool that can display the Prometheus metrics. These tools gather metrics data from the services and other components in your cluster and allow you to view and query that data. You can also monitor the environment based on the metrics.
You can use these tools to investigate issues during testing phases and to monitor resources. However, this deployment is not intended for production use due to limitations in high availability and storage management. If you find the tools convenient and want to rely on them in production, we recommend investigating production-grade implementations from vendors who specialize in such tools.
Enable Prometheus and Grafana
Enable Prometheus and Grafana in your cluster by setting `TF_VAR_enable_prometheus` to `true` in your `docker-compose.override.yml` file. In the same file, set your Grafana credentials in the `TF_VAR_grafana_username` and `TF_VAR_grafana_password` variables. For more information about applying changes to your cluster configuration, see the Updating Cluster Configuration documentation.
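For reference, the relevant entries in the override file might look like the following sketch. The variable names come from this section; the service name and credential values are placeholders — match them to your existing file and environment:

```yaml
# docker-compose.override.yml (fragment)
services:
  cloudops:                                  # placeholder service name; use the one in your file
    environment:
      - TF_VAR_enable_prometheus=true
      - TF_VAR_grafana_username=admin        # placeholder credential
      - TF_VAR_grafana_password=change-me    # placeholder credential
```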
Accessing Prometheus
After you deploy the Prometheus server, you can access it by completing the following steps:
Run the following command to get the pod name of the Prometheus server:
```shell
export POD_NAME=$(kubectl get pods \
  --namespace prometheus \
  -l "app=prometheus,component=server" \
  -o jsonpath="{.items[0].metadata.name}")
```
Run the following command to port-forward the Prometheus server to port 9090 on your local machine:
```shell
kubectl --namespace prometheus port-forward $POD_NAME 9090
```
Go to `localhost:9090` in your browser.
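With the port-forward in place, you can also query Prometheus programmatically through its HTTP API (`/api/v1/query` is the standard instant-query endpoint). A minimal Python sketch; the host and port assume the port-forward set up above:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_query_url(promql, base="http://localhost:9090"):
    """Build an instant-query URL for the given PromQL expression."""
    return f"{base}/api/v1/query?" + urlencode({"query": promql})

url = build_query_url('up{job="prometheus"}')
print(url)

# With the port-forward running, fetch and decode the JSON response:
# result = json.load(urlopen(url))
# print(result["status"])  # "success" when the query executes
```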
Example Queries
The following sections include example Prometheus queries to help you get started.
Total CPU cores used by a type of container
This query allows you to see the total number of CPU cores used by all instances of a container with the same name. For example, you would use the following query for `cortex`:

```
sum (
  rate (
    container_cpu_usage_seconds_total{container_name="cortex"}[5m]
  )
)
```
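To build intuition for the `rate(...[5m])` part: PromQL's `rate()` returns the per-second average rate of increase of a counter over the window. A simplified Python model (the real implementation also handles counter resets and extrapolates to the window boundaries, which this sketch ignores; the sample values are hypothetical):

```python
def simple_rate(samples):
    """Approximate PromQL rate(): per-second increase between the first
    and last sample in the window. Ignores counter resets and boundary
    extrapolation, which real Prometheus handles."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Hypothetical samples of container_cpu_usage_seconds_total (a counter of
# cumulative CPU-seconds) as (timestamp_seconds, value) over a 5-minute window:
window = [(0, 100.0), (60, 160.0), (120, 220.0), (300, 400.0)]
print(simple_rate(window))  # 1.0 -> about one CPU core in use on average
```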
Total memory used by a type of container
This query allows you to see the total amount of memory in bytes used by all instances of a container with the same name. For example, you would use the following query for `cortex`:

```
sum (
  container_memory_usage_bytes{container_name="cortex"}
)
```
Number of replicas in a deployment
This query allows you to see the number of replicas for a given Kubernetes Deployment. For example, you would use the following query for `ep-cortex-deployment`:

```
kube_deployment_status_replicas_available{deployment="ep-cortex-deployment"}
```
Percentage of requested CPU cores in use
This query allows you to see the percentage of requested CPU cores that are actively being used by all instances of a container with the same name. For example, you would use the following query for `cortex`:

```
sum (
  rate (
    container_cpu_usage_seconds_total{container_name="cortex"}[5m]
  )
)
/
sum (
  kube_pod_container_resource_requests_cpu_cores{container="cortex"}
)
*
100
```
For example, at one moment in time:
- There is one pod with one container named `cortex`
- It requests 2 CPU cores
- It is actively using 1 core
- Prometheus would show 50% utilization at this moment

In another example, at another moment in time:
- There are 2 pods, each with one container named `cortex`
- They each request 2 CPU cores
- Both are actively using 0.1 of a core
- Prometheus would show 5% utilization at this moment
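The two scenarios above can be checked with a few lines of arithmetic that mirror the query's shape, `sum(rate(...)) / sum(requests) * 100`; the numbers are the ones from the examples:

```python
def cpu_utilization_pct(usage_cores, requested_cores):
    """Mirror of sum(rate(cpu usage)) / sum(cpu requests) * 100,
    taking per-container usage and request values in cores."""
    return sum(usage_cores) / sum(requested_cores) * 100

# First example: one pod requesting 2 cores, actively using 1 core
print(cpu_utilization_pct([1.0], [2.0]))            # 50.0

# Second example: two pods, each requesting 2 cores, each using 0.1 core
print(cpu_utilization_pct([0.1, 0.1], [2.0, 2.0]))  # 5.0
```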
Percentage of requested memory in use
This query allows you to see the percentage of requested memory that is actively being used by all instances of a container with the same name. For example, you would use the following query for `cortex`:

```
sum (
  container_memory_usage_bytes{container_name="cortex"}
)
/
sum (
  kube_pod_container_resource_requests_memory_bytes{container="cortex"}
)
*
100
```