Observability
Observability tools can provide valuable insight into the behavior of the cluster and Elastic Path services, are a powerful aid when investigating issues, and can assist with monitoring and alerting.
Prometheus and Grafana
CloudOps for Kubernetes can optionally deploy the free Prometheus and Grafana tools in your Kubernetes cluster. Prometheus is a monitoring tool and a time-series database for various metrics, and Grafana is a visualization tool that can display the Prometheus metrics. These tools gather metrics data from the services and other components in your cluster and allow you to view and query that data. You can also monitor the environment based on the metrics.
You can use these tools to investigate issues during testing phases and to monitor resources. However, this deployment is not intended for production use due to limitations in high availability and storage management. If you find the tools convenient and want to rely on them in production, we recommend investigating production-grade implementations from vendors who specialize in such tools.
Enable Prometheus and Grafana
Enable Prometheus and Grafana in your cluster by setting `TF_VAR_enable_prometheus` to `true` in your `docker-compose.override.yml` file. In the same file, set your Grafana credentials in the `TF_VAR_grafana_username` and `TF_VAR_grafana_password` variables. For more information about applying changes to your cluster configuration, see the Updating Cluster Configuration documentation.
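For reference, the relevant entries in the override file might look like the following sketch. The variable names come from this section; the service name and credential values are placeholders — match them to your existing file and environment:

```yaml
# docker-compose.override.yml (fragment)
services:
  cloudops:                                  # placeholder service name; use the one in your file
    environment:
      - TF_VAR_enable_prometheus=true
      - TF_VAR_grafana_username=admin        # placeholder credential
      - TF_VAR_grafana_password=change-me    # placeholder credential
```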
Accessing Prometheus
After you deploy the Prometheus server, you can access it by completing the following steps:
Run the following command to get the pod name of the Prometheus server:
```shell
export POD_NAME=$(kubectl get pods \
  --namespace prometheus \
  -l "app=prometheus,component=server" \
  -o jsonpath="{.items[0].metadata.name}")
```
Run the following command to port-forward the Prometheus server to port 9090 on your local machine:
```shell
kubectl --namespace prometheus port-forward $POD_NAME 9090
```
Go to `localhost:9090` in your browser.
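With the port-forward in place, you can also query Prometheus programmatically through its HTTP API (`/api/v1/query` is the standard instant-query endpoint). A minimal Python sketch; the host and port assume the port-forward set up above:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_query_url(promql, base="http://localhost:9090"):
    """Build an instant-query URL for the given PromQL expression."""
    return f"{base}/api/v1/query?" + urlencode({"query": promql})

url = build_query_url('up{job="prometheus"}')
print(url)

# With the port-forward running, fetch and decode the JSON response:
# result = json.load(urlopen(url))
# print(result["status"])  # "success" when the query executes
```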
Example Queries
The following sections include example Prometheus queries to help you get started.
Total CPU cores used by a type of container
This query allows you to see the total number of CPU cores used by all instances of a container with the same name. For example, you would use the following query for `cortex`:

```
sum (
  rate (
    container_cpu_usage_seconds_total{container_name="cortex"}[5m]
  )
)
```
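To build intuition for the `rate(...[5m])` part: PromQL's `rate()` returns the per-second average rate of increase of a counter over the window. A simplified Python model (the real implementation also handles counter resets and extrapolates to the window boundaries, which this sketch ignores; the sample values are hypothetical):

```python
def simple_rate(samples):
    """Approximate PromQL rate(): per-second increase between the first
    and last sample in the window. Ignores counter resets and boundary
    extrapolation, which real Prometheus handles."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Hypothetical samples of container_cpu_usage_seconds_total (a counter of
# cumulative CPU-seconds) as (timestamp_seconds, value) over a 5-minute window:
window = [(0, 100.0), (60, 160.0), (120, 220.0), (300, 400.0)]
print(simple_rate(window))  # 1.0 -> about one CPU core in use on average
```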
Total memory used by a type of container
This query allows you to see the total amount of memory in bytes used by all instances of a container with the same name. For example, you would use the following query for `cortex`:

```
sum (
  container_memory_usage_bytes{container_name="cortex"}
)
```
Number of replicas in a deployment
This query allows you to see the number of replicas for a given Kubernetes Deployment. For example, you would use the following query for `ep-cortex-deployment`:

```
kube_deployment_status_replicas_available{deployment="ep-cortex-deployment"}
```
Percentage of requested CPU cores in use
This query allows you to see the percentage of requested CPU cores that are actively being used by all instances of a container with the same name. For example, you would use the following query for `cortex`:

```
sum (
  rate (
    container_cpu_usage_seconds_total{container_name="cortex"}[5m]
  )
)
/
sum (
  kube_pod_container_resource_requests_cpu_cores{container="cortex"}
)
*
100
```
For example, at one moment in time:
- There is one pod with one container named `cortex`
- It requests 2 CPU cores
- It is actively using 1 core
- Prometheus would show 50% utilization at this moment

In another example, at another moment in time:
- There are 2 pods, each with one container named `cortex`
- They each request 2 CPU cores
- Both are actively using 0.1 of a core
- Prometheus would show 5% utilization at this moment
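The two scenarios above can be checked with a few lines of arithmetic that mirror the query's shape, `sum(rate(...)) / sum(requests) * 100`; the numbers are the ones from the examples:

```python
def cpu_utilization_pct(usage_cores, requested_cores):
    """Mirror of sum(rate(cpu usage)) / sum(cpu requests) * 100,
    taking per-container usage and request values in cores."""
    return sum(usage_cores) / sum(requested_cores) * 100

# First example: one pod requesting 2 cores, actively using 1 core
print(cpu_utilization_pct([1.0], [2.0]))            # 50.0

# Second example: two pods, each requesting 2 cores, each using 0.1 core
print(cpu_utilization_pct([0.1, 0.1], [2.0, 2.0]))  # 5.0
```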
Percentage of requested memory in use
This query allows you to see the percentage of requested memory that is actively being used by all instances of a container with the same name. For example, you would use the following query for `cortex`:

```
sum (
  container_memory_usage_bytes{container_name="cortex"}
)
/
sum (
  kube_pod_container_resource_requests_memory_bytes{container="cortex"}
)
*
100
```