Expand application and infrastructure observability with operational insights into Kubernetes pods

Published July 7, 2020 Updated June 9, 2021 5 min read

Alois Mayr

Dynatrace is the only Kubernetes monitoring solution that provides continuous automation and full-stack advanced observability without changing code, container images, or deployments. By deploying OneAgent via the OneAgent Operator in Kubernetes environments, application teams can understand and ensure that their applications deliver the expected value and performance.

Get instant operational insights into your Kubernetes pods and namespaces

In Kubernetes environments, successfully operating your production applications and microservices requires additional insight into your Kubernetes infrastructure, including the cluster, nodes, and pods that encapsulate and run the apps. To understand whether your workloads run as expected, you’ll want to make sure that no pods are stuck in the pending phase. One reason for pending pods might be that you’ve run short of resources due to configured namespace resource quotas. You should therefore also track the resource allocation of pods per namespace to find out which pods and namespaces allocate the most CPU or memory resources from your cluster.
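To make the idea of tracking resource allocation per namespace concrete, here is a minimal Python sketch that sums pod CPU and memory requests per namespace and ranks the namespaces by CPU. The pod dictionaries and their field names (`namespace`, `cpu_request`, `memory_request`) are hypothetical sample data rather than a Dynatrace or Kubernetes API, and the quantity parsers handle only a common subset of the Kubernetes quantity notation.

```python
from collections import defaultdict

# Subset of the suffixes Kubernetes accepts for memory quantities.
_MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '500m' or '2' into cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '256Mi' or '1G' into bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


def requests_by_namespace(pods):
    """Sum the CPU and memory requests of all pods, grouped by namespace."""
    totals = defaultdict(lambda: {"cpu_cores": 0.0, "memory_bytes": 0})
    for pod in pods:
        entry = totals[pod["namespace"]]
        entry["cpu_cores"] += parse_cpu(pod["cpu_request"])
        entry["memory_bytes"] += parse_memory(pod["memory_request"])
    return dict(totals)


# Hypothetical sample data for illustration only.
pods = [
    {"name": "web-1", "namespace": "shop", "cpu_request": "500m", "memory_request": "256Mi"},
    {"name": "web-2", "namespace": "shop", "cpu_request": "500m", "memory_request": "256Mi"},
    {"name": "batch-1", "namespace": "reports", "cpu_request": "2", "memory_request": "1Gi"},
]

totals = requests_by_namespace(pods)
# Namespaces ordered from highest to lowest requested CPU.
ranked = sorted(totals, key=lambda ns: totals[ns]["cpu_cores"], reverse=True)
print(ranked)
```

In practice these totals would come from the monitoring data itself; the sketch only shows the aggregation that the per-namespace view is based on.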

With the release of Dynatrace version 1.196, we’ve extended our full-stack Kubernetes workload and infrastructure observability with a focus on pods and the use of namespaces. This helps your application and platform operations teams understand whether Kubernetes manages your workloads properly, including optimal resource allocations:

Easily identify failed and pending pods by understanding the pod phases of your workloads
Find the most resource intensive pods and be proactively alerted if they aren’t running as expected
Manage visibility into your observability data based on Kubernetes labels

Easily identify failed and pending pods by understanding the pod phases of your workloads

When users deploy workloads in Kubernetes environments, the resulting pods run through different phases in their lifecycle. The possible values for pod phases are Pending, Running, Succeeded, Failed, and Unknown.

A pending pod is one that’s already available in the Kubernetes system but not yet successfully running on a node.
A pod is in running phase when it has been bound to a node and all containers in the pod have been created; not every container needs to be running, though. As long as at least one container is running or starting, the pod is in running phase.
A pod’s phase transitions to failed if all containers have terminated, at least one of them exited with failure, and the pod’s restart policy hasn’t restarted any of the containers.
A pod’s phase transitions to succeeded when all containers exit with success and the restart policy doesn’t restart any containers.
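The phase rules above can be sketched as a small decision function. This is a simplified illustration, not how the kubelet actually computes the phase: the `ContainerStatus` type and its four state names are hypothetical, the Unknown phase is left out, and, following the Kubernetes documentation, a pod counts as failed when every container has terminated and at least one exited with a non-zero code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContainerStatus:
    # Hypothetical, simplified states:
    # "not_created", "starting", "running", or "terminated".
    state: str
    exit_code: Optional[int] = None  # set only for terminated containers


def pod_phase(bound_to_node: bool, containers: List[ContainerStatus]) -> str:
    """Derive a pod phase from a simplified view of its containers."""
    # Pending: the pod exists in the cluster, but it is not bound to a node
    # yet or at least one container has not been created.
    if (
        not bound_to_node
        or not containers
        or any(c.state == "not_created" for c in containers)
    ):
        return "Pending"
    # Running: all containers exist and at least one is running or starting
    # (a container being restarted counts as starting here).
    if any(c.state in ("starting", "running") for c in containers):
        return "Running"
    # Every container has terminated and none is being restarted.
    if all(c.exit_code == 0 for c in containers):
        return "Succeeded"
    return "Failed"


print(pod_phase(True, [ContainerStatus("running"), ContainerStatus("terminated", 0)]))
```

The restart policy is modeled implicitly: a container that the policy restarts is reported as "starting" again, which keeps the pod in the running phase.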

Getting an overview and identifying which pods are in a failed state or are stuck in pending phase allows you to take action and double check why pods are in a non-running state and if the performance of deployed applications is affected.
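As a rough illustration of such an overview, the following Python sketch counts pods per phase and flags the ones that deserve a closer look. The pod records, the `seconds_in_phase` field, and the five-minute limit for pending pods are assumptions made for this example.

```python
from collections import Counter


def summarize_phases(pods):
    """Count how many pods are in each phase."""
    return Counter(pod["phase"] for pod in pods)


def pods_needing_attention(pods, pending_limit_seconds=300):
    """Return (pod name, reason) pairs for failed or long-pending pods."""
    flagged = []
    for pod in pods:
        if pod["phase"] == "Failed":
            flagged.append((pod["name"], "failed"))
        elif pod["phase"] == "Pending" and pod["seconds_in_phase"] > pending_limit_seconds:
            flagged.append((pod["name"], "stuck in pending"))
    return flagged


# Hypothetical sample data for illustration only.
pods = [
    {"name": "web-1", "phase": "Running", "seconds_in_phase": 3600},
    {"name": "web-2", "phase": "Pending", "seconds_in_phase": 900},
    {"name": "job-1", "phase": "Failed", "seconds_in_phase": 120},
    {"name": "web-3", "phase": "Pending", "seconds_in_phase": 20},
]

summary = summarize_phases(pods)
flagged = pods_needing_attention(pods)
print(summary)
print(flagged)
```

A pod that has been pending for only a few seconds is usually just starting up, which is why the sketch applies a time limit before flagging it.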


Find the most resource intensive pods and be proactively alerted if they aren’t running as expected

In addition to the existing Kubernetes metrics, we’ve introduced a number of new metrics that you might want to use in your dashboards. You can use these metrics to track the health and resource usage of pods, namespaces, and containers, and to define custom alerts based on these entities.

Category	Metric
Workload	Pods [Pod phase, Reason]
Pod	CPU limit
Pod	CPU requests
Pod	Memory limit
Pod	Memory request
Namespace	CPU limit
Namespace	CPU requests
Namespace	Memory limit
Namespace	Memory request
Namespace Quota	CPU limit quota [Resource quota name]
Namespace Quota	CPU limit quota used [Resource quota name]
Namespace Quota	CPU requests quota [Resource quota name]
Namespace Quota	CPU requests quota used [Resource quota name]
Namespace Quota	Memory limit quota [Resource quota name]
Namespace Quota	Memory limit quota used [Resource quota name]
Namespace Quota	Memory requests quota [Resource quota name]
Namespace Quota	Memory requests quota used [Resource quota name]
Namespace Quota	Pods count quota [Resource quota name]
Namespace Quota	Pods count quota used [Resource quota name]
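The quota metrics come in pairs (a quota and the amount of it that is used), so a custom alert typically compares the two. Below is a minimal Python sketch of that comparison. The sample records, their field names, and the 90% threshold are hypothetical; the sketch does not call any Dynatrace API.

```python
def quota_utilization(used: float, quota: float) -> float:
    """Return the fraction of a resource quota that is currently used."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    return used / quota


def quota_alerts(samples, threshold=0.9):
    """Build an alert message for every quota used at or above the threshold."""
    alerts = []
    for sample in samples:
        ratio = quota_utilization(sample["used"], sample["quota"])
        if ratio >= threshold:
            alerts.append(
                f'{sample["namespace"]}/{sample["quota_name"]} '
                f'{sample["resource"]}: {ratio:.0%} of quota used'
            )
    return alerts


# Hypothetical sample data for illustration only.
samples = [
    {"namespace": "shop", "quota_name": "compute", "resource": "CPU requests", "used": 19, "quota": 20},
    {"namespace": "shop", "quota_name": "compute", "resource": "Pods count", "used": 12, "quota": 50},
]

alerts = quota_alerts(samples)
print(alerts)
```

A namespace that approaches its quota is a likely source of pending pods, so alerting on the ratio rather than on the absolute value keeps the rule independent of the quota size.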

An example overview dashboard for Kubernetes cluster and pod health might look like the following. Please note that a dashboard like the one below will be made available out-of-the-box in Dynatrace environments in a later release.

[Image: example overview dashboard for Kubernetes cluster and pod health]

While this dashboard nicely shows you an overview of the available cluster resources, pod phases and pod resource utilization, we also introduced dedicated metrics for tracking namespace resource quotas and usage.

Manage visibility into your observability data based on Kubernetes labels

Labels are a powerful concept in Kubernetes for attaching additional information to Kubernetes objects like pods, namespaces, workloads, and nodes. Labels help users organize their objects and select subsets of them when interacting with the Kubernetes API. Dynatrace imports pod labels of all processes that are monitored in a pod by a OneAgent code module. The respective pods need permission to access this meta-information via the Kubernetes API, which requires additional permissions in the pod’s service account.

At Dynatrace, we’re always striving to improve and help our customers to do more with less. This is why we now import all labels from workloads, pods, namespaces, and nodes into Dynatrace and make them available on the pages where you need them for filtering hosts, technologies, workloads (cloud applications), and, going forward, also services.

You can also leverage Kubernetes labels in management zones to control visibility into Kubernetes workloads and namespaces.
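Label-based filtering boils down to matching an object's labels against a set of requirements. The Python sketch below illustrates this with Kubernetes-style equality-based selectors (`=`, `==`, and `!=`); it is an illustration of the concept only, not the way Dynatrace management zone rules are defined, and the workload records are hypothetical.

```python
def parse_selector(selector: str):
    """Parse 'app=web,tier!=cache' into (key, operator, value) triples."""
    requirements = []
    for part in selector.split(","):
        part = part.strip()
        if not part:
            continue
        if "!=" in part:
            key, value = part.split("!=", 1)
            operator = "!="
        elif "==" in part:
            key, value = part.split("==", 1)
            operator = "="
        elif "=" in part:
            key, value = part.split("=", 1)
            operator = "="
        else:
            raise ValueError(f"unsupported requirement: {part!r}")
        requirements.append((key.strip(), operator, value.strip()))
    return requirements


def matches(labels: dict, requirements) -> bool:
    """Check whether a label set satisfies all requirements."""
    for key, operator, value in requirements:
        if operator == "=" and labels.get(key) != value:
            return False
        # '!=' also matches objects that do not carry the key at all.
        if operator == "!=" and labels.get(key) == value:
            return False
    return True


# Hypothetical sample data for illustration only.
workloads = [
    {"name": "cart", "labels": {"app": "shop", "stage": "prod"}},
    {"name": "cart-canary", "labels": {"app": "shop", "stage": "staging"}},
    {"name": "billing", "labels": {"app": "billing"}},
]

requirements = parse_selector("app=shop,stage!=staging")
visible = [w["name"] for w in workloads if matches(w["labels"], requirements)]
print(visible)
```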

How to get started

These new features and metrics require:

ActiveGate version 1.195+
Dynatrace version 1.196+
The Show workloads and cloud applications setting must be enabled on the Kubernetes settings page
Latest service account settings for ActiveGate permissions
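The two version requirements can be checked with a simple comparison, sketched here in Python. The assumption that a version string starts with `major.minor` (for example `1.196` or `1.196.73`) and the `installed` dictionary are illustrative; how you obtain the installed versions depends on your environment.

```python
# Minimum versions taken from the requirements listed above.
MINIMUM_VERSIONS = {"ActiveGate": "1.195", "Dynatrace": "1.196"}


def parse_version(version: str):
    """Turn '1.196' or '1.196.73' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def unmet_requirements(installed: dict):
    """List the components that are missing or older than the minimum version."""
    unmet = []
    for component, minimum in MINIMUM_VERSIONS.items():
        current = installed.get(component)
        if current is None or parse_version(current) < parse_version(minimum):
            unmet.append(component)
    return unmet


# Hypothetical installed versions for illustration only.
installed = {"ActiveGate": "1.193.10", "Dynatrace": "1.196"}
print(unmet_requirements(installed))
```

Comparing integer tuples instead of raw strings avoids the pitfall that `"1.99"` sorts after `"1.196"` as text.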

There’s more on the way

We have some amazing features in our pipeline, including:

Ingest of any metrics from Prometheus exporters in Kubernetes environments
Out-of-the-box dashboard for Kubernetes cluster overview
Filtering and alerting on Kubernetes events