We’re happy to announce the General Availability (GA) of OpenStack monitoring with Dynatrace. This brings to a close our long Early Access Program (EAP), which began in February 2017, and our analysis of customer requirements. The Dynatrace OpenStack monitoring solution is GA as of Dynatrace version 1.162 and OneAgent version 1.161.
This blog post shows you how to get the most value out of Dynatrace monitoring when you use the OpenStack cloud to provision infrastructure components.
Enable full-stack monitoring of OpenStack environments easily
The structure of this article follows the discovery path, from application performance and availability monitoring, through the monitoring of underlying services, all the way to the supporting infrastructure and its management. In true Dynatrace fashion, we’ve placed our bets on ease of deployment and automation of discovery. How much of your OpenStack-managed environment you see depends only on how deep you want to drill down and, in particular, on where you place OneAgents within your monitored environment.
Let’s walk through the different levels of OpenStack awareness in these simple steps, each of which provides additional insight into the OpenStack infrastructure:
- Understanding that your host is a virtual machine that is managed by OpenStack
- Learning about the parent OpenStack compute node
- Analyzing the health of the OpenStack compute nodes, checking which other VMs they host, and understanding how resources are distributed among them
- Looking into the health of OpenStack controller nodes and their services, and analyzing the performance of some of the underlying technologies
- Using Dynatrace Log Analytics to capture essential performance and availability problems directly from the OpenStack services
- Potential improvements and further steps
Step 1: OneAgent on VMs managed by OpenStack—awareness of the OpenStack as an orchestration layer
Full-stack monitoring of applications with Dynatrace is only possible if you deploy full-stack OneAgents on important hosts in your environment, specifically those that host your applications’ services and resources.
When OneAgent is deployed on virtual machines operated by OpenStack, you can take advantage of the powerful Dynatrace APM value proposition: zero-configuration detection of applications, services, problems, and root cause analysis. Beyond that, Dynatrace identifies OpenStack as the underlying cloud technology and provides information about the OpenStack compute node that hosts the VM.
Smartscape analysis shows you how your VMs interact with each other and gives you an understanding of the vertical dependencies between your application components—virtual machines, processes, and services.
Step 2: OneAgent on OpenStack compute nodes—awareness of services and resource utilization of OpenStack services on VMs
If needed, OneAgents can also be deployed on OpenStack compute nodes. In such cases, we recommend that OneAgents be configured for cloud infrastructure-only monitoring mode. Compute nodes typically have no injectable technologies to monitor, and this mode reduces the number of host units consumed by OneAgents.
When deployed on compute nodes, OneAgents provide valuable insight into the existence and resource allocation of VMs managed by OpenStack, as well as their availability, responsiveness, associated worker processes, I/O operations, and more.
Additionally, all OpenStack services running on the compute node are properly discovered and measured for availability and resource consumption.
Step 3: OneAgent on OpenStack controller nodes—awareness of services and their resource utilization for important OpenStack services
When OneAgents are deployed on OpenStack controller nodes, it’s possible to detect and monitor the remaining OpenStack services—those that are not typically found on compute nodes but are important elements of OpenStack.
Dynatrace provides out-of-the-box alerting on resource allocation and availability for these processes.
Step 4: Deep insight into OpenStack via plugins
Under the hood of OpenStack, there are several popular technologies that we can also monitor with Dynatrace OneAgents through the use of their respective plugins. These technologies include RabbitMQ, MySQL, HAProxy, and Memcached. The plugins require additional configuration (namely, access to these services’ APIs), but in return, provide technology-specific measurements.
To illustrate the challenges involved in monitoring the technologies that support OpenStack, here’s a problem we ran into within our own OpenStack environment. The RabbitMQ process in the example below was launched using the default file descriptor limit of 1024. Once this limit was exceeded, RabbitMQ stopped accepting new connections. This resulted in a Connectivity problem.
We wouldn’t have known about this problem if it weren’t for the RabbitMQ-specific measurements that Dynatrace provides. All details are included in the same view, so there is no need to use multiple tools to get the complete picture.
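If you want to spot this kind of file descriptor exhaustion by hand, the following minimal Python sketch shows the idea. It assumes a Linux host with /proc mounted and permission to read the entries of the target process. The function names and the 80% warning ratio are our own illustrative choices, and this is not how the Dynatrace RabbitMQ plugin collects its measurements.

```python
import os


def parse_soft_nofile_limit(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits content.

    Returns None when the limit is reported as 'unlimited'.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns: soft limit, hard limit, units
            fields = line[len(prefix):].split()
            soft = fields[0]
            return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max open files' entry found")


def fd_usage_ratio(open_fds, soft_limit):
    """Fraction of the soft limit currently in use (0.0 if unlimited)."""
    if soft_limit is None:
        return 0.0
    return open_fds / soft_limit


def check_process(pid, warn_ratio=0.8):
    """Return (open_fds, soft_limit, warning) for a running process.

    warn_ratio is a hypothetical threshold chosen for illustration only.
    """
    with open(f"/proc/{pid}/limits") as fh:
        soft_limit = parse_soft_nofile_limit(fh.read())
    # Each entry in /proc/<pid>/fd is one open file descriptor
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    warning = fd_usage_ratio(open_fds, soft_limit) >= warn_ratio
    return open_fds, soft_limit, warning
```

Calling check_process with the PID of the RabbitMQ process returns the current descriptor count, the soft limit, and whether the warning ratio has been reached.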
Step 5: Log Analytics
Dynatrace comes with a powerful Log Analytics module that can be applied to monitor OpenStack services. When configured, it picks up symptoms of problems specific to OpenStack and takes them into account during root-cause analysis.
In the example below, the Log viewer has uncovered numerous warnings in the keystone.log file indicating that the authentication process has been failing.
In this particular case, the root cause of these problems was related to memory saturation on the controller node. As illustrated below, the memory was indeed exhausted: it had reached almost 100% saturation.
Note further down in the Processes section that all OpenStack services running on the controller are listed. You can click any of these individual processes to analyze their connections and understand their relationship to other processes.
The Log Analytics module is fully configurable. Below are a dozen example configurations that can be easily changed and adapted to your local OpenStack environment. They were tested to work with older versions of OpenStack, so some updates might be required for more recent releases.
For Glance service (log path /var/log/glance/glance-api.log):
- Glance registry can’t connect to SQL database because connection pool is empty
search pattern: ERROR AND "OperationalError:" AND "pymysql.err.OperationalError" AND "Too many connections"
threshold: 0.0
- Glance registry can’t retrieve list of images
search pattern: ERROR AND "glance.registry.api.v1.images" AND "Unable to get images"
threshold: 0.0
- Glance API returned an error while using Glance registry
search pattern: ERROR AND "glance.common.wsgi ServerError" AND "The request returned 500 Internal Server Error"
threshold: 0.0
- Glance API authorization issue: Unable to validate token
search pattern: CRITICAL AND "Unable to validate token"
threshold: 0.0
- Glance API authorization-configuration issue
search pattern: DiscoveryFailure AND "Could not determine a suitable URL for the plugin"
threshold: 0.0
- Glance API can’t connect to SQL database
search pattern: ERROR AND DBConnectionError
threshold: 0.0
For Neutron service:
- Neutron agent can’t connect to SQL database
search pattern: ERROR AND "neutron.agent.dhcp.agent" AND "DBConnectionError" AND "Can't connect to MySQL"
threshold: 0.0
log paths: /var/log/neutron/dhcp-agent.log
- Neutron can’t connect to SQL server
search pattern: ERROR AND "OperationalError:" AND "Too many connections"
threshold: 0.0
log paths: /var/log/neutron/metadata-agent.log, /var/log/neutron/neutron-server.log, /var/log/neutron/openvswitch-agent.log, /var/log/neutron/neutron-ns-metadata-proxy-#.log, /var/log/neutron/l3-agent.log, /var/log/neutron/dhcp-agent.log
- Neutron: l3 agent configuration issue
search pattern: ERROR AND "neutron.agent.l3.agent" AND "An interface driver must be specified"
threshold: 0.0
log paths: /var/log/neutron/l3-agent.log
- Neutron server is overloaded and unable to respond quickly: Timeout in RPC method get_service_plugin_list
search pattern: ERROR AND "neutron.common.rpc" AND "Timeout in RPC method get_service_plugin_list"
threshold: 0.0
log paths: /var/log/neutron/l3-agent.log
For Keystone service:
- Keystone can’t connect to SQL database
search pattern: DBConnectionError
threshold: 0.0
log paths: /var/log/keystone/keystone-wsgi-admin.log, /var/log/keystone/keystone-manage.log, /var/log/keystone/keystone.log, /var/log/keystone/keystone-wsgi-public.log, /var/log/apache2/error.log, /var/log/apache2/keystone_access.log, /var/log/apache2/keystone.log
- Keystone: Apache WSGI configuration is broken
search pattern: "Target WSGI script not found" AND keystone-wsgi
threshold: 0.0
log paths: /var/log/apache2/keystone.log
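To get a feel for how the search patterns above behave before you adapt them, here is a minimal Python sketch that applies one to a list of log lines. It rests on our own assumptions rather than on the actual Dynatrace Log Analytics implementation: terms are joined only by AND, quoted terms are matched as exact substrings, matching is case-sensitive, and an alert fires when the number of matching lines exceeds the threshold. All function names are hypothetical.

```python
import re

# A term is either a double-quoted phrase or a bare run of non-space characters
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def parse_terms(pattern):
    """Split an AND-only search pattern into its literal terms."""
    terms = []
    for quoted, bare in _TOKEN.findall(pattern):
        if bare == "AND":
            continue  # the operator itself is not a search term
        terms.append(quoted if quoted else bare)
    return terms


def line_matches(pattern, line):
    """True if every term of the pattern occurs in the line (assumed semantics)."""
    return all(term in line for term in parse_terms(pattern))


def count_matches(pattern, lines):
    """Number of log lines that satisfy the pattern."""
    return sum(1 for line in lines if line_matches(pattern, line))


def exceeds_threshold(pattern, lines, threshold=0.0):
    """Assumed alerting rule: fire when the match count is above the threshold."""
    return count_matches(pattern, lines) > threshold
```

With a threshold of 0.0, as in the configurations above, a single matching line is enough for exceeds_threshold to return True under these assumptions.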
Potential improvements and further steps
When we defined the original scope of the OpenStack monitoring EAP, we developed a number of specific plugins for OpenStack services. The goal of these plugins was to provide additional insight into specific metrics for Keystone, Horizon, and Glance. They are currently not part of the out-of-the-box solution, but can be retrofitted and included in OneAgents with some effort related to the exposure of their respective configurations.
The data provided by these plugins can also be analyzed by Dynatrace AI and taken into account during root-cause analysis. It can also be subject to alerting and integrations with external services.
We want to hear from you
We’re always happy to receive your feedback and ideas. Reach out to us via Dynatrace Community, Dynatrace Support, or your Dynatrace representative to share your thoughts with us. Please let us know how you are using OpenStack infrastructure monitoring by Dynatrace in your environment and how we can make it even better.