Extend the platform,
empower your team.
Monitor your Postgres performance via our new EF2.0 extension framework.
ExtensionPostgresSQL monitoring solution is based on a remote monitoring approach implemented as a Dynatrace ActiveGate extension. The extension queries Postgres databases for key performance and health. Dynatrace's DAVIS AI then analyzes these metrics to provide anomaly and problem analysis.
Steps to activate remote monitoring (version 1.255+):
Create a database user using the following template:
CREATE USER dynatrace WITH PASSWORD '<PASSWORD>' INHERIT;
GRANT pg_monitor TO dynatrace;
There must be connectivity between the ActiveGate, where the extension is deployed, and the Postgres database.
For top query monitoring to work, the pg_stat_statements
table needs to be available. This has to be done at the database level as per the official docs. Ensure also that you have log monitoring enabled.
If both Dynatrace log monitoring is enabled and the pg_stat_statements
table is available, Dynatrace will ingest the top 100 queries (sorted by total execution time) every 5 minutes and store them as logs. These logs are available either from the database instance screen or on the Databases App, under "Top queries by total execution time". To filter by these queries on a dashboard or notebook, one can filter by dt.extension.name = com.dynatrace.extension.postgres
and event.group = top_queries
. See below a DQL query example:
fetch logs
| filter dt.extension.name=="com.dynatrace.extension.postgres" and event.group=="top_queries"
| sort total_exec_time desc
Note also that the planning time information for these queries requires pg_stat_statements.track_planning
to be enabled.
Regardless of whether pg_stat_statements
is available or not, Dynatrace will still collected queries from pg_stat_activity
as part of the Queries
feature set, which are similarly ingested as logs with event.group = longest_queries
.
For SaaS users who have access to the Databases app and who have top query monitoring enabled (see previous section), fetching execution plans for these queries is now possible. This can be done from the Databases app, under Statement performance
, by clicking Request
on the execution plan for a specific query.
For that query
, the extension will then attemp to execute the following:
SELECT * from dynatrace.dynatrace_execution_plan({query})
and then ingest into Dynatrace the first row of the column named explain
. These execution plans are ingested as logs with event.group = execution_plans
.
For security reasons which prevent the extension from directly executing anything other than a SELECT
command, the dynatrace_execution_plan
function has to be defined beforehand at the database level as follows:
CREATE OR REPLACE FUNCTION dynatrace.dynatrace_execution_plan(
query text,
OUT explain JSON
) RETURNS SETOF JSON
LANGUAGE plpgsql
VOLATILE
RETURNS NULL ON NULL INPUT
SECURITY DEFINER
ROWS 1
SET plan_cache_mode = force_generic_plan
AS
$$DECLARE
arg_count integer;
open_paren text;
close_paren text;
explain_cmd text;
json_result json;
BEGIN
/* reject statements containing a semicolon in the middle */
IF pg_catalog.strpos(
pg_catalog.rtrim(dynatrace_execution_plan.query, ';'),
';'
) OPERATOR(pg_catalog.>) 0 THEN
RAISE EXCEPTION 'query string must not contain a semicolon';
END IF;
/* get the parameter count */
SELECT count(*) INTO arg_count
FROM pg_catalog.regexp_matches( /* extract the "$n" */
pg_catalog.regexp_replace( /* remove single quoted strings */
dynatrace_execution_plan.query,
'''[^'']*''',
'',
'g'
),
'\$\d{1,}',
'g'
);
IF arg_count OPERATOR(pg_catalog.=) 0 THEN
open_paren := '';
close_paren := '';
ELSE
open_paren := '(';
close_paren := ')';
END IF;
/* construct a prepared statement */
EXECUTE
pg_catalog.concat(
'PREPARE _stmt_',
open_paren,
pg_catalog.rtrim(
pg_catalog.repeat('unknown,', arg_count),
','
),
close_paren,
' AS ',
dynatrace_execution_plan.query
);
/* construct an EXPLAIN statement */
explain_cmd :=
pg_catalog.concat(
'EXPLAIN (FORMAT JSON, ANALYZE FALSE) EXECUTE _stmt_',
open_paren,
pg_catalog.rtrim(
pg_catalog.repeat('NULL,', arg_count),
','
),
close_paren
);
/* get and return the plan */
EXECUTE explain_cmd INTO json_result;
RETURN QUERY SELECT json_result;
/* delete the prepared statement */
DEALLOCATE _stmt_;
END;$$;
Notice that the function above is defined with SECURITY DEFINER
, meaning it executes with the privileges of the user who defined it, not the one executing it. This is due to the fact that the permissions required to EXPLAIN
a query are the same as the one required to run that query, meaning that the user with which this function is defined needs to have a sufficient priveliges to run PREPARE
and EXPLAIN
on the queries it will be explaining. As such, for full functionality ensure that the function is defined with the appropriate user.
As with any function of this type, we recommend thorough testing before rolling out across big environments.
If after a couple of minutes the execution plans are not showing up when you press to fetch the plan, one of two things might be happening: either the EXPLAIN
query is failing or the ingestion into Dynatrace is causing trouble.
First, to check whether the logs are being brought into Dynatrace at all or if the problem is related to the Databases app open a Notebook and query:
fetch logs
| filter event.group=="execution_plans" and dt.extension.name=="com.dynatrace.extension.postgres" and sql_id=="<Statement ID>"
| sort timestamp desc
If there are logs showing up for the desired time frame this means that execution plan fetching is happening as normal but the Databases app is not picking them up. In this case, please contact Dynatrace support.
If the logs have content Couldn't obtain an execution plan. No execution plan was found.
, the next step would be to check the ActiveGate datasource logs from a support archive at <support_archive>/remoteplugin/log/extensions/datasources/com.dynatrace.extension.postgres/dynatracesourcesql.jar.log
for more details around the error. In tandem with this, it can be helpful to log into the database with the same user which is configured in the Dynatrace extension monitoring configuration and attempt to run
SELECT * from dynatrace.dynatrace_execution_plan({query})
for the problematic query
.
If the error relates to permissions, please ensure that the function was defined with the right user (see above). Note also that some queries cannot be explained (for example SET
or EXPLAIN
queries) so failure for those is expected.
If you believe that the EXPLAIN
should have succeeded, please contact Dynatrace support.
The extension is designed to work with PostgreSQL version 11 and later.
The metrics collected through this extension consume Dynatrace Davis Data Units (see DDUs for metrics).
A rough estimation of the amount of DDUs consumed by metric ingest can be obtained through the following formula:
( (11 * number of instances)
+ (29 * number of databases)
+ (1 * number of tablespaces)
) * 525.6 DDUs/year
For logs, regular DDU consumption for log monitoring applies. Depending on your licensing model, refer either to DDU consumption for Log Management and Analytics or DDUs for Log Monitoring Classic.
If your license consists of Custom Metrics. Each custom metric is equivalent to 525.6 DDUs/yr Please see documentation Metric Cost Calculation
Below is a complete list of the feature sets provided in this version. To ensure a good fit for your needs, individual feature sets can be activated and deactivated by your administrator during configuration.
Metric name | Metric key | Description | Unit |
---|---|---|---|
Instance uptime | postgres.uptime | Time since the instance has been started | Second |
Metric name | Metric key | Description | Unit |
---|---|---|---|
Tablespace size | postgres.tablespace.size | Tablespace size in bytes | Byte |
Metric name | Metric key | Description | Unit |
---|---|---|---|
WAL diff size | postgres.wal_diff_size | Size of difference between current WAL and last WAL replay | Byte |
WAL records per minute | postgres.wal_records.count | Number of WAL records generated per minute | Count |
WAL fpi per minute | postgres.wal_fpi.count | Number of WAL full page images generated per minute | Count |
WAL bytes | postgres.wal_bytes.count | Total amount of WAL generated in bytes | Count |
WAL buffers full | postgres.wal_buffers_full.count | Number of times WAL data was written to disk because WAL buffers became full | Count |
WAL write | postgres.wal_write.count | Number of times WAL buffers were written out to disk via XLogWrite request | Count |
WAL sync | postgres.wal_sync.count | Number of times WAL files were synced to disk via issue_xlog_fsync request | Count |
WAL write time | postgres.wal_write_time.count | Total amount of time spent writing WAL buffers to disk via XLogWrite request, in milliseconds | MilliSecond |
WAL sync time | postgres.wal_sync_time.count | Total amount of time spent syncing WAL files to disk via issue_xlog_fsync request, in milliseconds | MilliSecond |
Metric name | Metric key | Description | Unit |
---|---|---|---|
Active backend processes | postgres.activity.active | Number of server processes executing a query | Count |
Idle backend processes | postgres.activity.idle | Number of server processes waiting for a new client command | Count |
Idle in transaction backends processes | postgres.activity.idle_in_transaction | Number of server processes in transaction not currently executing a query | Count |
Idle in transaction aborted backends processes | postgres.activity.idle_in_transaction_aborted | Number of server processes in transaction not currently executing a query where one of the statements caused an error | Count |
Fast-path function backend processes | postgres.activity.fastpath_function_call | Number of server processes executing a fast-path function call | Count |
Metric name | Metric key | Description | Unit |
---|---|---|---|
Instance recovery mode | postgres.recovery.state | Indicate that the instance is in recovery mode. 1 if in recovery, 0 otherwise. | State |
Metric name | Metric key | Description | Unit |
---|---|---|---|
Number of backends | postgres.numbackends | Number of backends currently connected to this database | Count |
Committed transactions | postgres.xact_commit.count | Number of transactions in this database that have been committed | Count |
Rolled back transactions | postgres.xact_rollback.count | Number of transactions in this database that have been rolled back | Count |
Block read from disk | postgres.blks_read.count | Number of disk blocks read in this database | Count |
Blocks found in buffer cache | postgres.blks_hit.count | Number of times disk blocks were found already in the buffer cache, so that a read was not necessary | Count |
Live rows returned | postgres.tup_returned.count | Number of live rows fetched by sequential scans and index entries returned by index scans in this database | Count |
Live rows fetched by index scans | postgres.tup_fetched.count | Number of live rows fetched by index scans in this database | Count |
Rows inserted | postgres.tup_inserted.count | Number of rows inserted by queries in this database | Count |
Rows updated | postgres.tup_updated.count | Number of rows updated by queries in this database | Count |
Rows deleted | postgres.tup_deleted.count | Number of rows deleted by queries in this database | Count |
Queries canceled due to conflict | postgres.conflicts.count | Number of queries canceled due to conflicts with recovery in this database | Count |
Temporary files created | postgres.temp_files.count | Number of temporary files created by queries in this database | Count |
Data written to temporary files | postgres.temp_bytes.count | Total amount of data written to temporary files by queries in this database | Byte |
Deadlocks | postgres.deadlocks.count | Number of deadlocks detected in this database | Count |
Data file blocks reading time | postgres.blk_read_time.count | Time spent reading data file blocks by backends in this database | MilliSecond |
Data file blocks writing time | postgres.blk_write_time.count | Time spent writing data file blocks by backends in this database | MilliSecond |
Data page checksum failures | postgres.checksum_failures.count | Number of data page checksum failures detected in this database | Count |
Time spent by sessions | postgres.session_time.count | Time spent by database sessions in this database | MilliSecond |
Time spent executing SQL statements | postgres.active_time.count | Time spent executing SQL statements in this database | MilliSecond |
Time spent idling | postgres.idle_in_transaction_time.count | Time spent idling while in a transaction in this database | MilliSecond |
Established sessions | postgres.sessions.count | Total number of sessions established | Count |
Abandoned sessions | postgres.sessions_abandoned.count | Number of database sessions to this database that were terminated because connection to the client was lost | Count |
Fatal error terminated sessions | postgres.sessions_fatal.count | Number of database sessions to this database that were terminated by fatal errors | Count |
Killed sessions | postgres.sessions_killed.count | Number of database sessions to this database that were terminated by operator intervention | Count |
Metric name | Metric key | Description | Unit |
---|---|---|---|
Scheduled checkpoints performed | postgres.checkpoints_timed.count | Number of scheduled checkpoints that have been performed | Count |
Requested checkpoints performed | postgres.checkpoints_req.count | Number of requested checkpoints that have been performed | Count |
Checkpoints write time | postgres.checkpoint_write_time.count | Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk | MilliSecond |
Checkpoint sync time | postgres.checkpoint_sync_time.count | Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk | MilliSecond |
Buffers written during checkpoints | postgres.buffers_checkpoint.count | Number of buffers written during checkpoints | Count |
Buffers written by background writer | postgres.buffers_clean.count | Number of buffers written by the background writer | Count |
Cleaning scan stops | postgres.maxwritten_clean.count | Number of times the background writer stopped a cleaning scan because it had written too many buffers | Count |
Buffers written by backend | postgres.buffers_backend.count | Number of buffers written directly by a backend | Count |
Backend fsync executions | postgres.buffers_backend_fsync.count | Number of times a backend had to execute its own fsync call | Count |
Buffers allocated | postgres.buffers_alloc.count | Number of buffers allocated | Count |
Bugfixes:
ℹ️ This version requires ActiveGate version 1.293.0 or higher.
Improvements
ALTER
statements for security reasons.Bugfixes:
ℹ️ This version requires ActiveGate version 1.293.0 or higher.
WARNING This extension requires minimum ActiveGate version 1.293. Please check your version before upgrading.
Bugfixes:
pg_stat_statements
for PostgreSQL 13.WARNING This extension requires minimum ActiveGate version 1.293. Please check your version before upgrading.
Features:
Getting started
section on the hub tile for more details on how to set it up.Bug fixes:
Query duration as collected by longest_queries
is now calculated as clock_timestamp() - query_start
. Previously it was computed as now() - query_start
which sometimes caused the query duration to be negative.
The reason behind this is that now()
or CURRENT_TIMESTAMP
represent the start of the transaction and an individual query within the transaction can start later. The clock_timestamp()
however returns an actual system cpu clock and can change over the course of the transaction, producing a correct value to subtract query_start
from to compute a valid duration.
Bug fixes:
postgres.blk_hit.count
metric key to postgres.blks_hit.count
to match official Postgres statistics collector name.Features:
endpoint_metadata
attribute of sql:postgres_instance
entities and can be used in entity selectors (to facilitate tagging, for example).Queries
feature set will now collect queries (ingested as logs) from the pg_stat_statements
table as well. These new logs can be queried by filtering by event_group = top_queries
. Note that for this to work, the pg_stat_statements
table needs to be enabled at the database level as per the official docs.Improvements:
pg_stat_activity
is now collected for all activities, not just the ones associated with a specific datid
. That adds monitoring for activites such as autovacuum worker
.Improvements:
Bug fixes:
client_port
dimension was removed from the following metrics, because the field bring no valuable information and is too volatile, causing too many distinct timeseries to be generated.
postgres.activity.active
postgres.activity.fastpath_function_call
postgres.activity.idle_in_transaction_aborted
postgres.activity.idle_in_transaction
postgres.activity.idle
Breaking changes: Existing monitoring configurations must be recreated through API or manually
New features:
Bug fixes:
Fix for schema change
No release notes