Streamlining site reliability can be daunting in large AWS environments whose architecture relies on hundreds or even thousands of Amazon EC2 instances. You can simplify the process by automating guardians in Site Reliability Guardian (SRG) so that they run whenever AWS tags change. This helps teams improve compliance and manage system performance effectively.
This step-by-step guide shows you how to configure your architecture to trigger guardians whenever EC2 tags are updated. EC2 is only an example: the same approach can be adapted to tag changes on other AWS resources. By the end of this guide, you’ll be ready to automate guardians at scale and manage Amazon EC2 with ease.
Why automate guardians for AWS tag changes?
Before diving into the technical setup, here’s why automating guardians whenever EC2 tags change is beneficial for your organization:
- Greater efficiency: Automatically triggering guardians removes the need for manual intervention, saving time for DevOps or site reliability engineering (SRE) teams and allowing for more efficient resource management at scale.
- Better compliance: Automating guardians ensures critical policies and checks are consistently applied after changes across your architecture, improving security and compliance efforts.
- Cost optimization: Immediate responses to tag changes support better-informed decisions about scaling, shutting down unused instances, or fine-tuning resource efficiency.
- Proactive site reliability: Automated guardians can monitor the four golden signals (latency, traffic, errors, and saturation), so reliability issues surface soon after a change instead of later.
Now, let’s get started with the setup!
Step 1: Create an API token
First, create an API token to integrate AWS services with Dynatrace for guardian automation.
- Log in to your Dynatrace tenant
Log in to your Dynatrace tenant and note the first part of the URL (for instance, “abc12345”), which is your tenant ID.
- Access token settings
Press Ctrl + K or CMD + K and search for “Access Tokens” within Dynatrace.
- Generate a new token
Create a new access token and assign it the “bizevents.ingest” scope.
- Save the token
Copy and securely store the token, which looks like “dt0c01.*****.*****”. You’ll use this later during configuration.
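Both values from this step end up in the ingest request. As a minimal sketch, assuming a Dynatrace SaaS tenant whose API is served from the `live.dynatrace.com` domain shown in the cURL command in Step 2, the hypothetical Python helpers below (not part of any Dynatrace SDK) derive the tenant ID from the URL you noted, build the ingest endpoint, and compose the `Authorization` header value. The token check only looks at the `dt0c01.` prefix and the three dot-separated parts; it cannot tell whether the token is valid or has the right scope.

```python
from urllib.parse import urlparse

# Hypothetical helpers for illustration; not part of any Dynatrace SDK.
# Assumes a SaaS tenant reachable at https://<tenant-id>.live.dynatrace.com,
# matching the endpoint in the cURL command from Step 2.


def tenant_id_from_url(tenant_url: str) -> str:
    """Return the first hostname label, e.g. 'abc12345' from 'https://abc12345.live.dynatrace.com/'."""
    host = urlparse(tenant_url).hostname
    if not host or "." not in host:
        raise ValueError(f"not a tenant URL: {tenant_url!r}")
    return host.split(".", 1)[0]


def ingest_endpoint(tenant_id: str) -> str:
    """Build the business event ingest URL used throughout this guide."""
    return f"https://{tenant_id}.live.dynatrace.com/api/v2/bizevents/ingest"


def authorization_value(token: str) -> str:
    """Prefix the token with 'Api-Token '; only the token's shape is checked, not its validity."""
    parts = token.split(".")
    if len(parts) != 3 or parts[0] != "dt0c01":
        raise ValueError("expected a token shaped like dt0c01.<identifier>.<secret>")
    return f"Api-Token {token}"


if __name__ == "__main__":
    tenant = tenant_id_from_url("https://abc12345.live.dynatrace.com/")
    print(ingest_endpoint(tenant))
    print(authorization_value("dt0c01.IDENTIFIER.SECRET"))
```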
Step 2: Create the EventBridge connection
Amazon EventBridge acts as the bridge between AWS and Dynatrace. Here’s how to set it up:
- Navigate to Amazon EventBridge
Log in to your AWS Management Console and go to EventBridge > Connections.
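- Use the cURL command as a reference
The connection, together with the API destination that uses it, should reproduce the following request, which sends a business event to your tenant’s ingest endpoint: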
curl -X POST \
'https://abc12345.live.dynatrace.com/api/v2/bizevents/ingest' \
-H 'Authorization: Api-Token dt0c01.*****.*****' \
-H 'Content-Type: application/cloudevent+json' \
-d '{…}'
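- Set the authorization method
Create a new EventBridge connection with the authorization method set to “API Key”, the API key name set to “Authorization”, and the value set to the API token from Step 1 prefixed with “Api-Token ” (i.e., “Api-Token dt0c01.*****.*****”). Then create an API destination that uses this connection, with the ingest URL from the cURL command as the endpoint and POST as the HTTP method.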
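To check the endpoint and token independently of EventBridge, you can send the same request the connection is meant to reproduce. The Python sketch below uses only the standard library; `build_ingest_request` and the sample event body are hypothetical illustrations (the body mirrors the shape of the input template in Step 3), and the request is only built, not sent, unless you call `urlopen` yourself.

```python
import json
import urllib.request


def build_ingest_request(tenant_id: str, token: str, event: dict) -> urllib.request.Request:
    """Hypothetical helper mirroring the cURL command above; not an official client."""
    url = f"https://{tenant_id}.live.dynatrace.com/api/v2/bizevents/ingest"
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={
            # Same header the EventBridge connection sends with API key authorization.
            "Authorization": f"Api-Token {token}",
            "Content-Type": "application/cloudevent+json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Illustrative CloudEvent; all values are placeholders.
    sample = {
        "specversion": "1.0",
        "id": "test-0001",
        "source": "aws.tag",
        "type": "ec2.tag.change",
        "data": {"note": "manual connectivity test"},
    }
    req = build_ingest_request("abc12345", "dt0c01.*****.*****", sample)
    print(req.get_method(), req.full_url)
    # To actually send it (requires network access and a real token):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

Sending one such test event and finding it with the query in Step 4 confirms the token and endpoint before any EventBridge configuration is involved.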
Step 3: Define and configure EventBridge rules
An EventBridge rule selects the tag change events to forward and reshapes them before they reach Dynatrace, where they trigger the guardian:
- Specify the input template:
Create an input template that reshapes the raw AWS event into a CloudEvent for Dynatrace:
{
"specversion": "1.0",
"id": "<id>",
"source": "<source>",
"type": "ec2.tag.change",
"time": "<time>",
"aws.region": "<region>",
"aws.eventbridge.rule.arn": "<aws.events.rule-arn>",
"aws.resources": <resources>,
"data": <detail>
}
- Set the event pattern
Create a rule in EventBridge with the following event pattern:
{
"source": ["aws.tag"],
"detail-type": ["Tag Change on Resource"],
"detail": {
"service": ["ec2"],
"resource-type": ["instance"]
}
}
- Apply targets and permissions
Set the rule’s target to your API destination so that matching events are delivered to Dynatrace, and use an IAM role that allows EventBridge to invoke the API destination (the events:InvokeApiDestination action).
In the target’s input transformer, set the input path as follows, and use the template above as the input template:
{"detail":"$.detail","id":"$.id","region":"$.region","resources":"$.resources","source":"$.source","time":"$.time"}
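To see what Dynatrace actually receives, it can help to emulate the input transformer locally. The Python sketch below is a simplified, hypothetical emulation, not AWS code: it supports only top-level `$.key` input paths and placeholders that form an entire JSON value, so it writes the source placeholder as “<source>” on its own (the raw event’s source is already “aws.tag”). EventBridge’s own behavior, for example how it escapes special characters inside strings, may differ. The sample event is made up but shaped like an AWS “Tag Change on Resource” event, and the rule ARN is invented.

```python
import json
import re

# Input path from this step: template variable -> JSONPath into the raw AWS event.
INPUT_PATHS = {
    "detail": "$.detail", "id": "$.id", "region": "$.region",
    "resources": "$.resources", "source": "$.source", "time": "$.time",
}

# Input template from this step, with "<source>" used on its own.
INPUT_TEMPLATE = """{
  "specversion": "1.0",
  "id": "<id>",
  "source": "<source>",
  "type": "ec2.tag.change",
  "time": "<time>",
  "aws.region": "<region>",
  "aws.eventbridge.rule.arn": "<aws.events.rule-arn>",
  "aws.resources": <resources>,
  "data": <detail>
}"""

# Matches a quoted placeholder ("<id>") or a bare one (<detail>).
PLACEHOLDER = re.compile(r'"<([\w.\-]+)>"|<([\w.\-]+)>')

# Illustrative event shaped like an AWS "Tag Change on Resource" event; all values are made up.
SAMPLE_EVENT = {
    "version": "0",
    "id": "11111111-2222-3333-4444-555555555555",
    "detail-type": "Tag Change on Resource",
    "source": "aws.tag",
    "account": "123456789012",
    "time": "2024-01-01T06:59:00Z",
    "region": "us-east-1",
    "resources": ["arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"],
    "detail": {
        "changed-tag-keys": ["Environment"],
        "service": "ec2",
        "resource-type": "instance",
        "tags": {"Environment": "staging"},
    },
}


def resolve(event: dict, path: str):
    """Resolve a top-level "$.key" path; nested paths are out of scope for this sketch."""
    if not path.startswith("$.") or "." in path[2:]:
        raise ValueError(f"unsupported path: {path}")
    return event[path[2:]]


def transform(event: dict, rule_arn: str) -> dict:
    """Approximate the body the input transformer sends to the API destination."""
    values = {name: resolve(event, path) for name, path in INPUT_PATHS.items()}
    values["aws.events.rule-arn"] = rule_arn  # predefined EventBridge variable

    def substitute(match):
        # json.dumps quotes strings and serializes lists/objects, so both forms stay valid JSON.
        return json.dumps(values[match.group(1) or match.group(2)])

    return json.loads(PLACEHOLDER.sub(substitute, INPUT_TEMPLATE))


if __name__ == "__main__":
    rule_arn = "arn:aws:events:us-east-1:123456789012:rule/ec2-tag-change"
    print(json.dumps(transform(SAMPLE_EVENT, rule_arn), indent=2))
```

The resulting `type` field is what the Dynatrace query and workflow filter match on as `event.type`.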
Step 4: Test tag changes on Amazon EC2 instances
To validate your configuration, perform the following:
- Change a tag
In the AWS Management Console, open one of your Amazon EC2 instances and change a tag, for instance by giving the “Environment” tag a new value.
- Verify event logging
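In the EventBridge console, check the rule’s monitoring metrics (for example, Invocations and FailedInvocations) to confirm that your tag change matched the rule and was delivered to the API destination.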
- Confirm data in Dynatrace
Within Dynatrace, press CMD/Ctrl + K and search for “Notebooks.” Create a new notebook and run the following query:
fetch bizevents | filter event.type == "ec2.tag.change"
If the query returns results, your configuration is working correctly.
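If a tag change does not show up, the first thing to rule out is the event pattern. The Python sketch below is a deliberately simplified, hypothetical re-implementation of EventBridge matching (exact-value lists and nested objects only, with no prefix, numeric, or `anything-but` filters) that you can use to reason about which tag changes the Step 3 pattern forwards; the sample events are made up. For an authoritative answer, EventBridge’s `TestEventPattern` API (`aws events test-event-pattern` in the AWS CLI) evaluates a pattern against a full event.

```python
# Event pattern from Step 3.
EVENT_PATTERN = {
    "source": ["aws.tag"],
    "detail-type": ["Tag Change on Resource"],
    "detail": {"service": ["ec2"], "resource-type": ["instance"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # A list means "any of these values"; an array field matches if any element does.
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in candidates):
                return False
    return True


# Hypothetical test events; only the fields the pattern inspects are filled in.
EC2_INSTANCE_CHANGE = {
    "source": "aws.tag",
    "detail-type": "Tag Change on Resource",
    "detail": {"service": "ec2", "resource-type": "instance", "changed-tag-keys": ["Environment"]},
}
EC2_VOLUME_CHANGE = {
    "source": "aws.tag",
    "detail-type": "Tag Change on Resource",
    "detail": {"service": "ec2", "resource-type": "volume", "changed-tag-keys": ["Environment"]},
}

if __name__ == "__main__":
    print(matches(EVENT_PATTERN, EC2_INSTANCE_CHANGE))  # True
    print(matches(EVENT_PATTERN, EC2_VOLUME_CHANGE))    # False
```

Because the pattern pins `resource-type` to `instance`, tag changes on other EC2 resource types are not forwarded; widening that list is how you would extend the setup to them.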
Step 5: Set up the guardian
- Create a new guardian
In Dynatrace, search for “Site Reliability Guardian” (`CMD/Ctrl + K`) and create a new guardian. As a best practice, start from the “Four Golden Signals” template.
- Automate the workflow
Click the Automate button, either on the overview page that lists all guardians or on the analysis page of a selected guardian. This generates a workflow that triggers the guardian based on incoming business events (bizevents). Configure the event type as `bizevent` and set the filter query to:
event.type == "ec2.tag.change"
- Add a pause
To give your systems time to stabilize before the guardian runs, configure a “wait before” delay, for example 600 seconds (10 minutes).
Example timeline:
- 06:59: Tag changed on EC2 instance.
- 07:00: EventBridge delivers the event to Dynatrace, which triggers the workflow.
- 07:10: The guardian runs after the 10-minute pause.
- Save the workflow
Save your final workflow to activate the automation.
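The example timeline is simple arithmetic: the guardian starts one “wait before” delay after the workflow is triggered. A minimal Python sketch of that calculation follows; the function name and the calendar date are hypothetical, and real start times also depend on event delivery and scheduling latency.

```python
from datetime import datetime, timedelta

# The "wait before" delay from the example: 600 seconds (10 minutes).
WAIT_BEFORE = timedelta(seconds=600)


def guardian_start(workflow_triggered: datetime, wait_before: timedelta = WAIT_BEFORE) -> datetime:
    """Earliest time the guardian can start once the workflow has been triggered."""
    return workflow_triggered + wait_before


if __name__ == "__main__":
    # Times from the example timeline; the calendar date is hypothetical.
    tag_changed = datetime(2024, 1, 1, 6, 59)
    workflow_triggered = datetime(2024, 1, 1, 7, 0)
    start = guardian_start(workflow_triggered)
    print(start.strftime("%H:%M"))  # 07:10
    print(start - tag_changed)      # 0:11:00
```

A longer delay gives the system more time to settle before the verdict, at the cost of a later signal.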
Step 6: Validate and monitor the setup
Perform end-to-end validation by changing an EC2 tag again. Confirm the following:
- The tag change event reaches Dynatrace.
- The workflow triggers the guardian.
- The guardian results appear in Dynatrace (e.g., heatmaps or relevant logs).
Run the following query in Dynatrace to list the tag change events that reached your tenant:
fetch bizevents | filter event.type == "ec2.tag.change"
Each tag change should appear as an entry. Compare these timestamps with the workflow’s executions to confirm that a guardian run followed each one.
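Cross-checking each tag change against a guardian run can be tedious when many instances change at once. As a hypothetical aid (not a Dynatrace feature), the Python sketch below takes tag change timestamps, for example from the query above, and guardian run timestamps, for example copied from the workflow’s executions, and reports tag changes with no guardian run in the expected window. The 600-second wait matches the example in Step 5; the 2-minute tolerance is an assumption meant to absorb the delay between the tag change and the workflow trigger.

```python
from datetime import datetime, timedelta


def unmatched_tag_changes(tag_changes, guardian_runs,
                          wait=timedelta(seconds=600), tolerance=timedelta(minutes=2)):
    """Return tag change times with no guardian run between change + wait and change + wait + tolerance."""
    missing = []
    for changed in sorted(tag_changes):
        earliest = changed + wait
        latest = earliest + tolerance
        if not any(earliest <= run <= latest for run in guardian_runs):
            missing.append(changed)
    return missing


if __name__ == "__main__":
    # Hypothetical timestamps: the 06:59 change has a matching run, the 09:00 change does not.
    changes = [datetime(2024, 1, 1, 6, 59), datetime(2024, 1, 1, 9, 0)]
    runs = [datetime(2024, 1, 1, 7, 10)]
    print(unmatched_tag_changes(changes, runs))
```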
Achieve more with Site Reliability Guardian
In this blog, we showed how to trigger Site Reliability Guardian automatically whenever Amazon EC2 tags change. With this automation, SRG helps engineering teams work more efficiently, improve compliance, and optimize costs.