While site reliability engineering teams are key participants in digital transformation, they still need to mature to be effective.
Site reliability engineering seeks to bridge the gap between developers and operations teams, embedding reliability and resiliency into each stage of the software development lifecycle.
Site reliability engineering (SRE) is a key component of digital transformation. Still, many organizations have found that the transition to SRE maturity is not always easy. The 2022 State of SRE Report surveyed 450 SREs across a variety of organizations about how they view SRE today and where they see it evolving as a discipline.
In a recent webinar, Saif Gunja – director of DevOps product marketing at Dynatrace – sat down with three SRE panelists to discuss the standout findings and where they see the future of SRE.
Key finding #1: SRE is maturing, but not fast enough
The journey to SRE maturity varies among organizations. According to panelist Stephen Townshend, SRE at IAG, engineers with the SRE title still struggle to understand reliability and achieve agility. “The concepts behind SRE have value and are permeating across the industry,” he said. But despite the title change, many SREs remain stuck in the old way of doing things. “We’re not yet at the stage, I think, where we’re executing [SRE methodologies].”
Indeed, according to the report, 67% of SREs report dedicating the most time per week to reducing mean time to repair. Additionally, 60% report spending much of their time building and maintaining automation code.
Achieving SRE maturity is difficult when much of the typical SRE’s efforts are reactive. While creating automation scripts might be an effective short-term solution, it requires long-term maintenance and code updates, which become more complicated as environments become more complex.
How can organizations accelerate their journey to SRE maturity? Panelist Danne Aguiar, SRE at Kyndryl, said that cultural change is the one thing that organizations need to get to that next level of maturity. “SRE practices don’t apply to one single department,” he said. “It should be the entire company.”
Panelist Michael Cabrera, SRE at Vivint, agrees. “If we’re spending a ton of time in post-mortems, it’s because we’re not spending enough time ensuring availability of applications,” he said. “I think it takes more than just [hiring] an SRE. It’s really an organizational shift.”
Key finding #2: SLOs are becoming staples for SREs, but maximizing full potential is challenging
Service-level objectives (SLOs) are key to the SRE role; they are agreed-upon performance benchmarks that represent the health of an application or service.
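To make the idea concrete, here is a minimal sketch of how an SLO might be checked against measured data. It assumes a simple request-based indicator (good requests divided by total requests) and a hypothetical `SloReport` class; neither the class nor the numbers come from the report or from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    """Hypothetical helper: compares a measured success ratio to an SLO target."""

    target: float       # agreed objective as a fraction, e.g. 0.99 for 99%
    good_events: int    # requests that met the success criteria
    total_events: int   # all requests observed in the window

    @property
    def sli(self) -> float:
        # Service-level indicator: share of good events in the window.
        # An empty window is treated as fully compliant (an assumption).
        if self.total_events == 0:
            return 1.0
        return self.good_events / self.total_events

    @property
    def is_met(self) -> bool:
        return self.sli >= self.target

    @property
    def error_budget_remaining(self) -> float:
        # Fraction of the allowed failures not yet consumed.
        # Negative values mean the budget is overspent.
        allowed_bad = (1.0 - self.target) * self.total_events
        actual_bad = self.total_events - self.good_events
        if allowed_bad == 0:
            return 1.0 if actual_bad == 0 else 0.0
        return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    report = SloReport(target=0.99, good_events=9_950, total_events=10_000)
    print(f"SLI: {report.sli:.4f}, met: {report.is_met}, "
          f"budget left: {report.error_budget_remaining:.0%}")
```

Tracking the remaining error budget rather than only a pass/fail flag gives teams an early signal to slow releases before the objective is actually breached.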
SREs need SLOs to measure and monitor performance, but many organizations lack the automation and intelligence to streamline data. The report found that 64% of SREs reported too many data sources in their organizations, making data collection difficult. More than half (54%) of respondents reported that too many metrics made finding the relevant ones difficult.
Tool sprawl and siloed teams also present significant challenges, according to 68% of respondents. “Even in the observability space, we have so many different tools,” Townshend said in the webinar. “One of the challenges we’re trying to tackle right now [is] how we can pull that data into one place.”
Key finding #3: Efforts to reduce SRE toil are key to success
The first step to reducing SRE toil is to embrace the cultural shift. To get to that next level of maturity, organizations need to break down silos, prioritize reliability, and involve leadership for wider buy-in.
Choosing the right platform – one with automation and artificial intelligence at the core – is the next important step. Cabrera agrees: “Your tools matter a lot here,” he said. “I’m a Dynatrace customer, and Dynatrace helps quite a bit out of the box.”
Using precise AI-driven answers and intelligent automation, Dynatrace can help jumpstart your journey to SRE maturity. Dynatrace embeds AI-powered observability into every stage of the software development lifecycle, proactively identifying site reliability issues before they affect users. As a single source of truth in a single pane of glass, Dynatrace breaks down organizational silos and eliminates the need for multiple tools, ensuring frictionless cross-team collaboration.
Want to learn more? Check out the webinar or download the 2022 State of SRE Report.