
Latest Developments in Site Reliability Engineering, 2023

Introduction

Gartner® recently published its Hype Cycle™ for Site Reliability Engineering, 2023 report (July 2023). Inspired by this report, OnPage is sharing its predictions about the future of site reliability engineering. In this post, OnPage reviews evolving tools and practices that can improve site reliability engineering.

What is Site Reliability Engineering?

Site reliability engineering (SRE) is the practice of keeping infrastructure and operations (I&O) smooth, resilient, and reliable. To optimize I&O, site reliability engineers apply software engineering principles to maintain scalable, high-performing, reliable systems. Investing in SRE can lead to longer periods of consistent network uptime, stronger cybersecurity, higher levels of customer satisfaction, increased observability and monitoring, and quicker software deployment.

Trends in Site Reliability Engineering

While creating its report, Gartner® identified strategic planning assumptions. OnPage agrees with the following predictions made by Gartner®:

  • “By 2027, 75% of enterprises will use site reliability engineering practices across their organizations to optimize product design, cost and operations to meet customer expectations, up from 10% in 2022.
  • By 2025, 40% of organizations will implement chaos engineering practices as part of site reliability engineering initiatives, improving mean time to repair (MTTR) by an average of 90%.
  • By 2026, 70% of organizations that successfully applied observability will achieve shorter latency for decision making, enabling competitive advantage for target business or IT processes.
  • By 2025, organizations that invest in building digital immunity will increase customer satisfaction by decreasing downtime by 80%.
  • By the end of 2025, 30% of enterprises will establish new roles focused on IT resilience and boost end-to-end reliability, tolerability and recoverability by at least 45%.”

(Gartner® 1, 2023)

Tools & Practices for Site Reliability Engineering

Focus

In this section, we summarize three SRE practices and tools and explain their impact on site reliability engineering. The first, monitoring as code (MaC), is a practice that is still emerging in the market, and developers are rapidly iterating on first-generation software. The second, automated incident response (AIR), is a tool category approaching mainstream commercialization; vendors are working to better understand the software’s capabilities so they can move it toward mainstream adoption. The third, DevSecOps (development, security, and operations), is fully accepted, and mainstream adoption is increasing rapidly as organizations benefit from its low-risk, easy implementation.

Monitoring as Code

OnPage believes MaC is emerging in relevance in the industry. Here is the Gartner® definition of MaC:

“Monitoring as code (MaC) is the process of applying software principles to monitoring, meaning the configuration of monitoring is designed to enable its management, like software. With MaC, the configuration of monitoring is codified, version-controlled, tested and automated. This flexibility offers DevOps teams the option to apply a shift-left approach for fast and consistent monitoring across systems.” (Gartner® 31, 2023)

Before MaC, traditional monitoring required large amounts of manual, inflexible configuration, which increased the risk of human error and extended the mean time to detect failures. MaC improves on this by making monitoring configuration reviewable and repeatable, which makes it a sound monitoring approach for site reliability engineers responsible for maintaining I&O reliability.
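To make the idea of codified, tested monitoring more concrete, here is a minimal Python sketch. It is an illustration only and does not reflect any specific vendor’s format: the Monitor fields, the metric names, and the validate and render helpers are all hypothetical. The point is that monitor definitions live in a file under version control, and a validation step can run in a CI pipeline before anything is deployed.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical monitor definition; field names are assumptions for illustration.
@dataclass(frozen=True)
class Monitor:
    name: str
    metric: str
    threshold: float
    window_seconds: int
    severity: str

ALLOWED_SEVERITIES = {"info", "warning", "critical"}

# Monitors are plain data in a reviewed, version-controlled file.
MONITORS = [
    Monitor("api-latency-p99", "http.request.latency.p99", 0.5, 300, "critical"),
    Monitor("disk-usage", "host.disk.used_ratio", 0.9, 600, "warning"),
]

def validate(monitors):
    """Return a list of problems; an empty list means the config is valid."""
    problems = []
    seen = set()
    for m in monitors:
        if m.name in seen:
            problems.append(f"duplicate monitor name: {m.name}")
        seen.add(m.name)
        if m.severity not in ALLOWED_SEVERITIES:
            problems.append(f"{m.name}: unknown severity {m.severity!r}")
        if m.window_seconds <= 0:
            problems.append(f"{m.name}: window_seconds must be positive")
    return problems

def render(monitors):
    """Serialize monitors to JSON that a deployment step could consume."""
    return json.dumps([asdict(m) for m in monitors], indent=2, sort_keys=True)

if __name__ == "__main__":
    errors = validate(MONITORS)
    if errors:
        raise SystemExit("\n".join(errors))
    print(render(MONITORS))
```

Because the definitions are ordinary code, a misconfigured monitor can be caught in review or by an automated check rather than discovered during an outage.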

OnPage advises organizations to use MaC to tailor monitoring practices to their DevOps and site reliability engineering needs. Organizations should also seek operational feedback so they can better fit MaC to their I&O environment.

Remember that MaC is still in the early stages of innovation. If your organization is unhappy with its current SRE methodologies for monitoring key performance indicators (KPIs) and is willing to experiment, consider adopting MaC and investing in tools to support these new monitoring endeavors.

Automated Incident Response

AIR centralizes alert management and incident routing, enabling organizations to streamline IT operations, reduce incident response time, and enhance incident resolution. One example of AIR is OnPage’s incident alert management platform, which provides on-call scheduling, escalation policies, and notifications that persistently ping until an on-call specialist addresses them. OnPage is pleased to announce that it has been included in the Gartner® Hype Cycle™ for Site Reliability Engineering, 2023 report as a Sample Vendor in the Automated Incident Response category. We believe it is an honor to be mentioned in the report and are proud to continue serving the IT community.

OnPage believes organizations can decrease mean time to acknowledge (MTTA) and expedite disaster recovery by providing DevOps and site reliability engineers with a centralized AIR solution. Many AIR tools can integrate with existing software in DevOps toolchains and with ChatOps tools. For example, OnPage’s integration with Slack can simplify communication and collaboration across teams with automated workflows during SRE incident response.
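The escalation and MTTA concepts above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not OnPage’s implementation or API: EscalationStep, run_escalation, and mean_time_to_acknowledge are hypothetical names, and the paging and acknowledgement checks are passed in as plain callables so the logic can be exercised without a real paging service.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical escalation step: who to page and how many times before escalating.
@dataclass(frozen=True)
class EscalationStep:
    responder: str
    attempts: int

def run_escalation(
    steps: List[EscalationStep],
    send_page: Callable[[str, int], None],
    is_acknowledged: Callable[[str], bool],
) -> Optional[str]:
    """Page each responder in order, repeating until someone acknowledges.

    Returns the responder who acknowledged, or None if the policy is exhausted.
    """
    for step in steps:
        for attempt in range(1, step.attempts + 1):
            send_page(step.responder, attempt)
            if is_acknowledged(step.responder):
                return step.responder
    return None

def mean_time_to_acknowledge(incidents: List[Tuple[float, float]]) -> float:
    """Average of (acknowledged_at - raised_at), both in seconds."""
    if not incidents:
        raise ValueError("at least one incident is required")
    deltas = [acked - raised for raised, acked in incidents]
    return sum(deltas) / len(deltas)

if __name__ == "__main__":
    policy = [EscalationStep("primary", 3), EscalationStep("secondary", 2)]
    pages = []
    # Simulate a primary who never responds and a secondary who does.
    owner = run_escalation(
        policy,
        send_page=lambda who, n: pages.append((who, n)),
        is_acknowledged=lambda who: who == "secondary",
    )
    print(owner, pages)
    print(mean_time_to_acknowledge([(0, 120), (100, 160)]))
```

A production system would add delays between attempts, multiple notification channels, and durable state; the sketch only shows the control flow of repeated paging followed by escalation.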

With AIR expected to become standard in site reliability engineering practices, OnPage recommends investing in a centralized AIR solution. For more information about automated incident response tools, schedule a free 30-minute demo with a team member to learn about OnPage’s incident alert management platform.

DevSecOps

OnPage believes the future of SRE lies in part with DevSecOps. Gartner® defines DevSecOps as:

“…the integration and automation of security and compliance testing into agile IT and DevOps development pipelines, as seamlessly and transparently as possible, without reducing the agility or speed of developers or requiring them to leave their development toolchain. Ideally, offerings provide security visibility and protection at runtime as well.” (Gartner® 86, 2023)

Within the typical software development life cycle (SDLC), SRE methodologies for identifying cybersecurity threats can frustrate developers and slow down delivery. DevSecOps supports SRE’s goal of ensuring software reliability by identifying vulnerabilities during development, and it can support cybersecurity measures without hindering developers who are already burdened with tight deadlines and managerial pressure.

OnPage suggests making security testing available early in the SDLC so developers can catch and fix mistakes early without deviating from their CI/CD pipelines. Integrating security testing with the SDLC and the developer’s existing workflows and toolsets can help maintain the expected rapid pace of development.
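One common way to bring security testing into an existing pipeline is a severity gate: the pipeline runs a scanner, then fails the build only if a finding meets a chosen threshold. The Python sketch below shows that gating step alone. The Finding structure, the rule names, and the gate function are hypothetical and stand in for whatever output a real scanner produces.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed severity scale; real scanners define their own.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}

# Hypothetical, scanner-agnostic finding record.
@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str
    location: str

def gate(findings: List[Finding], fail_at: str = "high") -> Tuple[int, List[Finding]]:
    """Return (exit_code, blocking_findings).

    exit_code is 1 when any finding is at or above the fail_at severity,
    otherwise 0, so a CI job can use it directly as its exit status.
    """
    limit = SEVERITY_RANK[fail_at]
    blocking = [f for f in findings if SEVERITY_RANK[f.severity] >= limit]
    return (1 if blocking else 0), blocking

if __name__ == "__main__":
    findings = [
        Finding("hardcoded-secret", "critical", "app/settings.py:12"),
        Finding("weak-hash", "medium", "app/auth.py:40"),
    ]
    code, blocking = gate(findings)
    for f in blocking:
        print(f"BLOCKING {f.severity}: {f.rule_id} at {f.location}")
    raise SystemExit(code)
```

Keeping the threshold configurable lets a team start permissively and tighten the gate over time, which helps preserve the development pace the paragraph above describes.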

If your organization does not practice DevSecOps, OnPage recommends reevaluating your I&O to see if DevSecOps can contribute to SDLC and SRE optimization.

Conclusion

OnPage believes that MaC, AIR, and DevSecOps can support site reliability engineering duties. Organizations can optimize SRE and I&O by following the evolution of MaC, AIR, and DevSecOps and by adopting the right tools and technologies at the right time.

________________________________________

* Disclaimer
GARTNER® is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and HYPE CYCLE is a registered trademark of Gartner, Inc. and/or its affiliates and are used herein with permission. All rights reserved. Gartner® does not endorse any vendor, product or service depicted in our research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner® research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner® disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

FAQs

What is the Gartner® Hype Cycle™?
The Gartner® Hype Cycle™ is a graph, published by Gartner, that represents the various phases of emerging and mature technologies across different industries.

What are the five phases of the Gartner® Hype Cycle™?
The five phases of the Gartner® Hype Cycle™ are Innovation Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, and Plateau of Productivity.

How does a technology get included in the Gartner® Hype Cycle™?
Gartner® analysts conduct extensive research and analysis to determine which technologies to include along with where they will fall on the Hype Cycle™.

Published by Halle Katz
