DevOps is constantly trying to improve production through automation, collaboration and tools. ChatOps is often the paradigm that brings these tasks together into a single conversation. In ChatOps, “chat applications and tools for real-time communication and task execution [are distributed] among members of development and IT operations teams”. Yet proponents of ChatOps often pay too little attention to the incident management part of the operation, preferring instead to look at the bots and chat room tools.
However, as James Fryman noted in his talk on ChatOps: Technology and Philosophy at Geekdom San Francisco, “the shared context [of chat rooms] allows everybody to see and collaborate around things that happen. This is super amazing [for] the incident management space.” Specifically, though, when a high-priority or critical alert occurs, notifications need to broadcast the incident beyond the chat room, and the conversation around it must not get muddled.
The notion of chat rooms did not begin with GitHub. Rather, IT has used BBSs (Bulletin Board Systems) and later IRC (Internet Relay Chat) to encourage connected networks through chat. Even today, with Slack, Spark or HipChat, the goal is the same as it was with these earlier forums. As New Relic notes in a blog article, “even as the tools facilitating real-time chat have changed, the primary reasons for using them have not.” The goal remains to enable synchronous and asynchronous communication for distributed groups and people. Chat allows for greater collaboration on development, less delay and better outcomes. Tomer Levy, CEO of Logz.io, notes that the strength of ChatOps lies in the “feedback loop [which] enhances collaboration”.
One engineer I met who works for Wayfair told me he was challenged and fatigued by calls with the company’s developers in China: “At 2am my heavy Jersey accent and their heavy Chinese accent made it difficult to understand one another. Putting our conversation in chat channels really cut down on the frustration.”
Now that we are agreed on the importance of ChatOps, I need to rock the boat a bit and make a distinction here between normal chat and priority alerts in chat channels. As discussed above, chat is great for furthering collaboration. However, when important or critical situations happen, a different approach needs to be taken.
When a low-priority alert occurs, an engineer needs to be notified. A simple chat request is not sufficient. OnPage allows you to create an audible alert on an engineer’s smartphone through a simple Slack command. See OnPage’s video on Slack integration for further explanation of how this works. Chat can then continue in OnPage for low-priority alerts or through Slack channels. OnPage is bi-directional, so conversations can go from Slack to OnPage and vice versa. The remaining question is how best to handle critical or high-priority alerts and elevate them to their own channel.
The need for separate channels for critical alerts is highlighted by the fact that traditional Slack or chat channels can quickly get muddled if they also become the place for critical alert discussion. The meandering and humorous tone of chat channels is not conducive to high priority alerts.
In critical alerting situations, the conversation needs to be focused and directed. Furthermore, the need for an audible notification can be met through a command to OnPage, which raises the critical alert on the engineer’s smartphone. These alerts can be louder than, or simply different from, a low-priority alert. The conversation should then continue in a new high-priority chat channel that is separate from the existing conversation.
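The routing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not OnPage’s or Slack’s actual API: the names `Alert`, `Routing` and `route_alert`, the `#ops` default channel, the `#incident-…` naming scheme and the tone labels are all assumptions made for the example. It shows one possible rule: every alert triggers an audible notification, but a critical alert gets a distinct tone and its own dedicated channel.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    LOW = "low"
    CRITICAL = "critical"


@dataclass(frozen=True)
class Alert:
    incident_id: str   # hypothetical identifier, e.g. "DB-42"
    summary: str
    priority: Priority


@dataclass(frozen=True)
class Routing:
    channel: str      # chat channel where the conversation continues
    alert_tone: str   # hypothetical label for the smartphone alert tone


def route_alert(alert: Alert, team_channel: str = "#ops") -> Routing:
    """Decide where an alert's conversation goes and how it should sound.

    Assumed rule: critical alerts move to a dedicated per-incident channel
    with a distinct tone; low-priority alerts stay in the team channel.
    """
    if alert.priority is Priority.CRITICAL:
        dedicated = f"#incident-{alert.incident_id.lower()}"
        return Routing(channel=dedicated, alert_tone="critical")
    return Routing(channel=team_channel, alert_tone="standard")


if __name__ == "__main__":
    outage = Alert("DB-42", "primary database unreachable", Priority.CRITICAL)
    print(route_alert(outage))
```

In a real setup, the returned `Routing` would be handed to whatever integration actually creates the channel and sends the page; that part is deliberately left out here.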
There are several reasons for separating the high-priority conversation into another channel. These include:
The critical alert is enabled by OnPage and the conversation is elevated to a dedicated Slack channel. By enabling a separate, dedicated channel for critical alert situations many advantages will follow:
For ChatOps to truly improve DevOps, critical alerting has to be nuanced. Critical alerts cannot stay in the thread of regular chat. Chat needs to be differentiated based on whether the conversations are for critical issues or low-priority or regular conversations. For both situations, OnPage provides a solution that ensures engineers receive the right message at the right time. Every time.
OnPage is a cloud-based incident alerting and management platform that elevates notifications on your smartphone so they continue to alert until read. Incidents can be routed to the person on call and escalated if they are not attended to promptly. Schedule a demonstration today!