I spent a bit of time on Reddit the other day and was struck by how many posts focused on IT on-call and on-call scheduling. Some posts were rants about horrible customers – who hasn’t had some of those? Some described positive interactions from being on-call – those posts were rare. But many engineers in DevOps and IT posted about their trepidation over being on-call. They wondered:
The answers to these questions don’t need to cause trepidation, though. While after-hours assignments can be anxiety-producing, having the right tools and management goes a long way toward creating reasonable expectations and outcomes.
If I were to ask you why on-call is necessary, you might think me a bit of a dunce – go ahead, I’ve been called worse. Isn’t it obvious that it’s needed to answer customer questions about the product? Duh?!
But the truth is that answering customer product questions is not the only reason on-call exists. In the realm of product development, on-call is a necessary pursuit. You cannot develop a product effectively without testing its resilience. And you cannot know the product’s resilience unless you put it in front of your customers, allow them to test it, and let them call you when it breaks.
Additionally, after-hours rotations allow Dev, Ops and the rest of your IT team to see how well the product or setup they have created is working. Many people I have spoken to in the DevOps world call this ‘eating your own dogfood.’ Yuck. The phrase is meant to illustrate that no one in the IT family can simply create their perceived technical masterpiece and walk away. Instead, they need to take responsibility for their creation. Being part of the after-hours family helps ensure this level of responsibility.
Beyond the on-call rotation itself, alerting brings issues of its own. Often, alerts come in after hours and lack context. These problems come in many flavors. For example:
A much better idea is to create an actual schedule with a dedicated tool designed to handle effective alerting, auditing and messaging. A tool like OnPage can answer these on-call issues as well as many of the trepidations which engineers face about being on-call.
Effective management of after-hours assignments needs to be planned in advance. That is, the process needs to be thought through and cannot be ad hoc. While most DevOps and IT teams have a schedule, they haven’t thought through the whole process. Instead, teams should create on-call schedules that:
While IT on-call might cause trepidation initially, the time spent planning will definitely pay dividends. Again, use a scheduling tool that will allow your team to work effectively together and more like a, well…, team.
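To make "planned, not ad hoc" a little more concrete, here is a minimal sketch of a weekly round-robin rotation with a backup for each shift. It is only an illustration: the function names, the weekly shift length and the primary/backup layout are all assumptions of mine, and it is not how OnPage or any other scheduling tool works internally.

```python
from datetime import date, timedelta


def build_rotation(engineers, start, weeks):
    """Build a weekly round-robin schedule (hypothetical helper).

    Each week gets a primary and a backup; the backup is simply the
    next person in the list, so the load is spread evenly.
    """
    if len(engineers) < 2:
        raise ValueError("need at least two engineers for primary + backup")
    schedule = []
    for week in range(weeks):
        schedule.append({
            "week_start": start + timedelta(weeks=week),
            "primary": engineers[week % len(engineers)],
            "backup": engineers[(week + 1) % len(engineers)],
        })
    return schedule


def on_call_for(schedule, day):
    """Return the shift covering `day`, or None if nothing covers it."""
    for shift in schedule:
        if shift["week_start"] <= day < shift["week_start"] + timedelta(days=7):
            return shift
    return None


# Example with made-up names: three engineers, four weeks from 1 Jan 2024.
rotation = build_rotation(["ana", "ben", "chi"], date(2024, 1, 1), weeks=4)
shift = on_call_for(rotation, date(2024, 1, 10))
print(shift["primary"], shift["backup"])
```

Even a toy like this forces the questions a real schedule has to answer: who is primary, who backs them up, and what happens on a day nobody covers.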
OnPage is an excellent tool for managing and improving life during after hours. Learn how OnPage can help you and your team better manage alerting. Schedule a demo with OnPage today.