On-Call Life: Setting Expectations

Imagine this: You’ve just been offered a new job in tech. Maybe it’s your first job right out of college, and until now you’ve only heard of being on-call in passing conversations. Or perhaps you’ve been in tech your whole life but never had to be on-call until today. Or maybe you’re contemplating whether on-call is for you because your company is dangling some extra cash (because who doesn’t like extra money?).

Now, you’re faced with a life-changing decision: do you want to be tied to this term for the foreseeable future at your company?

First Things First: Am I on-call material?

If you’re about to commit to a tech job where being on call is a de-facto requirement, ask yourself if on-call is truly for you. Everyone has different expectations from work, and what works for one person might not be ideal for another. 

While pay might be a major factor, some may prioritize work-life balance, family time, or pursuing other passions. It’s all about deciding what you prioritize. There’s no right or wrong answer; just own your decision. This mindset can shift your perspective from resenting a future situation to understanding that you signed up for it because, for instance, the pay was too good to pass up. It might help you fund your next home, take that big dream vacation you’ve always wanted, or pay for your kid’s education. That being said, if you have a medical condition that requires less stress or a good night’s sleep, steer clear of on-call duties if you can.

If you’re still with me, congratulations! 

It looks like you’re either convinced or trying to convince yourself to opt into being on-call. Whatever your motivation, you should still do your due diligence to ensure the organization has the resources to make your on-call experience reasonably good.

Based on my experience and conversations with industry professionals, here are some metrics to gauge the on-call life you might be stepping into:

What are your SLAs and response times?

Service Level Agreements (SLAs) and response times are crucial. They define the maximum allowable downtime and the expected response time for critical issues. Understanding these numbers will help you gauge the pressure and urgency you’ll face.

If SLAs are stringent, it might be beneficial to discuss the support systems available to ensure you can meet these commitments. I delve into these systems further in the following paragraphs.
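To make "maximum allowable downtime" concrete, here is a minimal sketch that converts an availability target into a downtime budget. The function name and the 30-day period are my own assumptions for illustration; your company's SLA may measure availability over a different window or define it differently.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Return the downtime budget, in minutes, for an availability target.

    availability_pct is the SLA target as a percentage (for example 99.9).
    period_days is the measurement window; 30 days is an assumption here.
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Compare a few common availability targets.
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% availability -> {budget:.1f} minutes of downtime per 30 days")
```

A 99.9% target leaves roughly 43 minutes of downtime in a 30-day window, which gives you a feel for how quickly you would be expected to respond.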

Do they have an alert escalation policy?

An alert escalation policy ensures that if you don’t respond within a certain time frame, the alert gets escalated to another team member. Before committing to an on-call responsibility, check whether your company has such a policy. It distributes the on-call load and ensures that no single point of failure exists within the team. It also means there is a guardrail in place, so you won’t be solely responsible for every alert, which can reduce stress.
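As a rough illustration of how such a policy behaves, the sketch below walks an unacknowledged alert through a list of contacts. The class, the function, the contact names, and the timeouts are all hypothetical; this is not how any particular alerting product implements escalation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EscalationStep:
    contact: str          # hypothetical contact name
    ack_timeout_min: int  # minutes this contact has to acknowledge


def current_contact(policy: List[EscalationStep], minutes_unacked: float) -> Optional[str]:
    """Return who should be holding the alert now, or None if every step timed out."""
    elapsed = 0
    for step in policy:
        elapsed += step.ack_timeout_min
        if minutes_unacked < elapsed:
            return step.contact
    return None  # policy exhausted: nobody acknowledged in time


# Example policy with made-up names and timeouts.
policy = [
    EscalationStep("primary-engineer", 5),
    EscalationStep("secondary-engineer", 10),
    EscalationStep("engineering-manager", 15),
]

if __name__ == "__main__":
    for minutes in (0, 5, 15, 30):
        print(minutes, "min unacknowledged ->", current_contact(policy, minutes))
```

When you ask about the escalation policy, the questions that matter are the ones this sketch encodes: how long you have to acknowledge, who comes after you, and what happens if nobody answers.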

How will they alert you?

Now, this is a crucial question to ask before committing to on-call duty. How will you be alerted? Some companies have dedicated staff, such as Level 1 Network Operations Center (NOC) staff or third-party helpdesks, to notify engineers when an incident is detected.

Whatever the mechanism, make sure they have technology that can reliably deliver notifications. The rationale is simple: an organization can have the most advanced detection system in place, but if it fails to invest in reliable alerting tools, it’s like a tree falling in a forest with no one around to hear it. If the alert didn’t grab someone’s attention, did it even ring?

Reliable alerting tools are essential to ensure that you are always reachable when it matters most.

How often will you be on-call?

The frequency of on-call shifts varies significantly between organizations. Some companies might have a rotating schedule where you’d be on-call once a month, while others might require you to be on-call more frequently. Knowing this will help you assess how much of your personal time you’ll need to dedicate to on-call duties.
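A quick back-of-the-envelope calculation can turn "how often" into days of your year. This sketch assumes an evenly shared rotation with fixed-length shifts and no swaps or holidays, which real schedules rarely match exactly; the function name and defaults are mine.

```python
from typing import Tuple


def rotation_load(team_size: int, shift_days: int = 7) -> Tuple[float, int]:
    """Estimate on-call load for an evenly shared rotation.

    Returns (days on call per year, days from the start of one of your
    shifts to the start of your next one).
    """
    if team_size < 1 or shift_days < 1:
        raise ValueError("team_size and shift_days must be at least 1")
    days_on_call_per_year = 365 / team_size
    days_between_shift_starts = team_size * shift_days
    return days_on_call_per_year, days_between_shift_starts


if __name__ == "__main__":
    for size in (2, 4, 8):
        days, cycle = rotation_load(size)
        print(f"team of {size}: about {days:.0f} days on call per year, "
              f"one shift every {cycle} days")
```

With weekly shifts, a team of four puts you on call about a quarter of the year, with a shift starting every 28 days. Halve the team and that load doubles.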

For instance, I came across a Reddit thread that illustrates the extreme end of on-call experiences. While I would like to believe that not all on-call roles are as challenging, for those about to sign up for a situation like the one described there, it underscores the importance of understanding what you’re signing up for.

Quality of alerts

I also can’t emphasize enough how the quality of alerts serves as a crucial yardstick of the organization’s maturity in incident management. Are you joining a company that doesn’t prioritize the quality of alerts, leaving you to sift through an unnecessary burden of notifications in the middle of the night? A company with advanced incident management practices will be mindful of having the right tools and frameworks in place. These systems focus not only on resolving incidents faster but also on reducing false positives, ensuring you’re only alerted when necessary. Poor alert quality can lead to alert fatigue, where you become desensitized to notifications due to frequent false alarms. This desensitization can be dangerous, as it might cause you to miss or ignore critical alerts.

Compensation for being on-call

On-call compensation is a critical factor. Some companies offer additional pay for on-call shifts, while others might provide time off in lieu. Understanding your compensation helps you weigh the financial benefits against the personal sacrifices. It is also worth checking the on-call compensation laws that apply to you.

I also recently discussed On-Call compensation in my other blog linked here.

Severity of downtime

The severity of potential downtime can vary widely. In some industries, downtime could be life-and-death, such as in healthcare or emergency services. In others, like e-commerce or finance, downtime might lead to significant financial losses. Knowing the stakes can help you understand the level of responsibility and urgency your role will entail.

Guardrails for missed alerts

Life happens, and sometimes you might miss an alert. What safeguards are in place for such scenarios? Some companies have backup systems or secondary contacts to ensure that missed alerts don’t lead to catastrophic failures. This safety net can significantly reduce the stress of being on-call.

Volume of pages and MTTM

The volume of pages that the on-call teams receive and the Mean Time to Mitigate (MTTM) are indicators of how demanding your on-call duties will be. High volumes of alerts might indicate underlying system issues, while a reasonable MTTM suggests that the team has effective processes in place to get to the bottom of an incident. Additionally, knowing if you can step away for personal reasons during on-call shifts can be helpful in deciding whether this role is for you.
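If the team can share even a rough page log, both numbers are easy to compute yourself. The sketch below uses a made-up log of (paged at, mitigated at) timestamps; the data, the function names, and the definition of off-hours as before 08:00 or from 20:00 onward are assumptions for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical page log: (time the page fired, time the incident was mitigated).
pages = [
    (datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 2, 30)),
    (datetime(2024, 1, 3, 14, 0), datetime(2024, 1, 3, 14, 10)),
    (datetime(2024, 1, 5, 23, 0), datetime(2024, 1, 5, 23, 50)),
]


def mean_time_to_mitigate_minutes(log):
    """Average minutes between a page firing and the incident being mitigated."""
    return mean((mitigated - paged).total_seconds() / 60 for paged, mitigated in log)


def pages_per_week(log, weeks):
    """Average number of pages per week over the observed period."""
    return len(log) / weeks


def off_hours_count(log, day_start=8, day_end=20):
    """Count pages that fired outside the assumed 08:00 to 20:00 working window."""
    return sum(1 for paged, _ in log if paged.hour < day_start or paged.hour >= day_end)


if __name__ == "__main__":
    print("MTTM (minutes):", mean_time_to_mitigate_minutes(pages))
    print("Pages per week:", pages_per_week(pages, weeks=1))
    print("Off-hours pages:", off_hours_count(pages))
```

The off-hours count is worth asking about separately, since three pages a week at 2 p.m. is a very different life from three pages a week at 2 a.m.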

Being on-call can be a demanding and sometimes stressful part of a tech job, but it can also be rewarding and provide valuable experience. By asking the right questions and understanding what you’re signing up for, you can make an informed decision that aligns with your personal and professional goals.

Remember to evaluate the SLAs, alert mechanisms, compensations, and other critical factors before committing. Prioritize your well-being and consider how on-call duties will fit into your life. Each company’s approach to on-call varies, and finding the right balance is key to maintaining both your career and personal satisfaction.

As you contemplate whether to embrace the on-call life, I leave you with some food for thought: Are you prepared to navigate the challenges and opportunities it brings, and how will this decision impact your overall happiness and career growth?

Ritika Bramhe
