Incident Alert Management for MSPs

Please schedule a more convenient time for your IT breakdown

Incidents that could hurt a business never happen at a convenient time. So it makes sense for the MSPs in charge of these businesses’ IT infrastructure to move alerting to their smartphones. MSPs may forget to eat breakfast or even to sleep, but chances are they are glued to their smartphones, which makes the smartphone the best medium for delivering an alert.

Take a look at the following anatomy of an incident to see a moment in the life of an MSP and their smartphone.

First, you have the incident itself. Someone’s pet monkey has broken into the server room and pulled out all the wires! Steps to follow?

  • Send a memo to employees, canceling bring-your-pet-to-work-day. They will probably read this message on their smartphone.
  • Inform your on-call team of the monkey business. Your company’s automated alerting process, integrated with your monitoring tools and sensors, turns the critical notification into an audible alert on the MSP’s smartphone.

Second, you mobilize your team. If you previously relied on an underpaid intern making calls to the on-call team, then you are doing it wrong. You need to assign team members to an escalation group with automated alerts.

Third, you need a plan B in case the first person your frantic intern calls is you. You are at an obnoxiously loud concert and can’t hear the alerts, so you want a backup on-call engineer in place. The reliable ones on your team (clearly, not you) get an alert because they are in the escalation group. Both the order in which MSPs are alerted and the time between escalations can be adjusted. Make sure that an incident that is not acknowledged or resolved within a predetermined amount of time is escalated to the next person on call. And if a message sent to an escalation group does not reach anyone in it, make sure you have failover options.
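To make the escalation idea concrete, here is a minimal sketch of an ordered escalation group with a fixed delay between steps and a failover contact. It is only an illustration, not OnPage's actual API: the names `EscalationPolicy`, `escalate`, `failover_contact`, and the example contacts are all hypothetical, and a real system would wait on actual acknowledgments rather than call a function.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EscalationPolicy:
    """Hypothetical escalation group; all field names are assumptions."""
    on_call_order: List[str]    # who gets alerted, in order
    escalation_delay_min: int   # minutes to wait before escalating
    failover_contact: str       # last resort if nobody acknowledges


def escalate(policy: EscalationPolicy,
             acknowledges: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """Walk the escalation order and return (responder, log).

    `acknowledges` stands in for a real acknowledgment check: it returns
    True if the given person acknowledges within the delay window.
    """
    log: List[str] = []
    elapsed = 0
    for person in policy.on_call_order:
        log.append(f"t+{elapsed}min: alert sent to {person}")
        if acknowledges(person):
            log.append(f"acknowledged by {person}")
            return person, log
        # Nobody answered within the window, so move to the next person.
        elapsed += policy.escalation_delay_min
        log.append(f"t+{elapsed}min: no acknowledgment from {person}, escalating")
    # The whole group was exhausted: fall back to the failover contact.
    log.append(f"t+{elapsed}min: failing over to {policy.failover_contact}")
    return policy.failover_contact, log


if __name__ == "__main__":
    policy = EscalationPolicy(
        on_call_order=["you", "dana", "lee"],   # hypothetical team
        escalation_delay_min=5,
        failover_contact="ops-manager",
    )
    # "you" are at the concert, so only dana acknowledges.
    responder, log = escalate(policy, lambda person: person == "dana")
    print(responder)
    for line in log:
        print(line)
```

Keeping the order and the delay as plain data means either one can be adjusted without touching the escalation logic.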

Fourth, the alerts have been sent out and your on-call team now has several options available. They can send and receive messages that include images and voice attachments, which enrich the alert message and help describe the incident further. All of this is secure, of course, and works over cellular or Wi-Fi coverage.

Fifth, take action. Now that the escalation has moved to the rest of your team members, they can collaborate to fix the incident by sending high- and low-priority messages. High-priority messages could be about how to solve the incident. Low-priority messages could be reserved for discussing how much they hate you. Your team can also acknowledge that they have fixed the issue using predefined reply options built into the app and tracked by our audit trail.

Sixth, it’s the day of reckoning. Every single thing you and your team did during the outage has been cataloged in the audit trail. Ignoring the alert while posting concert pictures to Facebook was not a good idea. With the audit trail, your boss knows every alert that went out and who responded.
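As a rough illustration of what such an audit trail records, the sketch below keeps an append-only list of alert events and answers the two questions your boss will ask: who responded, and who did not. The class and method names (`AuditTrail`, `record`, `responders`, `ignored_by`) and the action labels are assumptions made for this example, not OnPage's real data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical action labels; a real product may use different ones.
RESPONSE_ACTIONS = {"acknowledged", "resolved"}


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    alert_id: str
    recipient: str
    action: str  # "sent", "acknowledged", or "resolved"


class AuditTrail:
    """Append-only log of alert events (illustrative only)."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, alert_id: str, recipient: str, action: str,
               when: Optional[datetime] = None) -> None:
        # Default to the current UTC time when no timestamp is supplied.
        when = when or datetime.now(timezone.utc)
        self._events.append(AuditEvent(when, alert_id, recipient, action))

    def responders(self, alert_id: str) -> List[str]:
        """Recipients who acknowledged or resolved the alert, in order."""
        seen: List[str] = []
        for e in self._events:
            if (e.alert_id == alert_id and e.action in RESPONSE_ACTIONS
                    and e.recipient not in seen):
                seen.append(e.recipient)
        return seen

    def ignored_by(self, alert_id: str) -> List[str]:
        """Recipients who were sent the alert but never responded."""
        responded = set(self.responders(alert_id))
        ignored: List[str] = []
        for e in self._events:
            if (e.alert_id == alert_id and e.action == "sent"
                    and e.recipient not in responded
                    and e.recipient not in ignored):
                ignored.append(e.recipient)
        return ignored


if __name__ == "__main__":
    trail = AuditTrail()
    trail.record("INC-1", "you", "sent")
    trail.record("INC-1", "dana", "sent")
    trail.record("INC-1", "dana", "acknowledged")
    print(trail.responders("INC-1"))
    print(trail.ignored_by("INC-1"))
```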

Imagine if this scenario were real. Wouldn’t you want to make sure you had technology on your side that was robust enough:

  • To handle on-call scheduling
  • To enable escalation of alerts
  • To enable communication among team members
  • To enable alert tracking through audit trails
  • To generate failover reports

You could try to search for this technology on your own or you could try OnPage.
Contact us for more information on how we can fix your monkey business.

Shawn Lazarus

