Please schedule a more convenient time for your IT breakdown
Incidents that could hurt business never happen at a convenient time. So it makes sense for MSPs in charge of these businesses' IT infrastructure to move alerting to their smartphones. MSPs may forget to eat breakfast or even sleep, but chances are they are glued to their smartphones. That makes the smartphone the best medium for delivering an alert.
Take a look at the following anatomy of an incident to see a moment in the life of an MSP and their smartphone.
First, you have the incident itself. Someone's pet monkey has broken into the server room and pulled out all the wires! What steps should you follow?
Second, you mobilize your team. If you previously relied on an underpaid intern calling the on-call team, you are doing it wrong. You need to assign team members to an escalation group with automated alerts.
Third, you need a plan B in case the first person your frantic intern calls is you. You are at an obnoxiously loud concert and can't hear the alerts, so you want to make sure there is a backup on-call engineer. The reliable members of your team (clearly, not you) get an alert because they are in the escalation group. You can adjust both the order in which team members are alerted and the time between escalations. Make sure that any incident not acknowledged or resolved within a predetermined amount of time is escalated to the next person on call.
If a message sent to an escalation group does not reach anyone in the group, make sure you have failover options in place.
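To make the escalation logic concrete, here is a minimal sketch of how an ordered escalation group with a fixed wait between escalations and delivery failover might behave. It is not OnPage's implementation or API: `EscalationPolicy`, `escalate`, the channel names, and the two callbacks (`deliver` and `acknowledged_within`, which stand in for real message delivery and waiting for an acknowledgment) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EscalationPolicy:
    # Hypothetical policy: an ordered on-call list and the minutes to wait
    # for an acknowledgment before escalating to the next person.
    responders: List[str]
    minutes_between_escalations: int
    # Delivery channels tried in order; later entries act as failover options.
    channels: List[str] = field(default_factory=lambda: ["push", "sms", "voice"])


def escalate(
    policy: EscalationPolicy,
    deliver: Callable[[str, str], bool],
    acknowledged_within: Callable[[str, int], bool],
    log: List[str],
) -> Optional[str]:
    """Walk the escalation group in order; return whoever acknowledges, or None."""
    for person in policy.responders:
        # Try each channel until one delivers (channel-level failover).
        delivered_via = None
        for channel in policy.channels:
            if deliver(person, channel):
                delivered_via = channel
                break
            log.append(f"{person}: delivery via {channel} failed")
        if delivered_via is None:
            log.append(f"{person}: unreachable on all channels, escalating")
            continue
        log.append(f"{person}: alerted via {delivered_via}")
        # Wait up to the configured time for an acknowledgment.
        if acknowledged_within(person, policy.minutes_between_escalations):
            log.append(f"{person}: acknowledged")
            return person
        log.append(
            f"{person}: no ack within {policy.minutes_between_escalations} min, escalating"
        )
    # Nobody acknowledged: the caller should trigger a group-level fallback.
    log.append("escalation group exhausted")
    return None


# The concert scenario: you never acknowledge, so the backup engineer gets it.
policy = EscalationPolicy(
    responders=["you", "backup_engineer", "team_lead"],
    minutes_between_escalations=5,
)
log: List[str] = []
who = escalate(
    policy,
    deliver=lambda person, channel: True,
    acknowledged_within=lambda person, minutes: person != "you",
    log=log,
)
print(who)  # backup_engineer
```

Every step is appended to `log`, a rough stand-in for the kind of audit trail described later in this post. Returning `None` when the group is exhausted leaves the final failover decision (for example, paging a secondary group) to the caller.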
Fourth, the alerts have been sent out, and now your on-call team has several options available. They can send and receive messages with image and voice attachments that enrich the alert and describe the incident further. All of this is secure, of course, and works over cellular or Wi-Fi connections.
Fifth, take action. Now that the escalation has moved to the rest of your team, they can collaborate to fix the incident by sending high- and low-priority messages. High-priority messages could be about how to solve the incident. Low-priority messages could be reserved for discussing how much they hate you. Your team can also confirm that they have fixed the issue using predefined reply options built into the app and tracked by our audit trail.
Sixth, it's the day of reckoning. Every single thing you and your team did during the outage has been cataloged in the audit trail. Ignoring the alert while posting concert pictures to Facebook was not a good idea. With the audit trail, your boss knows every alert that went out and who responded.
Imagine if this scenario were real. Wouldn't you want to make sure you had technology on your side that was robust enough to handle every one of these steps?
You could try to search for this technology on your own or you could try OnPage.
Contact us for more information on how we can fix your monkey business.