Are you prepared for an IT Outage?

In the past four weeks, the southern United States has suffered significant damage from Hurricanes Harvey and Irma. Natural disasters like these are extreme and hard to predict, but unfortunately they are not uncommon. Even a less intense event, such as a tropical storm, can cause significant damage and IT outages. Given the intensity of these storms, IT outage communications cannot be ignored.

The question then becomes how to prepare for the unexpected. What best practices and tools do teams need in order to handle a catastrophe effectively? While it might sound like an oxymoron, handling a catastrophe effectively is just what your IT team might need to do.

The following are some of the steps your team will need to follow to prepare for a serious outage or disastrous event.

Step 1: Categorize Alerts

Not all alerts are created equal, so you need to determine which outages are high priority and which are low priority. This step lets you focus on the most important issues first and leave non-critical issues for a less hectic time. Priorities depend on the business model, so they won’t necessarily be the same for any two businesses.

Determine which metrics you will use to define the significance of the issue.

  • High priority issues involve a critical situation that must be resolved immediately. They can affect the overall availability of IT systems as well as the end customer. Examples include downed servers, downed infrastructure, or an inability to access key information in the cloud.
  • Low priority issues can typically be handled with some delay without serious consequences. Examples include customers experiencing latency or a bug that has limited impact on usability.
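
The two categories above can be sketched in a few lines of Python. This is only an illustration: the alert kinds in `HIGH_PRIORITY_KINDS`, the `Alert` class, and the `categorize` and `triage` functions are hypothetical names chosen for this example, and your own list of critical kinds will depend on your business model.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    HIGH = "high"
    LOW = "low"


# Hypothetical alert kinds, loosely based on the examples above.
# Replace these with the kinds your own monitoring tool emits.
HIGH_PRIORITY_KINDS = {"server_down", "infrastructure_down", "cloud_data_unreachable"}


@dataclass
class Alert:
    kind: str
    summary: str


def categorize(alert: Alert) -> Priority:
    """Return HIGH for kinds on the critical list and LOW for everything else."""
    return Priority.HIGH if alert.kind in HIGH_PRIORITY_KINDS else Priority.LOW


def triage(alerts):
    """Put high-priority alerts first, keeping arrival order within each group."""
    # sorted() is stable, and False sorts before True, so HIGH alerts lead.
    return sorted(alerts, key=lambda a: categorize(a) is not Priority.HIGH)
```

One design choice to revisit: this sketch treats any unknown alert kind as low priority. Some teams may prefer the opposite default so that an unrecognized alert is never left waiting.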

Step 2: Establish protocols

Your IT team needs to have a practiced game plan around who on the team will get alerted and how they will respond.

  • CREATE AN ON-CALL SCHEDULE: When disaster strikes, it shouldn’t be a guessing game as to who is notified. There needs to be a predefined disaster team that knows it is the first line of defense, whether the technology belongs to their company or to a client. The team and backup responders need to have their names in a digital schedule so that when an alert comes in, it can be forwarded right away to the IT engineer on call.
  • RUN BOOKS: When facing significant infrastructure failures, IT teams might not always know the correct protocols for resolving the issue. Predefined run books with instructions for handling typical scenarios take much of the stress out of resolving important outages.
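
To make the idea of a digital schedule concrete, here is a minimal Python sketch of looking up who is on call at a given moment and which run book applies. The `Shift` class, the `RUN_BOOKS` mapping, and the function names are assumptions made for this example; they do not reflect any particular scheduling product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Shift:
    start: datetime  # inclusive
    end: datetime    # exclusive
    primary: str     # first responder for this shift
    backup: str      # contacted if the primary does not respond


# Hypothetical mapping from alert kind to the run book that covers it.
RUN_BOOKS: Dict[str, str] = {
    "server_down": "Run book: restore a downed server",
    "infrastructure_down": "Run book: fail over to the secondary site",
}


def on_call_at(schedule: List[Shift], when: datetime) -> Optional[Shift]:
    """Return the shift covering `when`, or None if the schedule has a gap."""
    for shift in schedule:
        if shift.start <= when < shift.end:
            return shift
    return None


def run_book_for(alert_kind: str) -> Optional[str]:
    """Return the run book for an alert kind, or None if none is defined."""
    return RUN_BOOKS.get(alert_kind)
```

Returning `None` for an uncovered time or an undocumented alert kind makes gaps visible, so the team can fix the schedule or write the missing run book before an outage exposes it.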

Step 3: Tools for your team

With any disaster, it is important that your teams have the proper tools to receive alerts and respond appropriately. Your team needs to get alerted immediately when a critical event occurs so that they can spring into action.

Alerting & IT Outage Communications

  • INTEGRATIONS: Make sure your monitoring tool is integrated with an effective alerting tool that sends the critical notification as a persistent alert to a smartphone rather than as an SMS or email.
  • PERSISTENT ALERTS: Ensure that your alerting mechanism supports persistent alerts. If an alert is not heard when it is first delivered to the person on call, it should persist until someone responds to it. Ideally, you will want your alerts tied to a digital on-call schedule so that the proper engineer is alerted in case of a disaster.
  • COMMUNICATIONS: IT outage communications are perhaps the most important element during a disaster situation. If you are unable to communicate with your colleagues, or your customers are unable to communicate with you, the outage has just been magnified. Phone lines are easily downed during storms, so communications need to occur over smartphones. By using the proper smartphone applications, your team can receive alerts from technologies, customers and colleagues. More importantly, they can use the messaging capabilities of smartphone apps like OnPage, which mark whether a message has been received and read. Knowing whether a message has been read, team members can tell if their colleagues have received it or if they need to escalate a request for help.
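
The persistence and escalation behavior described above can be sketched as a simple loop. This is a generic illustration, not OnPage's API: the `persistent_alert` function and its `send` and `is_acknowledged` callbacks are hypothetical, and the attempt count and interval are placeholder defaults.

```python
import time
from typing import Callable, List


def persistent_alert(
    message: str,
    responders: List[str],
    send: Callable[[str, str], None],
    is_acknowledged: Callable[[], bool],
    attempts_per_responder: int = 3,
    interval_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Re-send an alert until it is acknowledged.

    Each responder is alerted up to `attempts_per_responder` times. If nobody
    has acknowledged by then, the alert escalates to the next responder.
    Returns True once acknowledged, False if every responder was exhausted.
    """
    for responder in responders:
        for _ in range(attempts_per_responder):
            send(responder, message)   # hypothetical delivery callback
            sleep(interval_seconds)    # give the responder time to react
            if is_acknowledged():
                return True
    return False
```

The `responders` list would typically come from the on-call schedule, with the primary engineer first and the backup after. Passing `sleep` as a parameter lets you test the escalation order without waiting in real time.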

Conclusion

These insights give a high-level overview of what your team might need to do in order to prepare for a serious IT outage.

However, there is much more to learn about preparation. To get the full scoop, please download our e-book: IT Outage Communications – 4 Rules for Your IT Team To Live By

OnPage Corporation
