Speed is of the utmost importance for IT incident response teams and the mean time to resolution (MTTR) is the metric that’s used for measurement. If an IT team doesn’t know how long it takes to fix issues, they can’t improve performance.
There are many roadblocks to minimizing MTTR, including:
> Inconsistent data channel connectivity: As an example, let’s say there’s an IT team in India as well as in the U.S. The U.S.-based team should complement the hours not worked in India and vice-versa. Yet, due to the high cost of the data channel, the team in India turns their data channel off and is only reachable if they are in the office. Since the India team is delayed in receiving and responding to messages, MTTR increases.
> Lack of effective monitoring tools: Without quality monitoring solutions and processes, it will take more time than necessary to do root cause analysis of the incident. Techs can also use monitoring tools to see the change in data as they apply fixes and tweaks to ensure that they are headed in the right direction toward resolution.
> No escalations: When an engineer is alerted of a critical incident, he or she may want to escalate the issue if the scope of the problem is larger than originally anticipated. Often, effective resolution of problems requires bringing in other members of the team to resolve an issue and if there’s not a fast way to alert the team or determine who’s available, the incident will take much longer to fix.
> Lack of audit trails: If no trail exists of who was alerted based on what criterion, management is unable to see incident reports with a history of the cause of the most recent alert and who was notified and in which order. This is a missed opportunity to help the IT team discuss their performance during a post-mortem review and work on continuously improving MTTR.
> No scheduling tools: Management cannot coordinate who’s to be alerted based on the type of incident. Instead, the whole team is alerted regardless of their ability to provide insight or assistance.
> Excessive alerting: The team receives too many false positives, inevitably begins to ignore alerts and eventually starts to miss important ones. Alert fatigue not only affects MTTR, but also leads to employee burnout and high employee churn rates in the IT organization.