What is it?

At its core, enterprise incident management is the practice of equipping a team or business with the processes and technology it needs to respond to incidents. IT teams regularly face incidents that have the potential to disrupt the team or the company. To resolve these critical incidents quickly and successfully, teams need a carefully considered plan in place ahead of time.

Enterprise incident management brings together processes such as ticketing, alerting, escalations, reporting and documentation for the incident response team and the company as a whole, creating a sustainable and repeatable process that minimizes time to incident resolution.

 

Purpose of Enterprise Incident Management

The main goal of the enterprise incident management process is to restore normal service operations to the enterprise as quickly as possible. Doing so minimizes the adverse impact of outages on the business and maintains the level of service quality defined in the service level agreement (SLA).

What is the Enterprise Incident Management Process?

1) Create a Service Level Agreement (SLA)

The IT organization creates an SLA with the collaboration and input of other departments. A comprehensive SLA defines incident priorities, escalation paths and response times.
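To make this concrete, here is a minimal sketch of how the three SLA elements named above (priorities, escalation paths and response times) might be captured as data. The class name, time targets and role names are all illustrative assumptions, not values from any real SLA.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, Tuple


@dataclass(frozen=True)
class SlaPolicy:
    """Targets for one priority level (all values are illustrative)."""
    respond_within: timedelta
    escalation_path: Tuple[str, ...]


# Hypothetical SLA: one policy per incident priority.
SLA: Dict[str, SlaPolicy] = {
    "high": SlaPolicy(timedelta(minutes=15),
                      ("on-call engineer", "team lead", "IT director")),
    "medium": SlaPolicy(timedelta(hours=1),
                        ("on-call engineer", "team lead")),
    "low": SlaPolicy(timedelta(hours=8),
                     ("service desk",)),
}


def response_within_sla(priority: str, elapsed: timedelta) -> bool:
    """Return True if the first response arrived inside the SLA target."""
    return elapsed <= SLA[priority].respond_within
```

With these assumed targets, a high-priority incident answered after 10 minutes is within the SLA, while one answered after 20 minutes is not.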

2) Identify and Log Incidents

When an incident occurs, it is identified and logged in a ticketing system so a record is kept. Ideally, the ticket will be updated along the way as the team works to resolve the issue.
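As a rough illustration of a ticket that keeps a running record, the sketch below stores each update with a timestamp. The `Ticket` class and its fields are hypothetical and do not reflect any particular ticketing system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class Ticket:
    """A hypothetical incident ticket with a timestamped update history."""
    summary: str
    status: str = "open"
    history: List[Tuple[datetime, str]] = field(default_factory=list)

    def update(self, note: str) -> None:
        """Append a note so the ticket records what happened and when."""
        self.history.append((datetime.now(timezone.utc), note))

    def close(self, note: str) -> None:
        """Record the final note and mark the ticket closed."""
        self.update(note)
        self.status = "closed"


ticket = Ticket("Checkout API returning errors")
ticket.update("Incident logged from monitoring alert")
ticket.update("Restarted application server")
ticket.close("Error rate back to normal")
```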

3) Use Templates to Categorize the Issue

The ticket is categorized according to type. For example, the ticket might be defined as a server issue or a networking issue.

4) Prioritize the Issue Based on Severity and Impact on the Business

High-priority issues are those with a significant financial impact on the business, so they are handled ahead of other issues. Low-priority issues have minimal financial impact and thus are typically resolved after high- and medium-priority issues.
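One simple way to apply this rule is to estimate the financial impact of each incident and sort the queue by it. The sketch below assumes an hourly cost estimate and two dollar thresholds; both are made up for illustration, and a real organization would take its cut-offs from the SLA.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    title: str
    hourly_cost_usd: float  # estimated financial impact (illustrative)


# Hypothetical thresholds separating high, medium and low priority.
HIGH_THRESHOLD_USD = 10_000
MEDIUM_THRESHOLD_USD = 1_000


def priority(incident: Incident) -> str:
    """Map estimated financial impact to a priority label."""
    if incident.hourly_cost_usd >= HIGH_THRESHOLD_USD:
        return "high"
    if incident.hourly_cost_usd >= MEDIUM_THRESHOLD_USD:
        return "medium"
    return "low"


def work_queue(incidents: List[Incident]) -> List[Incident]:
    """Order incidents so the costliest ones are handled first."""
    return sorted(incidents, key=lambda i: i.hourly_cost_usd, reverse=True)
```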

5) Escalate the Issue if More Technical Expertise is Required

An escalation process is used when the team receiving the alert needs to call in assistance from other groups in the organization, requires a level of expertise that is not available within the core incident response team, or is unable to resolve the issue on its own.
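An escalation path can be thought of as an ordered list of groups that is walked until one of them can take the issue. The sketch below is one minimal way to express that idea; the tier names and the `can_resolve` check are assumptions for illustration only.

```python
from typing import Callable, Optional, Sequence


def escalate(tiers: Sequence[str],
             can_resolve: Callable[[str], bool]) -> Optional[str]:
    """Walk the escalation path in order and return the first tier
    that can resolve the issue, or None if no tier can."""
    for tier in tiers:
        if can_resolve(tier):
            return tier
    return None


# Hypothetical escalation path and expertise lookup.
PATH = ["core incident response team", "database team", "vendor support"]
HAS_EXPERTISE = {
    "core incident response team": False,
    "database team": True,
    "vendor support": True,
}

owner = escalate(PATH, lambda tier: HAS_EXPERTISE[tier])
```

Here the core team lacks the required expertise, so the issue lands with the database team, the next group on the path.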

6) Investigate and Diagnose the Issue

By using messaging between team members along with pre-established runbooks and monitoring solutions, IT professionals are able to rapidly investigate and diagnose the incident.

7) Resolve the Issue and Recover Service

Once the issue has been diagnosed, it can be resolved and service can return to its expected level of performance.

8) Close the Incident

The incident is reported as closed, typically through the ticketing system.

9) Survey Internal Customers and Conduct a Post-Mortem Review

By building in a step for reflection, teams can review the steps they took to resolve the issue and see what can be done better next time.

Each of these steps is important in creating a clear incident management process. Skipping steps in an attempt to resolve the issue more quickly can easily overwhelm IT teams and put SLAs at risk.

OnPage for Enterprise Incident Management

OnPage can be implemented enterprise-wide. Consolidate all enterprise alerts onto one incident management system hosted in secure, SSAE-16 compliant facilities across the U.S. Handle enterprise-wide communication through built-in team messaging.

  • Fragmented teams are no longer a problem! The intuitive built-in messaging allows the full ticket details to be forwarded. Get full event visibility!
  • Add notes, a conference bridge number, attachments and predefined message templates to the event alert.
  • OnPage “Alert-Until-Read” ensures that critical alerts are never missed.
  • Follow the audit trail to confirm that a notification was read and replied to.
  • The fault-proof scheduler defaults to “always full” (i.e., if a person is removed from an on-call shift by mistake with no replacement, the entire team will be alerted to ensure the alert is delivered).
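The "always full" behavior described in the last bullet can be sketched as a simple fallback: if nobody is assigned to a shift, alert everyone. This is only an illustration of the idea under assumed data structures, not OnPage's actual implementation or API.

```python
from typing import Dict, List


def recipients(shift: str,
               schedule: Dict[str, List[str]],
               team: List[str]) -> List[str]:
    """Return who should receive an alert for the given shift.

    Falls back to the entire team when the shift has no one on call,
    so an empty slot never causes an alert to go undelivered.
    """
    on_call = schedule.get(shift, [])
    return list(on_call) if on_call else list(team)


# Hypothetical team and schedule; the night shift was emptied by mistake.
TEAM = ["ana", "ben", "chi"]
SCHEDULE = {"day": ["ana"], "night": []}
```

With this schedule, a daytime alert goes only to the person on call, while a night alert goes to all three team members.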

OnPage provides powerful integrations with mission-critical systems through the industry’s easiest integration framework.

 
