OnPage’s plan to hack IT post mortem reporting

Blameless post-mortems allow us to examine mistakes in a way that focuses on the situational aspects of a failure’s mechanism and the decision-making process of individuals proximate to the failure. – The DevOps Handbook

The engineers at Google describe post-mortem reporting as a “written record of an incident, its impact, the actions taken to mitigate or resolve it, the root cause(s), and the follow-up actions to prevent the incident from recurring.”

While the definition of a post-mortem makes it sound like a straightforward process, the simplicity can belie some important technical and managerial challenges. The goal of this blog is to suggest the types of tools and frameworks that IT, Ops or ITSM teams need to introduce in order to institute an effective post-mortem culture. To this end, we will look at the following points:

Why are post-mortems necessary?
What do post-mortems allow us to achieve?
How can we implement an effective post-mortem?

WHY ARE POST MORTEMS NECESSARY?

Post mortems are necessary as they give us insight into why an incident happened. They allow us to deconstruct a particular incident and see what transpired after the critical event and how that can be improved in the future. Was the problem due to a scheduled or unscheduled incident? When the Sev1 incident occurred, was the right team notified? If the team was notified, did they actually hear the alert or did the alert just go off as a ping on their smartphone?

WHAT DO POST MORTEMS ACHIEVE?

Post mortems, when carried out correctly, advance the team's progress and IT knowledge. They are designed to break down sacred cows and reveal points of truth that might not have been previously recognized. For example, when the monitoring tool identified a service interruption, was the incident sent to only one team member, who then needed to track down a number of other team members, delaying the team's response?
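One way to put a number on that kind of delay is to compare the time the alert was sent with the time each team member acknowledged it. The following is a minimal sketch only: the function name, the responder names and the timestamps are all hypothetical, and a real team would pull these values from its own alerting tool.

```python
from datetime import datetime, timedelta


def response_delays(alert_sent, acknowledgements):
    """Return how long each responder took to acknowledge the alert.

    alert_sent: datetime when the alert was sent.
    acknowledgements: dict mapping responder name to acknowledgement datetime.
    """
    return {name: ack - alert_sent for name, ack in acknowledgements.items()}


# Hypothetical incident: one engineer is paged, then pulls in two colleagues.
alert_sent = datetime(2024, 1, 1, 2, 0)
acknowledgements = {
    "alice": datetime(2024, 1, 1, 2, 4),
    "bob": datetime(2024, 1, 1, 2, 19),
    "carol": datetime(2024, 1, 1, 2, 31),
}

delays = response_delays(alert_sent, acknowledgements)
first_response = min(delays.values())   # time until anyone responded
full_team = max(delays.values())        # time until the last person responded
```

A large gap between `first_response` and `full_team` is one possible sign that the alert reached a single person who then had to find everyone else by hand.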

STEPS TO HACK POST MORTEM REPORTING

Post mortems are both necessary and important to effective incident management as they bring to the surface how well your team manages critical events. Post mortems are not meant to be blame games or cheap talk. Instead, they are management tools for improving the team's performance.

Hack 1: Hold post mortems as soon after the event as possible

Memories are shaky, so it is best to hold the post mortem as soon after the event as possible. Team leaders need to be rigorous about recording details and sharing information.

Hack 2: Create a timeline

If you don’t have things written down, it can be hard to follow up on action items.

The first order of business in the post mortem meeting should be to look at the timeline of events. If you have invested in an incident alert management system, a communications management tool, a ticket management platform and a reporting platform, you have all the relevant data you need to view the order in which the events unfolded. The first three tools let you see what happened step by step. The reporting capabilities let you see aggregate data that provides context to the timeline.
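As a rough illustration of building that timeline, the sketch below merges event lists exported from the three tools into a single chronological view. It assumes each tool can export events with a timestamp and a short description; the `Event` class, the `build_timeline` function, the source labels and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    source: str    # hypothetical label: "alerting", "chat" or "ticketing"
    summary: str


def build_timeline(*feeds):
    """Combine per-tool event lists into one list ordered by timestamp."""
    return sorted(chain.from_iterable(feeds), key=lambda event: event.timestamp)


# Hypothetical exports from each tool.
alerts = [Event(datetime(2024, 1, 1, 2, 0), "alerting", "Sev1 alert raised")]
tickets = [Event(datetime(2024, 1, 1, 2, 5), "ticketing", "Ticket opened")]
chat = [
    Event(datetime(2024, 1, 1, 2, 12), "chat", "On-call engineer acknowledges"),
    Event(datetime(2024, 1, 1, 2, 40), "chat", "Fix deployed"),
]

timeline = build_timeline(alerts, chat, tickets)
for event in timeline:
    print(f"{event.timestamp:%H:%M}  [{event.source}]  {event.summary}")
```

Keeping the `source` label on every event makes it easy to see in the meeting which tool recorded each step.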

Hack 3: Create a final digital record

It is important to share this information and make it easily available, so publish post mortems as widely as possible. Google Drive is a good place to post this information. You need to educate other members of the team as to why the event occurred and commit to changes that will prevent the event from happening again in the future.
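To keep the final record consistent from one incident to the next, one option is a small template that follows the elements in the Google definition quoted earlier: the incident, its impact, the actions taken, the root cause(s) and the follow-up actions. This is a minimal sketch, not a prescribed format; the `PostMortem` class, its field names and the sample content are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PostMortem:
    incident: str
    impact: str
    actions_taken: list = field(default_factory=list)
    root_causes: list = field(default_factory=list)
    follow_ups: list = field(default_factory=list)

    def to_text(self):
        """Render the record as plain text that can be saved and shared."""
        def bullets(items):
            return "\n".join(f"  - {item}" for item in items) or "  (none recorded)"

        return "\n".join([
            f"Incident: {self.incident}",
            f"Impact: {self.impact}",
            "Actions taken:",
            bullets(self.actions_taken),
            "Root cause(s):",
            bullets(self.root_causes),
            "Follow-up actions:",
            bullets(self.follow_ups),
        ])


# Hypothetical example record.
report = PostMortem(
    incident="Sev1 service interruption",
    impact="Service unavailable for 40 minutes",
    actions_taken=["Restarted the affected service"],
    root_causes=["Alert was routed to a single team member"],
    follow_ups=["Page the whole on-call group for Sev1 alerts"],
)
text = report.to_text()
print(text)
```

The resulting text can be saved to a shared location so that the whole team can read it.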

Post mortem reporting is an important component of effective DevOps teams. To read 3 more post mortem reporting hacks, download our whitepaper.


OnPage Corporation


Published by

OnPage Corporation

9 years ago
