Why the conversation can’t stop at DevOps Monitoring Tools

On Beyond Tools

A conversation I recently had with the DevOps manager of a major online retailer got me thinking about DevOps monitoring tools. We discussed how many DevOps shops seem to define themselves by the number of tools they have monitoring their build and IT stack. As he went on to put it:

You can go up and down the aisles at a conference with the corporate credit card and buy every tool in sight, but all those purchases don’t make you a DevOps shop. All they make you is the owner of many tools.

The point of the manager’s comment is that being an effective DevOps shop or IT service provider means going beyond simply owning tools. You have to fold those tools into a meaningful DevOps philosophy, backed by proper tool management and proper team integration. And, importantly from my humble perspective, proper alerting.

DevOps, as a philosophy, encourages shifting left: moving testing earlier in the process so that teams can support systems proactively rather than react to problems. So how does a DevOps team make this shift from reactive to proactive? Read on to find out.

DevOps monitoring tools – a love affair

DevOps is about bringing development and operations teams together, and to some extent tools can improve that relationship. A recent whitepaper from Puppet observes:

Adopting DevOps practices usually means embracing automation as a default solution to many problems.

And indeed, every developer and ops engineer loves a shiny new toy. Tools do allow for faster builds, quicker deployments, greater visibility, and faster feedback.

Puppet, for example, is widely used for server configuration management. Nagios is a favorite for infrastructure monitoring. Jenkins can build code, create Docker images, and push code to production, and it is also great for continuous integration. Many teams enjoy the integration provided by our friends at Logz.io, which collects logs from services, applications, networks, tools, servers, and more into a single, centralized location for processing and analysis.

Yet these tools, as strong as they are at handling reams of data, do not on their own make sure the end user, be it Dev or Ops, is alerted when a real issue arises. For the most part, they will not solve the underlying issues that arise in any operation, such as failed deployments, security issues, or scaling problems. Those issues need to be alerted on and responded to appropriately.

Don’t forget the alerting

If DevOps teams relied on their tools alone, they would always be reacting to situations rather than getting ahead of them. The metrics provided by all the shiny DevOps tools let us measure and observe the various components of the operation. But it is alerting that draws attention to the particular systems that require observation, inspection, and intervention, and it is alerting that makes proactive management possible.

By moving alerting earlier in the monitoring process, DevOps teams take the true meaning of shift left to heart. When software doesn’t deploy as expected, an alert tells the right team members early on. Similarly, when a security vulnerability is detected early, an alert reaches the engineers who can react and intervene.

Not all alerts are created equal

Even though most DevOps teams have adopted alerting, their practices are often far from best practice. It’s not enough to just have an alerting tool. Like an uncalibrated monitoring tool, uncalibrated alerts simply produce a sea of noisy data. Teams should instead calibrate alerts so that each one is meaningful.

For example, a meaningful alert might say that web requests are taking more than x seconds to process and respond, or that new servers are failing to spin up as expected. These are good examples of what could be high-priority alerts for a company. In these cases, the Ops team can investigate based on specific information rather than on complaints from end users.
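To make the latency example concrete, here is a minimal Python sketch of a calibrated threshold alert. It is an illustration under stated assumptions, not OnPage’s or any monitoring tool’s API: the `LatencyMonitor` and `Alert` names, the 2-second threshold, and the three-sample window are all hypothetical. Requiring several consecutive slow samples, rather than reacting to a single one, is one common way to keep a threshold alert from turning into noise.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    priority: str  # "high" or "low"
    message: str


class LatencyMonitor:
    """Hypothetical latency check: alert only when every one of the last
    `window` samples exceeds `threshold_s`, so a single slow request
    does not page anyone."""

    def __init__(self, threshold_s: float, window: int = 3) -> None:
        self.threshold_s = threshold_s
        self.window = window
        self.recent: deque = deque(maxlen=window)  # keeps only the newest samples

    def observe(self, latency_s: float) -> Optional[Alert]:
        self.recent.append(latency_s)
        sustained = len(self.recent) == self.window and all(
            x > self.threshold_s for x in self.recent
        )
        if not sustained:
            return None
        avg = sum(self.recent) / self.window
        # Start a fresh window so a sustained incident re-alerts at most once per window.
        self.recent.clear()
        return Alert(
            priority="high",
            message=(
                f"Web requests averaged {avg:.2f}s over the last {self.window} "
                f"samples (threshold {self.threshold_s}s)"
            ),
        )


monitor = LatencyMonitor(threshold_s=2.0, window=3)
for sample in [0.4, 2.5, 2.7, 3.1]:
    alert = monitor.observe(sample)
    if alert:
        print(alert.priority, alert.message)
```

Only the fourth sample triggers an alert, because it completes a window of three readings that are all above the threshold; the threshold itself would come from the baseline of how the system normally behaves.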

By contrast, less urgent conditions, such as a server that is 90% full, can be sent as low-priority alerts that go to the on-call engineer without rising to the level of a 2 a.m. wake-up call. In OnPage, you can send this low-priority alert to the engineer’s account while ensuring the account notifies the engineer only during normal business hours.
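The routing rule described above can be sketched in a few lines of Python. This is a hypothetical illustration, not how OnPage implements it: the function names, the two priority labels, and the 09:00-17:00 Monday-to-Friday business hours are assumptions.

```python
from datetime import datetime, time, timedelta

# Assumed business hours: 09:00-17:00, Monday-Friday (adjust to your team).
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def in_business_hours(now: datetime) -> bool:
    """True on weekdays (Mon=0 ... Fri=4) between BUSINESS_START and BUSINESS_END."""
    return now.weekday() < 5 and BUSINESS_START <= now.time() < BUSINESS_END


def next_business_start(now: datetime) -> datetime:
    """Next weekday BUSINESS_START that is strictly later than `now`."""
    candidate = datetime.combine(now.date(), BUSINESS_START, tzinfo=now.tzinfo)
    if now.time() >= BUSINESS_START:
        candidate += timedelta(days=1)  # today's start has already passed
    while candidate.weekday() >= 5:  # skip Saturday and Sunday
        candidate += timedelta(days=1)
    return candidate


def notify_at(priority: str, now: datetime) -> datetime:
    """When to notify the on-call engineer: high priority pages immediately,
    low priority waits for business hours."""
    if priority == "high" or in_business_hours(now):
        return now
    return next_business_start(now)


# A low-priority alert raised at 02:00 on Saturday 6 January 2024 is held until Monday.
print(notify_at("low", datetime(2024, 1, 6, 2, 0)))  # → 2024-01-08 09:00:00
```

High-priority alerts bypass the business-hours check entirely, which is what keeps the 2 a.m. page reserved for issues like failing deployments.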

6 steps to alerting best practices

Not every alert needs to wake up an engineer, and recognizing this is important. Successful DevOps adoption means planning ahead so that meaningful alerts go out when issues do occur. To this end, OnPage recommends the following alerting best practices, which have been vetted by our many end users:

1. Make sure your alerts are calibrated. Establish a baseline so you know how your systems are supposed to behave.
2. Ensure alerts are tied to a schedule. As strange as it sounds, some shops simply alert everyone. You never want to alert everyone. Tie your alerts to an on-call schedule so that one person is alerted, and if that engineer is unavailable, escalate to the next person on call.
3. Ensure alerts are actionable. Nobody wants to be woken up by a pointless message such as “there’s a problem with deployment in the test environment.” Instead, make sure each alert carries a specific piece of information that needs to be investigated and resolved.
4. Develop runbooks. Publish operating procedures so that on-call response becomes more standardized.
5. Review audit trails. Make sure alerts went to the team member best able to resolve the issue.
6. Review on-call at weekly meetings. Go over the alerts received during the week to ensure they carry sufficient information and are actionable. If they are not, revise the alert messaging so it is more effective.
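Several of these practices (tie alerts to a schedule, escalate when someone is unavailable, keep an audit trail to review) fit together in a small Python sketch. Everything here is hypothetical rather than OnPage’s API: `Shift`, `on_call_chain`, and `page_with_escalation` are illustrative names, and `try_page` stands in for whatever actually sends a page and reports whether it was acknowledged within a timeout.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional, Tuple


@dataclass
class Shift:
    engineer: str
    start: datetime
    end: datetime  # exclusive


def on_call_chain(shifts: List[Shift], now: datetime) -> List[str]:
    """Engineers covering `now`, in escalation order (list order = primary first)."""
    return [s.engineer for s in shifts if s.start <= now < s.end]


def page_with_escalation(
    chain: List[str], try_page: Callable[[str], bool]
) -> Tuple[Optional[str], List[Tuple[str, bool]]]:
    """Page one engineer at a time and escalate only when the page goes
    unacknowledged. Returns who acknowledged (or None) plus an audit trail."""
    audit: List[Tuple[str, bool]] = []
    for engineer in chain:
        acknowledged = try_page(engineer)  # hypothetical pager call
        audit.append((engineer, acknowledged))
        if acknowledged:
            return engineer, audit
    return None, audit


week = (datetime(2024, 1, 1), datetime(2024, 1, 8))
shifts = [Shift("alice", *week), Shift("bob", *week)]
chain = on_call_chain(shifts, datetime(2024, 1, 3, 2, 0))
# Stand-in for a real pager: pretend only bob acknowledges.
owner, trail = page_with_escalation(chain, lambda name: name == "bob")
print(owner, trail)  # → bob [('alice', False), ('bob', True)]
```

Paging one person at a time, instead of everyone at once, is the heart of the schedule practice; the returned audit list is the kind of record a weekly on-call review can check to confirm alerts reached the right person.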

By following these steps, your DevOps team will begin shifting from a reactive to a proactive way of thinking.

Conclusion

DevOps monitoring tools are powerful instruments. However, they need to be paired with proper alerting tools and procedures to enable proactive engineering. OnPage’s cloud-based alerting tool is a powerful way to ensure the right information gets to the right engineer at the right time.
See what proper alerting can do to help your team’s monitoring. Schedule a demo with OnPage today.


Published by OnPage Corporation, 8 years ago
