Alert fatigue, or alarm fatigue, is one of the most common challenges facing IT teams, DevOps engineers, and managed service providers (MSPs) today. When dozens (or even hundreds) of alerts arrive every day, it becomes harder to separate the critical issues from the noise. Engineers miss sleep, teams lose focus, and sometimes the most urgent problems … Continued
Seven steps to failure and greatness
The more I read and learn about how to succeed in DevOps, the more I realize how important failure is to the process. You need to fail to be great at DevOps. Netflix, for example, even takes it a step further by introducing failure into their testing process. In … Continued
Tools going Rogue – a story for Halloween
We have all heard stories of DevOps woe. Some tales are sad. Some tales describe true misfortune. And some tales just leave you wondering what the heck the developers were thinking. This story is a tale of the latter kind. It will tell the tale of how some … Continued
What to avoid when you start DevOps
While there are many ways to do DevOps correctly, there are specific cardinal sins that will make you run afoul of the Church of DevOps. To achieve excellence, it is key for executives to avoid committing the cardinal sins of DevOps that are discussed below. DevOps sin … Continued
DevOps as the road to profitability
Netflix released a great earnings report earlier this week. According to the Wall Street Journal’s page one article on October 18th, “Netflix Inc. blew through its forecast for the subscriber additions in the September quarter…sending its shares soaring 20% in after-hours trading. … The better-than-expected performance came mainly in … Continued
Why serverless computing doesn’t end the need for security or alerts
Serverless computing takes away the problem of managing servers. For many small start-ups, this is a huge advantage, as the cost of purchasing, maintaining and scaling servers is a real pain point. Serverless also holds out the prospect of ending … Continued
The importance of monitoring and alerting in the continuous delivery cycle
In the 2016 State of DevOps report, Puppet reported that the top DevOps shops like Amazon or Etsy deploy new software releases multiple times per day. The next tier of companies deploy on a weekly or monthly basis. What is the difference between High … Continued
Chat your way to excellence
DevOps is constantly trying to improve production through automation, collaboration and tools. ChatOps is often the paradigm that brings these tasks together into a single conversation. In ChatOps, “chat applications and tools for real-time communication and task execution [are distributed] among members of development and IT operations teams”. Yet often … Continued
When an incident strikes, every second counts. MTTR, or Mean Time to Respond, measures how quickly your team reacts once a problem is detected. It’s one of the most important metrics in incident management because the faster you respond, the faster you can contain and resolve critical issues. In this guide, we will explore what … Continued
How your engineering teams can move past finger-pointing to effectively managing mistakes
Sidney Dekker’s theory on ‘bad apples’ holds that complex systems would be fine if it were not for the erratic behavior of some unreliable people. According to this theory, when unexpected events are seen in an otherwise safe system, they are … Continued