The Seven Deadly Sins of DevOps

What to avoid when you start DevOps

While there are many ways to do DevOps correctly, a few cardinal sins will run you afoul of the Church of DevOps. To achieve excellence in DevOps, executives need to avoid committing the sins discussed below.

DevOps sin 1: You treat DevOps as a title, not a philosophy

In speaking to directors of engineering at numerous companies, I have heard the phrase: ‘if you have DevOps in your title, you’re doing it wrong’. The point is that DevOps is a philosophy, not a title. You shouldn’t assume that you can simply put the word ‘DevOps’ in someone’s title and get anywhere near implementing a DevOps-focused enterprise. As Matt Juszczak of Bitilancer writes: Calling yourself or somebody else a “DevOps Engineer,” a “DevOps System Administrator” or a “DevOps Tester” […] Read more »

Netflix earnings, DevOps and profitability


DevOps as the road to profitability

Netflix released a great earnings report earlier this week. According to the Wall Street Journal’s page one article on October 18th, “Netflix Inc. blew through its forecast for the subscriber additions in the September quarter…sending its shares soaring 20% in after-hours trading. … The better-than-expected performance came mainly in international markets, where the company has completed a massive, near global expansion this year.” For anyone who reads the DevOps literature, this success doesn’t come as a surprise. Rapid testing and provisioning is the name of the game at Netflix. Puppet’s 2016 State of DevOps Report notes that Netflix is among the top DevOps performers: “High [DevOps] performers deploy on demand, with Etsy deploying 80 times per day, and large companies like Amazon or Netflix deploying thousands of times per day.” So how tightly are the components of profitability and DevOps excellence intertwined? Perhaps another […] Read more »

Serverless promises and the persistent need for critical alerting

Why serverless computing doesn’t end the need for security or alerts

Serverless computing removes the burden of managing servers. For many small start-ups, this is a huge advantage, as the cost of purchasing, maintaining and scaling servers is a real pain point. Serverless also holds out the prospect of ending the need for Ops as we know it, for security worries and for being on-call. But while this modern-day DevOps marvel might seem like a panacea, serverless computing needs to come with a healthy dose of reality.

The reality of serverless

In an article I recently posted to DZone entitled How Smart Is Serverless, I question how smart it is to outsource your security concerns to a third party like AWS. As I note in the article, you cannot abstract security without facing some pretty scary consequences. Amichai […] Read more »

Why Serverless Still Needs Critical Alerting

NoOps eschews critical alerting at its own peril

Many start-ups embrace serverless architectures such as AWS, believing they will be able to adopt NoOps. NoOps means no worries about servers because everything is in the cloud, and if there are no worries about servers, the thinking goes, there is no need to worry about critical alerting. The reality is slightly different. No matter how minimized Ops becomes, there will always be a need for strong incident management applications. The shift simply moves monitoring from an Ops-only role to an important role for everyone on the development team.

What is NoOps and why is there so much criticism?

NoOps defines an IT environment that is so automated and abstracted from the underlying infrastructure that there is no need for a dedicated team to manage Ops in-house. The two main drivers behind NoOps are increasing IT automation and cloud computing. Even among […] Read more »

Constant Vigilance in Continuous Delivery – How to do DevOps right

Alastor Moody's mantra of "constant vigilance" is applicable to DevOps.

The importance of monitoring and alerting in the continuous delivery cycle

In the 2016 State of DevOps report, Puppet reported that the top DevOps shops like Amazon or Etsy deploy new software releases multiple times per day. The next tier of companies deploy on a weekly or monthly basis. What is the difference between High and Medium IT performers? More than just tools, it is mindset. In addition to building code, the top DevOps teams have adopted a mindset of “constant vigilance”, as Mad-Eye Moody said to Harry Potter. Not only should teams always be building, they should also always be testing. Through testing they remain constantly vigilant, attuned to their software and its performance. Equally important in this process is incident alerting, which lets both Dev and Ops know when things go awry.

A cautionary tale: What happens without constant vigilance

While many DevOps […] Read more »

The Secret to Making Your DevOps Team World Class

Continuous deployment is key to world class DevOps

In its State of DevOps report, released at the beginning of the summer, Puppet clearly defined the characteristics of world class DevOps organizations and of those lagging behind. According to Nigel Kersten, CIO of Puppet, there is a huge gap between organizations that get DevOps and are able to ship software on demand and “organizations that take days, weeks or even years to ship simple upgrades … and the gap is widening”. Where is your company on the spectrum? Is your company deploying 80 times per day like Etsy or thousands of times per day like Amazon? Is your company one of those that spends 50% less time remediating security issues than low performers, and 22% less time on unplanned work? How much time does your team have for building new code? Perhaps you don’t even know the exact answer […] Read more »

What you need to know about MTTR and why IT MaTTeRs


What all engineering teams should know about MTTR

In the IT world, performance is everything. So when technology fails, your first thought is how to use incident management knowledge to repair the situation and minimize downtime. As both a manager and an engineer, you need to minimize your MTTR (Mean Time To Resolution) in order to comply with your SLAs (service level agreements) and keep your group at the top of its game. This article will highlight the issues impeding effective MTTR management and offer insights on how to improve the use of MTTR as a metric.

Who cares about MTTR

I have asserted the importance of MTTR without saying to whom in particular the metric matters. But the truth is that just about everyone in engineering uses MTTR to measure how long it takes their teams to resolve an incident after it has […] Read more »

The secret to blameless post mortems

How your engineering teams can move past finger-pointing to effectively managing mistakes

Sidney Dekker’s theory on ‘bad apples’ holds that complex systems would be fine if it were not for the erratic behavior of some unreliable people. According to this theory, when unexpected events are seen in an otherwise safe system, they are typically and conveniently assigned to “human error” and, when they are severe, to “operator carelessness”. Similarly, post mortems often look to define and parcel out blame to engineers. Yet this raises the question of how effective post mortems are if their only purpose is to assign blame. Instead, effective post mortems need to “acknowledge the human tendency to blame, to allow for a productive form of its expression, and constantly refocus the postmortem’s attention past it.”

Post mortems vs retrospectives

The problem with post mortems begins with the name “post mortem”, which if you ask […] Read more »

Feel the burnout

Everyone on your team is feeling the pain.

Eleven practical ways for DevOps engineers to better manage their work environment

At last month’s DevOpsDays Boston, many hallway sessions and Open Spaces discussions were devoted to engineer burnout. At OnPage, we are addressing this important topic in several formats in addition to this blog, such as our e-book and video. The seriousness of the issue shows in the following:

Decreased employee happiness. Employees become less satisfied and content with their work.
Decreased productivity. Because employees are fatigued, they are less productive.
Frequent job shifts. Throughout the industry, it has become standard for engineers to switch jobs every 2 to 3 years in hopes of finding employment that won’t burn them out.

How to recognize burnout

How do you realize that you are suffering from burnout? It’s like the famous description of a frog in boiling water. The frog only knows he’s going […] Read more »

Bringing Dev and Ops together with on-call groups

Make Dev and Ops better together by building empathy with on-call groups

Create Effective Schedules

Much has been written on the tension that often exists between Dev and Ops teams in an organization. All too frequently, Devs are focused on rapid prototyping and creating code while Ops are focused on keeping the ship stable and making as few changes as possible. When I was at the DevOps Boston Conference last week, much of the “hallway conference” was devoted to conversations on how to build empathy between these frenemies and reduce the opposition between them. How can Dev and Ops become less siloed? How can management encourage cross pollination? One important psychological realization was that, in order to create empathy between these two groups and ensure an effective group dynamic, the teams need to spend more time walking in one another’s shoes. One strong and significant step that can […] Read more »

7 Ways DevOps Can Avoid Alert Fatigue

Being on-call doesn’t have to mean you’re always tired

The introduction of monitoring into the DevOps world means alerts will occur 24/7, and with them comes alert fatigue. Monitoring needs alerts in order to be effective, but while our technology runs 24/7, humans cannot work in a similar fashion. Even if engineers do attempt to push at the margins and be on-call longer and later, there are considerable health, psychological and work-related effects. Even with on-call schedules, burnout is inevitable. There are also significant financial implications for companies if they have stressed, unhappy and sleep deprived engineers. For example, engineers who are feeling the stress of alert fatigue are likely to leave for greener pastures, leaving their employers without their knowledge reservoir and needing to rehire, which can cost as much as 30% of the individual’s salary. Clearly, 24/7 alerts need to be […] Read more »