The Secret to Making Your DevOps Team World Class

Continuous deployment is key to world-class DevOps

With its State of DevOps report, released at the beginning of the summer, Puppet clearly defined the characteristics of world-class DevOps organizations and the makeup of those lagging behind. According to Nigel Kersten, CIO of Puppet, there is a huge gap between organizations that get DevOps and are able to ship software on demand and “organizations that take days, weeks or even years to ship simple upgrades … and the gap is widening.” Where is your company on the spectrum? Is it deploying 80 times per day like Etsy, or thousands of times per day like Amazon? Is it one of those that spend 50% less time remediating security issues than low performers, and 22% less time on unplanned work? How much time does your team have for building new code? Perhaps you don’t even know the exact answer […]

What you need to know about MTTR and why IT MaTTeRs

What all engineering teams should know about MTTR

In the IT world, performance is everything. So when technology fails, your first thought is how to use incident management knowledge to repair the situation and minimize downtime. As both a manager and an engineer, you need to minimize your MTTR (mean time to resolution) in order to comply with your SLAs (service level agreements) and keep your group at the top of its game. You want to ensure that ITIL (Information Technology Infrastructure Library) and ITSM (information technology service management) best practices are followed so that you can manage incidents effectively. Even in the best scenario, however, failures are still part of the game. Reality dictates that you need a plan for receiving alerts through your incident management tools so that you know an event has occurred. Following the alert, you can quickly deploy your team […]

The secret to blameless post mortems

How your engineering teams can move past finger-pointing to effectively managing mistakes

Sidney Dekker’s theory on ‘bad apples’ holds that complex systems would be fine if it were not for the erratic behavior of some unreliable people. According to this theory, when unexpected events are seen in an otherwise safe system, they are typically and conveniently assigned to “human error” and, when they are severe, to “operator carelessness”. Similarly, post mortems often look to define and parcel out blame to engineers. Yet this raises the question of how effective post mortems are if their only purpose is to assign blame. Instead, effective post mortems need to “acknowledge the human tendency to blame, to allow for a productive form of its expression, and constantly refocus the postmortem’s attention past it.”

Post mortems vs retrospectives

The problem with post mortems begins with the name “post mortem”, which if you ask […]

Feel the burnout

Everyone on your team is feeling the pain.

Eleven practical ways for DevOps engineers to better manage their work environment

At last month’s DevOpsDays Boston, many hallway sessions and Open Spaces discussions were devoted to engineer burnout. At OnPage, we are focusing on this important topic through several formats in addition to this blog, such as our e-book and video. The seriousness of the issue shows in the following ways:

Decreased employee happiness. Employees become less satisfied and content with their work.
Decreased productivity. Because employees are fatigued, they are less productive.
Frequent job shifts. Throughout the industry, it has become standard for engineers to switch jobs every 2 to 3 years in hopes of finding employment that won’t burn them out.

How to recognize burnout

How do you realize that you are suffering from burnout? It’s like the famous description of a frog in boiling water. The frog only knows he’s going […]

Bringing Dev and Ops together with on-call groups

Make Dev and Ops better together by building empathy with on-call groups

Create Effective Schedules

Much has been written on the tension that often exists between Dev and Ops teams in an organization. All too frequently, Devs are focused on rapid prototyping and creating code, while Ops are focused on keeping the ship stable and making as few changes as possible. When I was at the DevOps Boston Conference last week, much of the “hallway conference” was devoted to conversations on how to build empathy between these frenemies and reduce the opposition between them. How can Dev and Ops become less siloed? How can management encourage cross-pollination? One important psychological realization was that, in order to create empathy between these two groups and ensure an effective group dynamic, the teams need to spend more time walking in one another’s shoes. One strong and significant step that can […]

7 Ways DevOps Can Avoid Alert Fatigue

Being on-call doesn’t have to mean you’re always tired

The introduction of monitoring into the DevOps world means that alerts occur 24/7 and that alert fatigue follows. Monitoring needs alerts in order to be effective, but while our technology runs 24/7, humans cannot work in a similar fashion. Even if engineers do attempt to push at the margins and be on-call longer and later, there are considerable health, psychological and work-related effects. Even with on-call schedules, burnout is inevitable. There are also significant financial implications for companies that have stressed, unhappy and sleep-deprived engineers. For example, engineers who are feeling the stress of alert fatigue are likely to leave for greener pastures, taking their knowledge with them and forcing their employers to rehire, which can cost as much as 30% of the individual’s salary. Clearly, 24/7 alerts need to be better calibrated […]