How well do you know ChatOps? Is your ChatOps knowledge world-class?

Are you a ChatOps blackbelt? OnPage is a cloud-based incident alerting and management platform that elevates notifications on your smartphone so they continue to alert until read. Incidents can be routed to the person on call and escalated if they are not attended to promptly. Schedule a demonstration today! Read more »

ChatOps – Secret to great incident management in DevOps teams of 20 or 2000

Chat your way to excellence

DevOps is constantly trying to improve production through automation, collaboration and tools. ChatOps is often the paradigm that brings these tasks together into a single conversation. In ChatOps, “chat applications and tools for real-time communication and task execution [are distributed] among members of development and IT operations teams”. Yet proponents of ChatOps often don’t pay sufficient attention to the incident management component of the operation, preferring instead to look at the bots and chat room tools. However, as James Fryman noted in his talk ChatOps: Technology and Philosophy at Geekdom San Francisco, “the shared context [of chat rooms] allows everybody to see and collaborate around things that happen. This is super amazing [for] the incident management space.” Specifically, though, when high priority or critical alerts occur, notifications need to be used to broadcast the incident beyond the chat room and ensure the conversations […] Read more »

How well do you know DevOps? Is your DevOps knowledge world-class?

Are you a DevOps blackbelt? OnPage is a cloud-based incident alerting and management platform that elevates notifications on your smartphone so they continue to alert until read. Incidents can be routed to the person on call and escalated if they are not attended to promptly. Schedule a demonstration today! Read more »

OnPage ensures compatibility with Android 7.0

Maintaining its customer and technology focus

OnPage, the leader in incident alerting and management for IT, healthcare and IoT, has released an updated version of its application to ensure compatibility with Google’s Android 7.0 N operating system. Affectionately known as Nougat, Android’s newest operating system had been available for only a few weeks before the OnPage update was released. Support engineer Alex Berkson indicated that the update was important for ensuring that OnPage users on Android N can receive critical notifications without interruption. The OnPage update continues to support the capabilities the app has historically demonstrated, such as:

Secure alerting: All messaging is encrypted
Persistent messaging for up to 8 hours: Also known as alert until read, alerts will continue for up to 8 hours before expiring
Split Screen on Samsung devices
Real time communication: Instantaneous messaging with individuals or team members
Forwarding: Send messages to your team members to bring […] Read more »

The Secret to Making Your DevOps Team World Class

Continuous deployment is key to world-class DevOps

With its State of DevOps report released at the beginning of the summer, Puppet clearly defined the characteristics of world-class DevOps organizations and the make-up of those lagging behind. According to Nigel Kersten, CIO of Puppet, there is a huge gap between organizations that get DevOps and are able to ship software on demand and “organizations that take days, weeks or even years to ship simple upgrades … and the gap is widening”. Where is your company on the spectrum? Is your company deploying 80 times per day like Etsy or thousands of times per day like Amazon? Is your company one of those that spend 50% less time remediating security issues than low performers, and 22% less time on unplanned work? How much time does your team have for building new code? Perhaps you don’t even know the exact answer […] Read more »

OnPage app helps prevent HIPAA violations and ransomware attacks

OnPage helps thwart HIPAA violations

Avoiding HIPAA violations and combating potential ransomware attacks are top of mind for many healthcare institutions. In theory, avoiding HIPAA violations should be straightforward, as HIPAA requirements are very specific about what hospitals may and may not do when communicating with their staff physicians. HIPAA legislation clearly prohibits the use of devices that do not have:

user authentication
data encryption
remote wipe capabilities
delivery and read receipts
date and time stamps
customized messaging retention time frames
specified contact lists

And yet, in spite of these specific requirements meant to help ensure patient privacy, hospitals are the biggest target of ransomware attacks. According to one article, 88 percent of all ransomware attacks target hospitals. Hospitals become vulnerable to ransomware because events as basic as a lost or stolen iPhone expose strategic information. The issue is best summarized by the following paragraph in a recent article from Becker Hospital […] Read more »

MTTR by the Numbers

Ignoring MTTR can cost you millions

Peter Drucker was famous for saying that “If you can’t measure it, you can’t improve it.” That’s why knowing your MTTR (mean time to resolution) is so important. If you don’t know how long it takes you to fix issues, you cannot improve on that time. Effectively, MTTR is an important shorthand that tells your team how well it is doing in acknowledging and responding to software, hardware or deployment failures. Here are 3 key ways your team can improve its MTTR: Identifying the root cause is usually the biggest source of MTTR variability and the one that has the highest cost associated with it. Ensure that information goes to the right person. When a monitoring system detects an issue and sends an email, use OnPage to make sure that the correct engineer is alerted. Have escalations enabled so that engineers can […] Read more »
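
To make the measurement concrete, here is a minimal sketch of how MTTR could be computed from an incident log. It assumes each incident is recorded as a pair of detection and resolution timestamps. The function name `mean_time_to_resolution` and the sample incidents are hypothetical and are not part of OnPage or any monitoring tool.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average (resolved - detected) over a list of (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one resolved incident")
    # Sum the individual resolution times, starting from a zero timedelta.
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical incident log, for illustration only.
incidents = [
    (datetime(2016, 9, 1, 2, 0), datetime(2016, 9, 1, 2, 45)),     # 45 minutes
    (datetime(2016, 9, 3, 14, 10), datetime(2016, 9, 3, 16, 10)),  # 120 minutes
    (datetime(2016, 9, 7, 9, 30), datetime(2016, 9, 7, 9, 45)),    # 15 minutes
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 1:00:00
```

Tracking this number per week or per month is one simple way to see whether changes such as better alert routing or escalation are actually shortening resolution time.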

What you need to know about MTTR and why IT MaTTeRs

What all engineering teams should know about MTTR

In the IT world, performance is everything. So when technology fails, your first thought is how to use incident management knowledge to repair the situation and minimize downtime. As both a manager and an engineer, you need to minimize your MTTR (mean time to resolution) in order to comply with your SLAs (service level agreements) and keep your group at the top of its game. You want to ensure that ITIL (Information Technology Infrastructure Library) and ITSM (information technology service management) best practices are followed so that you can manage incidents effectively. Even in the best scenario, however, failures are still part of the game. Reality dictates that you need a plan for receiving alerts through your incident management tools when an event has occurred. Following the alert, you would be able to quickly deploy your team […] Read more »

The secret to blameless post mortems

How your engineering teams can move past finger-pointing to effectively managing mistakes

Sidney Dekker’s theory on ‘bad apples’ holds that complex systems would be fine if it were not for the erratic behavior of some unreliable people. According to this theory, when unexpected events are seen in an otherwise safe system, they are typically and conveniently assigned to “human error” and, when they are severe, to “operator carelessness”. Similarly, post mortems often look to define and parcel out blame to engineers. Yet this raises the question of how effective post mortems are if their only purpose is to assign blame. Instead, effective post mortems need to “acknowledge the human tendency to blame, to allow for a productive form of its expression, and constantly refocus the postmortem’s attention past it.”

Post mortems vs retrospectives

The problem with post mortems begins with the name “post mortem”, which if you ask […] Read more »

OnPage uses Tropo to create bilingual office

Spanish-language phone tree facilitates after-hours at doctor’s office

OnPage and Cisco Tropo

Back in February of this year, OnPage announced its integration with Cisco’s Tropo technology. Tropo’s cloud-based API platform enables OnPage to embed real-time communications within the OnPage critical alerting application through the use of voice commands. Tropo makes it simple for OnPage’s developers to use the Cisco capabilities to create new experiences. CEO Judit Sharon knew that this was a great ‘a-ha’ moment, as it would allow many unexpected and serendipitous integrations. But what Judit did not realize was quite how that serendipity would unfold.

What we’ve got here is a failure to communicate

At the beginning of the summer, OnPage received a call from the offices of Dr. Hector Lopez in Milwaukee, Wisconsin. Eric Eickhorst, the office manager at Milwaukee Family Practice, called OnPage as he was trying to find a way to help […] Read more »

Feel the burnout

Eleven practical ways for DevOps engineers to better manage their work environment

At last month’s DevOpsDays Boston, many hallway sessions and Open Spaces discussions were devoted to talking about engineer burnout. The effects of engineer burnout are many, but the main ones identified by the conference attendees were:

Decreased employee happiness. Employees become less satisfied and content with their work.
Decreased productivity. Because employees are fatigued, they are less productive.
Frequent job shifts. Throughout the industry, it has become standard for engineers to switch jobs every 2 to 3 years in hopes of finding employment that won’t burn them out.

How to recognize burnout

How do you realize that you are suffering from burnout? It’s like the famous description of a frog in boiling water. The frog only knows he’s going to die when it’s too late. Similarly, the engineer only knows they are suffering burnout when they have either […] Read more »

Bringing Dev and Ops together with on-call groups

Make Dev and Ops better together by building empathy with on-call groups

Create Effective Schedules

Much has been written on the tension that often exists between Dev and Ops teams in an organization. All too frequently, Devs are focused on rapid prototyping and creating code while Ops are focused on keeping the ship stable and making as few changes as possible. When I was at the DevOps Boston Conference last week, much of the “hallway conference” was devoted to conversations on how to build empathy between these frenemies and reduce the opposition between them. How can Dev and Ops become less siloed? How can management encourage cross-pollination? One important psychological realization was that in order to create empathy between these two groups and ensure an effective group dynamic, the teams need to spend more time walking in one another’s shoes. One strong and significant step that can […] Read more »

Why Pagers Suck!

HIPAA Violations are only the beginning

If you’re a healthcare professional, you probably use pagers to communicate with your office and with others in your practice. But did you know that using a pager could cost your office $650K for a HIPAA violation? That seems like a lot of money to spend for the liberty of using a $5 device.

HIPAA Requirements

According to the HIPAA regulations, healthcare organizations using pagers must:

Ensure that all communications are encrypted
Ensure that a system of message accountability is implemented
Enable remote removal of messages from a pager to protect the integrity of PHI in the event of a pager being lost or stolen
Enable a process for user identification on each device
Enable an automatic log-out facility to prevent unauthorized access to PHI when a pager is left unattended

Unless you have borrowed a pager from the future, your pager doesn’t meet […] Read more »

7 Ways DevOps Can Avoid Alert Fatigue

Being on-call doesn’t have to mean you’re always tired

The introduction of monitoring into the DevOps world means alerts will occur 24/7 and that there will be alert fatigue in DevOps. Monitoring needs alerts in order to be effective, but while our technology is 24/7, humans cannot work in a similar fashion. Even if engineers do attempt to push at the margins and be on-call longer and later, there are considerable health, psychological and work-related effects. Even with on-call schedules, burnout is inevitable. There are also significant financial implications for companies if they have stressed, unhappy and sleep-deprived engineers. For example, engineers who are feeling the stress of alert fatigue are likely to leave for greener pastures, leaving their employers without their knowledge reservoir and needing to rehire, which can cost as much as 30% of the individual’s salary. Clearly, 24/7 alerts need to be better calibrated […] Read more »

Critical Alerts Without a Data Package

When the Wi-Fi magical garden is gone and the message must get through

Most of us don’t even think twice about internet connectivity. We get it at work or Starbucks or at home. It’s one of those low-cost goods in our society which can be produced in vast quantities as needed with minimal effort. And while $30 per month might seem pretty close to free, many consumers and business individuals outside the United States don’t live in our magical Wi-Fi garden.

Got tech support in India?

So, imagine instead that you are a company with tech support in India, where nearly 900 million people do not have access to the internet. Or perhaps you have support in the Philippines, which has the third lowest average connection speed in the region. Further, imagine that you have a critical alert you need to get out to your engineers about a downed server […] Read more »

No Data Plan? No Problem. OnPage’s Page to Phone Feature Ensures You’re Never Out of Touch

Maintain Your On-Call Schedule No Matter What

Today, OnPage announced the release of its Page to Phone feature. This feature provides customers and IT management with phone redundancy in addition to the existing prominent alert. The OnPage alert already provides continuous and prominent notification for up to 8 hours until the message is read, allowing critical IT messages to get to the right person. With Page to Phone, OnPage provides customers who don’t have access to the data channel an additional way to get notified. The feature allows customers to receive a phone call that they can acknowledge via smartphone and mark as ‘read’ in the audit trail.

How Page to Phone Works

The Page to Phone redundancy feature is easily initiated and enabled with a checkbox in the OnPage enterprise console, where you can also choose Email and SMS redundancy messaging. In the interface, you provide the OnPage ID, name, phone number […] Read more »

See You at Boston DevOps Days 2016

OnPage is proud to be a Gold Sponsor at DevOps Days Boston, taking place at the Boston Park Plaza on August 25-26th. If you’re attending, please stop by and get one of our very cool t-shirts and enter to win one of the many prizes we’re raffling off. If you want to schedule a time to chat or see some of our newest features, please sign up for a meeting with us. We’d love to speak with you.

Uniqueness of Boston DevOps Days – where to start

When Patrick Debois and Andrew Shafer coined the term DevOps back in 2009, they probably did not expect their concept for bringing together agile development and code deployment to become a worldwide phenomenon. However, today cities around the world host DevOps conferences and each conference has its own flavor. Boston’s attendees are, not surprisingly, from the high tech and biotech communities. Attendees from […] Read more »

Cygnus Decreases Costs and Improves Response Time

OnPage facilitates Cygnus’ growth of 25% per year through efficiency, cost reduction and improved response to critical alerts

OnPage, a leading provider of incident management technology, has helped Cygnus Systems Inc. reduce the time it takes to respond to critical alerts. In a case study released today, OnPage details how, by integrating OnPage’s technology with ConnectWise, Cygnus has been able to spend less on its answering service, reduce waiting time for its customers and improve the quality of response by its engineers. The sum of these efforts has allowed the company to grow by 25% over the past year. Slashing response time to critical alerts and getting IT to work as quickly as possible to resolve incidents was key for Cygnus. Inefficient issue resolution, limited visibility of issues and fragmented communications can plague MSPs if not handled efficiently and quickly. Cygnus had to assure its clients of end-to-end service […] Read more »

50 Ways to leave your pager: Why it’s time to think pager replacement.

Why is it time to think about pager replacement? Perhaps the need for a pager alternative is clear to the many who realize the inefficiency and wasted time caused by using the device. These facts are highlighted in a 2014 article entitled Why Are Hospitals Still Using Pagers?, in which Cliff McClintick writes: “If you read the marketing copy of nearly any large hospital or health system, you’ll see references to “state-of-the art equipment” and “cutting-edge technology…. [C]onsumers have a right to expect the claim to be true. But if they or their family members experience delays in care because a doctor hasn’t returned a page, their faith may be shaken.” Moreover, the beep on a pager doesn’t rise above the constant alarms and sounds of other machines in hospitals or data centers. And often pager holders can sleep through their alarms. At OnPage, we have thought long and hard about how […] Read more »

Why proper incident management is key to proper IT management

Proper IT management requires proper incident management. Otherwise, you court Murphy’s law at your peril. In the IT world, if a server can fail, a cache can overload, or traffic can overload the network, it will. And the consequences are significant. Many IT organizations face database, hardware, and software downtime, ranging from brief outages to shutdowns that halt the business for days. According to a January 2016 article in Network Computing on the high price of IT downtime, organizations face: “an average of five downtime events each month, with each downtime event being expensive indeed: from $1 million a year for a typical midsize company to more than $60 million for a large enterprise.” The major cause of this downtime is equipment failure, which accounts for almost 40% of downtime. The second most frequent cause of downtime is human error, which accounts for 25% of downtime. Cybersecurity accounts for only about 10% of […] Read more »