Site Reliability Engineer’s Guide to Black Friday It’s gotten to the point where Black Friday reliability prep has to start on…well Black Friday. This year, 32% of consumers in the US claimed that they were going to start their holiday shopping in July-October. Plus, Black Friday isn’t the only day eCommerce businesses have to worry … Continued
How Effective Are Your Alerting Rules? Recently, I came across this Reddit post highlighting the challenges of having ineffective alerting rules: And, here at OnPage we have experience with various companies who have dealt with just that, so I felt I should share some of our top tips for creating effective alerting rules in this … Continued
What Are Large Language Models? Large language models are algorithms designed to understand, generate, and manipulate human language. State-of-the-art large language models include OpenAI’s GPT-4o, Anthropic Claude Sonnet 3.5, and Meta LLaMA 3.1. They are built using neural networks with billions or even trillions of parameters. They are trained on vast datasets that can include … Continued
When it comes to critical incident management, IT teams require a structured approach that will ensure that any cybersecurity event is swiftly remediated. And no incident management plan is complete without a clearly defined incident response team. Whether your team is looking to establish an incident response team from scratch or just improve existing response … Continued
Rethinking IT Management – Introduction We live in a time where immediate communication of critical incidents is vital for maintaining continuous service availability. As companies strive to enhance their IT service management practices, many integrate technologies like Interactive Voice Response (IVR) into their service delivery frameworks. However, this approach may not always be the most … Continued
Crisis Management for Oil and Gas Companies Oil and gas companies operate in a high-stakes environment where the potential for catastrophic incidents, such as oil spills, explosions, and natural disasters always exists. These risks necessitate the establishment of robust crisis management for oil and gas companies to ensure the safety of their personnel and minimize … Continued
What Is Software Deployment? Software deployment, a critical process in software development, refers to all the activities that make a software system available for use. It’s the stage where all the hard work of creating software culminates into something tangible that users can interact with. But before we delve into its complexities, let’s first understand … Continued
What do software engineers do during on-call? Most software engineers know that they are typically tasked with on-call shifts, but new software engineers entering the field may be asking themselves – What do I even do if I get scheduled for an on-call shift? This is a common question that often doesn’t get answered until … Continued
What is LLM Large Language Models (LLMs) are advanced artificial intelligence models designed to comprehend and generate human-like language. With millions or even billions of parameters, these models, like GPT-3, excel in natural language processing, understanding context, and generating coherent and contextually relevant text across various applications. What is LLM Observability and Monitoring? LLM observability … Continued
What Are AWS Databases? Amazon Web Services (AWS) provides a range of managed database services that provide multiple database technologies to handle various use cases. They are designed to free businesses from tasks like database administration, maintenance, upgrades, and backup. AWS databases come in several types to cater to different business needs. These include relational … Continued