
22 posts tagged with "monitoring"

IncidentHub posts related to monitoring


How to Monitor SaaS Status in 2026: A Complete Guide

· 29 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

This is an updated and expanded version of the older guide.

According to the 2025 State of SaaS report, organizations use an average of 106 SaaS apps.

Staying on top of your SaaS vendors' status is as important as monitoring your own services. The Cloudflare, AWS, Azure, and Google Cloud outages in 2025 were strong reminders of this. Many SaaS services that depend on these providers also experienced outages, a cascading effect that took down hundreds of vendors and affected thousands of users. Because of these dependency chains, cloud and SaaS outages are no longer isolated; they tend to happen in clusters.

This article is a comprehensive guide to monitoring the uptime status of your SaaS and cloud vendors.


The 2025 Guide to Open Source Status Page Software

· 24 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

This is an updated version of the 2024 article.

Maintaining transparent communication about service availability is crucial for businesses of all sizes. Status pages are an important part of your communication strategy during times of outages and maintenance events.

You can choose to go with a fully managed status page provider or host an open-source one yourself.

Open source status page software offers a cost-effective and customizable solution where you have complete control over the code, data, and presentation. This guide explores the best open source status page software available in 2025 to help you choose the right tool for your needs.

[Figure: Example of a public status page]

Improving the Developer Experience by Monitoring Third-Party Outages

· 9 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

The role of third-party SaaS and cloud services in the modern software development stack needs no explanation. Largely because they are easy to set up and connect to each other, they make the software development lifecycle (SDLC) much easier than it was 10 years ago. You no longer carry the overhead of installing, configuring, maintaining, backing up, and scaling source code repos, virtual machines, and CI/CD systems. Some services, such as payment gateways, have no in-house option at all, so you have to use them.

This dependency on third-party services also brings risks. The more such services there are in the chain, the more likely it is that a failure in one of them will disrupt or even cripple your otherwise smoothly running development and deployment pipeline. By extension, these failures also impact your business and customers.

Even if you have vetted and chosen reliable services, outages still happen. The best you can do is to prepare for them and know when they occur. This article is about the knowing part.


January 2025 Product Update - Easier Onboarding, Better User Experience, and Reliability Improvements

· 4 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

For the last two months, we have focused on improving the onboarding experience for users so that they can get started with monitoring with minimal effort. We have also added several improvements in the backend to make the service more robust and reliable. Some of the usability improvements are driven by user feedback. Others incorporate what we would personally like to see in such a monitoring service. We have also improved the dashboard user experience.

[Figure: The IncidentHub dashboard]

Adding a Grafana Dashboard to Your Prometheus Setup

· 6 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

This article is part of a series on setting up an end-to-end monitoring and alerting stack using Prometheus.

Continuing our series on setting up Prometheus in a Docker container, we will add a Grafana instance to our Prometheus setup.

Please refer to the previous article, where we used Docker Compose to run Prometheus and Alertmanager together; that setup forms the basis for running multiple related containers. In this article, we will add a Grafana container to the same compose file.

[Figure: A Grafana dashboard]

How To Decide Between Hosting Your Own Status Page Versus Using a Managed One

· 6 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

A status page forms a key part of your incident communication strategy. When it comes to setting up a status page, you have two options:

  • Host your own - using either an open source project or a custom solution.
  • Use a managed status page provider.

We will examine the pros and cons of each option along these dimensions:

  1. Feature set
  2. Service quality

On feature set: if you choose a self-managed open source or custom solution, the features are in your control. With a managed solution, you are limited by the provider's feature set.

On service quality: if you choose a self-managed solution, your team is responsible for the quality of the service. With a managed solution, you depend on the provider's service quality.

In most cases, you are better off using a managed solution from a reputable provider, unless you have:

  • Specific requirements that are not met by the vendor.
  • Budget constraints.

Monitoring Security Vulnerabilities in Your Cloud Vendors

· 7 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

If you manage applications running on cloud platforms, you likely depend on multiple cloud vendors and services. These could be infrastructure providers like AWS, GCP, or Azure. A vulnerability in any of these services could impact your applications and your users.

A cloud platform has many moving parts, many of which are dependent on other third-party providers. For example:

  • Operating system images for VMs which are maintained by third-party vendors.
  • Container images which are hosted on external repositories.
  • Software stacks which are maintained by other vendors but available for deployment on the cloud provider.
  • Libraries used by the cloud provider's internal software which are maintained by other developers or organizations.
  • Control plane software like Kubernetes.
  • Hardware, like processors, which are provided by the manufacturer.
  • Hypervisors which are developed and maintained by third-party vendors.
  • Networking hardware manufactured by other vendors.

Sending Alerts Using Prometheus and Alertmanager

· 10 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

This article is part of a series on setting up an end-to-end monitoring and alerting stack using Prometheus.

Continuing our series on setting up Prometheus in a container, this article is a step-by-step guide to configuring alerts in Prometheus. We will add alerting rules and deploy Prometheus Alertmanager with Slack integration.

If you follow the steps in this article, you will end up with a containerized setup for:

  1. A Prometheus instance with alerting rules.
  2. An Alertmanager instance which can send alerts originating from those rules to a Slack channel.

Let's get started.

[Figure: Prometheus alerts]

Deploying Prometheus With Docker

· 6 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

This article is part of a series on setting up an end-to-end monitoring and alerting stack using Prometheus.

There are several ways to deploy the Prometheus monitoring tool in your environment. One of the fastest ways to get started is to deploy it as a Docker container. This guide shows you how to quickly set up a minimal Prometheus instance on your laptop. You can then extend that setup to add a monitoring dashboard, alerting, and authentication.


How to Configure a Remote Data Store for Prometheus

· 10 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

This article is part of a series on setting up an end-to-end monitoring and alerting stack using Prometheus.

The Prometheus monitoring tool can store its metrics either locally or remotely. You can configure a remote data store using the remote_write configuration. This article describes the various data store options available as well as how to set up a remote store.

Overview of Remote Storage

By default, Prometheus stores data locally wherever it is installed. You can configure the data directory with the --storage.tsdb.path command line option when starting Prometheus. In practice, you can attach a separate disk to the machine running Prometheus for higher performance.

However, this may not be possible or optimal in all situations. You might want a data store that is better suited to time series data and offers larger storage capacity for longer data retention. Prometheus usually runs in a standalone VM, a Kubernetes pod, or a Docker container, and it does not have access to such data stores by default.

A remote store can add these capabilities to Prometheus. The remote storage option can be set by using the remote_write key in the Prometheus configuration YAML file.
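To make the shape of that key concrete, here is a minimal sketch in Python (standard library only) that builds a Prometheus configuration with a single `remote_write` entry and prints it as JSON. Since JSON is, for practical purposes, a subset of YAML, the output should load as a YAML configuration file, although most setups simply write the YAML by hand. The `build_config` helper and the endpoint URL are hypothetical placeholders; `url` and `remote_timeout` are real `remote_write` fields, and the Prometheus configuration reference documents the rest (authentication, queue tuning, relabeling).

```python
import json


def build_config(remote_url: str, scrape_interval: str = "15s") -> dict:
    """Return a minimal Prometheus config dict with one remote_write endpoint.

    Hypothetical helper: remote_url should point at your remote store's
    write endpoint.
    """
    return {
        "global": {"scrape_interval": scrape_interval},
        # Each entry in this list is one remote store that Prometheus
        # forwards samples to, in addition to its local TSDB.
        "remote_write": [
            {
                "url": remote_url,
                "remote_timeout": "30s",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder endpoint; replace with your remote store's write URL.
    config = build_config("http://remote-store.example:9201/write")
    print(json.dumps(config, indent=2))
```

Because `remote_write` is a list, you can append further entries to send the same samples to more than one store.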

A Beginner's Guide To Service Discovery in Prometheus

· 13 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

This article is part of a series on setting up an end-to-end monitoring and alerting stack using Prometheus.

Service discovery (SD) is a mechanism by which the Prometheus monitoring tool can discover monitorable targets automatically. Instead of listing every target to be scraped in the Prometheus configuration, you use service discovery as a source of targets that Prometheus queries at runtime.

Service discovery becomes crucial when there are dynamically changing hosts, especially in microservices architectures and environments like Kubernetes. In Prometheus parlance, service discovery is a way of discovering "scrape targets".


For example, pods are created dynamically in Kubernetes as services are deployed and undeployed, as autoscaling events occur, and as errors cause pods to crash and go away. If you are using Prometheus to scrape pods in such an environment, Prometheus has to know which pods are running and scrapable at any given point in time. The Kubernetes service discovery plugin enables this. Similarly, there are SD plugins for other common environments.

You can use Prometheus's built-in service discovery plugins or, depending on the situation, implement your own using file-based or HTTP-based service discovery.
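As a sketch of the file-based option: Prometheus's `file_sd_configs` reads files containing a list of target groups, each with a `targets` list of `host:port` strings and an optional `labels` map. The Python below (standard library only) writes such a file from a service inventory. The `build_groups` and `write_file_sd` helpers, the inventory shape, and the `service`/`env` labels are hypothetical choices for illustration. The file is written to a temporary name and then renamed into place, so Prometheus should never see a half-written file.

```python
import json
import os
import tempfile


def build_groups(services: dict[str, list[str]], env: str) -> list[dict]:
    """Turn {service_name: ["host:port", ...]} into file_sd target groups."""
    return [
        {"targets": sorted(addrs), "labels": {"service": name, "env": env}}
        for name, addrs in sorted(services.items())
    ]


def write_file_sd(path: str, groups: list[dict]) -> None:
    """Atomically write target groups in the JSON format file_sd_configs reads."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file uses a .tmp suffix so a *.json glob in the scrape
    # config will not pick it up before the rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(groups, f, indent=2)
        # os.replace is atomic when source and destination are on the
        # same filesystem, which is guaranteed here (same directory).
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

A job would then point its `file_sd_configs` `files` list at a glob such as `targets/*.json`, and a deploy script or cron job would call `write_file_sd("targets/app.json", build_groups(inventory, env="prod"))` whenever the inventory changes. The HTTP-based option returns the same JSON structure from an HTTP endpoint instead of a file.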

The 2024 Guide to Open Source Status Page Providers

· 7 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

There is a newer version of this article for 2025.

Maintaining transparent communication about service availability is crucial for businesses of all sizes. Status pages are an important part of your communication strategy during times of outages and maintenance events.

You can choose to go with a fully managed status page provider, or host an open-source one yourself.

Open source status page providers offer a cost-effective and customizable solution. However, they can come with their own drawbacks. This guide explores open source status page providers in 2024 to help you choose the right tool for your needs.

Best Practices for Choosing a Status Page Provider

· 7 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

Last updated on August 11, 2025

Downtime is inevitable, but what sets successful businesses apart is how they handle it. A key part of incident management is incident communication with both internal and external stakeholders. A status page is a crucial tool for maintaining clear communication with users during outages or service interruptions. There are numerous status page providers available with different features. This article will guide you through best practices for selecting a provider that suits your needs.

[Figure: The GitHub status page]

A Guide to Monitoring Multiple Status Pages

· 15 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

A newer and expanded version of this article is available here.

Last updated on August 8, 2025.

Incident updates on the public status pages of your SaaS vendors and cloud providers are often the first indication that they might have an outage. Providers also post updates about upcoming and ongoing maintenance on their status pages. Monitoring these status pages to detect downtime is therefore crucial to your business operations. This article will guide you through the process of effectively monitoring such status pages.

There are two ways to monitor multiple status pages:

  • The manual process.
  • Using a status page aggregator like IncidentHub.

If you are using the second option, which is the recommended approach, you can skip directly to the "Use a Status Page Aggregator Tool" section.

In either case, you will need to identify your cloud providers and locate their public status pages first.


A Step by Step Guide to Checking if a SaaS is Down

· 9 min read
Hrishikesh Barua
Founder, IncidentHub

Introduction

Last updated on August 8, 2025.

Modern businesses depend heavily on Software as a Service (SaaS). SaaS is not limited to software development teams. Almost all aspects of business operations (accounting, HR, payroll, marketing, IT, sales, support) depend on one or more SaaS applications. Given this dependency, the uptime of these applications is tightly tied to a business's uptime. Any SaaS downtime can affect both a business's daily operations and the user experience. Measuring the uptime of SaaS providers is a critical part of your incident management process and business continuity plan.

How can you check whether a SaaS is experiencing downtime? Follow the steps below: