How to Configure a Remote Data Store for Prometheus

· 9 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Introduction

This article is part of a series on setting up an end-to-end monitoring and alerting stack using Prometheus.

The Prometheus monitoring tool can store its metrics either locally or remotely. You can configure a remote data store using the remote_write configuration. This article describes the various data store options available as well as how to set up a remote store.

Overview of Remote Storage

By default, Prometheus stores data on the local disk of the machine where it is installed. You can set the data directory with the --storage.tsdb.path command-line option when starting Prometheus. In practice, you can attach a separate disk to that machine for higher performance.

However, this may not be possible or optimal in every situation: you might want a data store that is better suited to time series data and has more capacity for longer data retention. Prometheus usually runs in a standalone VM, a Kubernetes pod, or a Docker container, and it does not have access to such data stores by default.

A remote store can add these capabilities to Prometheus. The remote storage option can be set by using the remote_write key in the Prometheus configuration YAML file.
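
As a rough illustration, a minimal remote_write section in the Prometheus configuration file might look like the sketch below. The endpoint URL, username, and password file path are placeholders assumed for this example. Substitute the values documented by whichever remote store you choose.

```yaml
# prometheus.yml (fragment) - a minimal sketch, not a complete configuration
remote_write:
  # Hypothetical endpoint; use the write URL given by your remote store.
  - url: "https://remote-store.example.com/api/v1/write"
    # Optional: many hosted stores require authentication.
    # The username and file path below are placeholders.
    basic_auth:
      username: "prometheus"
      password_file: "/etc/prometheus/remote_write_password"
```

Prometheus continues to write to its local data directory when remote_write is set. The remote store receives a copy of the samples.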

A Beginner's Guide To Service Discovery in Prometheus

· 11 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Introduction

This article is part of a series on setting up an end-to-end monitoring and alerting stack using Prometheus.

Service discovery (SD) is a mechanism by which the Prometheus monitoring tool can discover monitorable targets automatically. Instead of listing every target to be scraped in the Prometheus configuration, you point Prometheus at a service discovery source, which it queries for targets at runtime.

Service discovery becomes crucial when there are dynamically changing hosts, especially in microservices architectures and environments like Kubernetes. In Prometheus parlance, service discovery is a way of discovering "scrape targets".

For example, pods are created and removed dynamically in Kubernetes as services are deployed and undeployed, as autoscaling events occur, and as errors cause pods to crash. If you are using Prometheus to scrape pods in such an environment, Prometheus has to know which pods are running and scrapable at any given point in time. The Kubernetes service discovery plugin enables this. Similarly, there are SD plugins for other common environments.

You can use service discovery in Prometheus with the predefined plugins, or write your own custom ones using file or HTTP, depending on the situation.
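
The options above can be sketched as scrape jobs in the Prometheus configuration file. The job names, file path, and HTTP endpoint below are placeholders assumed for this example, and a real Kubernetes job would usually also need relabeling rules, which are omitted here.

```yaml
# prometheus.yml (fragment) - a sketch of three service discovery styles
scrape_configs:
  # Predefined plugin: discover pods through the Kubernetes API.
  - job_name: "kubernetes-pods"        # hypothetical job name
    kubernetes_sd_configs:
      - role: pod

  # Custom, file-based: Prometheus reads targets from files that
  # your own tooling writes and keeps up to date.
  - job_name: "file-based"             # hypothetical job name
    file_sd_configs:
      - files:
          - "/etc/prometheus/targets/*.json"   # placeholder path
        refresh_interval: 5m

  # Custom, HTTP-based: Prometheus fetches targets from an endpoint you run.
  - job_name: "http-based"             # hypothetical job name
    http_sd_configs:
      - url: "http://sd.example.internal/targets"   # placeholder URL
```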

The Ultimate List of Incident Management Tools in 2024

· 7 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Introduction

Incident management tools are important for organizations to effectively handle service outages. With so many incident management tools available, each with a different feature set, it's often difficult to find the one that is right for your needs. In this article, we list the incident management software available in 2024, along with their features, to help you arrive at the right one.

We have focused mostly on tools that offer incident management capabilities - which include at least incident lifecycle management, on-call scheduling, and third-party integrations.

There are many good tools which are focused only on incident response, or on monitoring and generating alerts, or on the ticketing aspect of incidents. We have not included those to avoid cluttering this article.

Incident Management Tools

The Rising Role of Slack in Incident Management

· 4 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Introduction

Why is Slack becoming so popular in incident management?

Slack is one of the most popular communication tools used in companies. If you're part of a remote team, your team is probably on Slack or something similar, like MS Teams. Although IM tools lack the communication nuances that are taken for granted in face-to-face interactions, they provide many other advantages:

  • Access to historical data
  • Asynchronous communication
  • The ability to share links and documents easily
  • Adding anybody in the organization to a conversation

Slack in Incident Management

One of the trends I've noticed in incident management is the growing role of Slack in incident response and management tools. I think this is tied to the increase in remote work after COVID-19.

The 2024 Guide to Open Source Status Page Providers

· 7 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Introduction

Maintaining transparent communication about service availability is crucial for businesses of all sizes. Status pages are an important part of your communication strategy during times of outages and maintenance events.

You can choose to go with a fully managed status page provider, or host an open-source one yourself.

Open source status page providers offer a cost-effective and customizable solution. However, they can come with their own drawbacks. This guide explores open source status page providers in 2024 to help you choose the right tool for your needs.

Best Practices for Choosing a Status Page Provider

· 6 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Introduction

Downtime is inevitable, but what sets successful businesses apart is how they handle it. A key part of incident management is incident communication with both internal and external stakeholders. A status page is a crucial tool for maintaining clear communication with users during outages or service interruptions. There are numerous status page providers available with different features. This article will guide you through best practices for selecting a provider that suits your needs.

Integrate Incident Alerts Into Your Slack Workspace

· 3 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Introduction

Updated Mar 26, 2025

Staying on top of your third-party Cloud and SaaS service outages is crucial to maintaining the reliability of your own applications. Like many modern teams, you might use Slack as your communication tool of choice. You can keep up with such incidents by pushing these events to a Slack channel.

IncidentHub has its own Slack app which can be used to push incident lifecycle events to the Slack channel of your choice. It can be used to send incident trigger, update, and resolve events.

Installing IncidentHub's Slack App

You must have the correct permissions on your Slack workspace to be able to do this.

Follow these steps to configure the Slack app in your Slack workspace.

How To Monitor Public Status Pages of Cloud Providers - a Step-by-Step Approach

· 8 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Introduction

Incident updates on the public status pages of your cloud providers are often the first indication that they might have an outage. Providers also post updates about upcoming and ongoing maintenance on their status pages. Monitoring these status pages is therefore crucial to your business operations. This article will guide you through the process of effectively monitoring such status pages.