
Sending Alerts Using Prometheus and Alertmanager

· 9 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Introduction

Continuing our series on setting up Prometheus in a container, this article provides a step-by-step guide to configuring alerts in Prometheus. We will add alerting rules and deploy Prometheus Alertmanager with Slack integration.

If you follow the steps in this article, you will end up with a containerized setup for:

  1. A Prometheus instance with alerting rules.
  2. An Alertmanager instance which can send alerts originating from those rules to a Slack channel.

Let's get started.

Prometheus alerts

Using Docker Compose

Moving the Prometheus Container to Docker Compose

In a previous article, we looked at how to set up Prometheus in a Docker container. We will now deploy Prometheus Alertmanager in a separate container. To manage multiple containers with a single command, we will move our entire deployment to Docker Compose.

In the prometheus directory (refer to the previous article), create a file called docker-compose.yml and add the following content to it:

volumes:
  prometheus:
    external: true

services:
  prometheus:
    image: prom/prometheus:v3.0.0
    volumes:
      - ./config:/etc/prometheus
      - prometheus:/prometheus
    ports:
      - 9000:9090
    restart: always

This keeps the previous configuration file as well as the Docker volume. To run this, you need Docker Compose installed; we won't go into installing it, as that is beyond the scope of this article. Note that we used external: true so that we can reuse the volume we created previously.
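
If you skipped the previous article and the prometheus volume doesn't exist yet, you can create an empty one first (a one-off setup step, not part of the original walkthrough):

# Create the named volume referenced by external: true
docker volume create prometheus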

Bring up the container by executing

docker compose up --detach

Access the Prometheus UI by visiting http://localhost:9000, the host port we mapped to the container's port 9090. The --detach flag runs the containers in the background. To stop the container(s), execute

docker compose down

You can tail the logs of the running containers by using

docker compose logs --follow

Adding Alerting Rules to the Prometheus Container

Prometheus can process alert rules written in PromQL. Let's create a basic rule file that alerts when CPU usage stays above 20% or 80% for 1 minute, with different severity levels. In the config directory, create a file called alert_rules.yml with the following content:

groups:
  - name: default
    labels:
      tier: infra
    rules:
      - alert: ModeratelyHighCPU
        expr: rate(process_cpu_seconds_total[1m]) > 0.2
        for: 1m
        keep_firing_for: 1m
        labels:
          severity: warning
        annotations:
          summary: Moderately High CPU
      - alert: VeryHighCPU
        expr: rate(process_cpu_seconds_total[1m]) > 0.8
        for: 1m
        keep_firing_for: 1m
        labels:
          severity: critical
        annotations:
          summary: Very High CPU

This defines two alerts on the metric process_cpu_seconds_total that fire when the Prometheus process's CPU usage stays above 20% or 80%, respectively, for 1 minute.

The thresholds are just for illustration.

Note that we have assigned each alert a different value for the severity label. We will see later how these labels are used to route alerts in the Alertmanager configuration.
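
You can optionally validate the rule file before wiring it into Prometheus. This is a quick sanity check using promtool, which is bundled in the prom/prometheus image; it assumes you run it from the prometheus directory:

# Mount the config directory and run promtool inside the Prometheus image
docker run --rm \
  -v "$(pwd)/config:/config" \
  --entrypoint promtool \
  prom/prometheus:v3.0.0 \
  check rules /config/alert_rules.yml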

Now add this rule file's name to the prometheus.yml file so that the latter looks like this:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "prometheus"

    static_configs:
      - targets: ["localhost:9090"]

The path to the rules file is relative to the location of prometheus.yml. You can also give an absolute path if the file is somewhere else. Now restart the container, and you should see the rule in the UI at http://localhost:9000/rules:

docker compose down
docker compose up --detach

These alerts won't fire until CPU usage exceeds the thresholds, and that depends on how busy your machine is. If you want to make sure everything works, you can change a threshold to a very low number like 0.0002 and test it. You can see the alerts at http://localhost:9000/alerts.
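
You can also check active alerts programmatically via Prometheus's HTTP API. A minimal check, using the 9000:9090 port mapping from our docker-compose.yml:

# Returns pending and firing alerts as JSON
curl -s http://localhost:9000/api/v1/alerts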

Now that we have a rule engine that can generate alerts, we need a way to send those alerts to an endpoint like Slack, email, or PagerDuty so that we actually receive notifications. To do this, we will use Prometheus Alertmanager.

Setting up Prometheus Alertmanager

Alertmanager is a Prometheus project that acts as a gateway between your Prometheus rule engine and external tools that can process alerts. You can configure routing rules as well as silence alerts in Alertmanager. At the end of this section you will have your Prometheus container sending alerts to your Alertmanager container.

Setting up the Alertmanager Container

First, to keep things consistent, let us create a volume for Alertmanager's data:

docker volume create alertmanager

Alertmanager will use this volume to store alert silences among other things. So if you silence an alert, and your Alertmanager container restarts, the silence will still be active.

Next, create a directory called alertmanager-config at the same level as prometheus. This will be our Alertmanager configuration directory. Inside it, create a basic Alertmanager config in a file called alertmanager.yml:

global:

# The directory from which notification templates are read.
# We won't be using this now.
templates:
  - '/etc/alertmanager/template/*.tmpl'

route:
  # Group alerts into a single notification by the 'service' label.
  group_by: ['service']

  # How long to wait before sending the first notification for a new group.
  group_wait: 30s

  # How long to wait before notifying about new alerts added to a group.
  group_interval: 5m

  # How long to wait before re-sending a notification for alerts still firing.
  repeat_interval: 3h

  receiver: team-slack

# Suppress warning notifications while a critical alert is firing.
inhibit_rules:
  - source_matchers: [severity="critical"]
    target_matchers: [severity="warning"]

receivers:
  - name: 'team-slack'
    slack_configs:
      - api_url: '<webhook_url>'
        channel: '#<channel_name>'
        send_resolved: true

Replace the channel_name with the Slack channel where you want to receive alerts, and the webhook_url with a Slack webhook URL. Be careful not to commit this file with the webhook URL in it.

This configuration just routes all alerts into a Slack channel irrespective of their severity or other labels. In a later article we will see how to route alerts to different teams and channels based on labels.
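
As with the Prometheus rules, you can validate this file before deploying it, using amtool, which is bundled in the prom/alertmanager image. A quick check, run from the directory that contains alertmanager-config:

# Mount the config directory and run amtool inside the Alertmanager image
docker run --rm \
  -v "$(pwd)/alertmanager-config:/config" \
  --entrypoint amtool \
  prom/alertmanager:v0.27.0 \
  check-config /config/alertmanager.yml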

Now add the Alertmanager configuration to the docker-compose.yml so that it becomes part of our Prometheus setup:

volumes:
  prometheus:
    external: true
  alertmanager:
    external: true

services:
  prometheus:
    image: prom/prometheus:v3.0.0
    volumes:
      - ./config:/etc/prometheus
      - prometheus:/prometheus
    ports:
      - 9000:9090
    restart: always
    links:
      - alertmanager:alertmanager

  alertmanager:
    image: prom/alertmanager:v0.27.0
    ports:
      - 9001:9093
    volumes:
      - ./alertmanager-config/:/etc/alertmanager/
      - alertmanager:/alertmanager
    restart: always

We've added a link to the prometheus container so that it can find the alertmanager container.

Now there's one more piece to the puzzle. Prometheus needs to be configured with the Alertmanager endpoint it should send alerts to. Think of the link in the docker-compose.yml as an internal DNS name that is reachable from the Prometheus container. Modify the prometheus.yml file to add the endpoint under the alertmanagers key:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "prometheus"

    static_configs:
      - targets: ["localhost:9090"]

We are referring to Alertmanager using the internal name and its default port, 9093. If you change the link name in docker-compose.yml, you have to change it here too.

Once you restart your containers, you should be able to access the Alertmanager UI at http://localhost:9001.
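
To confirm that the wiring works, you can query both HTTP APIs (ports as mapped in our docker-compose.yml):

# Prometheus should list alertmanager:9093 under activeAlertmanagers
curl -s http://localhost:9000/api/v1/alertmanagers

# Alerts that have reached Alertmanager
curl -s http://localhost:9001/api/v2/alerts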

To test this end-to-end, modify your alert_rules.yml to lower the threshold. Within a couple of minutes (the for duration in alert_rules.yml plus Alertmanager's group_wait) you should see the alert message in your Slack channel.
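
If you'd rather not wait for a real alert, you can also push a synthetic alert straight to Alertmanager's v2 API to exercise just the Slack leg. A minimal sketch; the TestAlert name is only a placeholder:

# Post a fake alert directly to Alertmanager, bypassing Prometheus
curl -s -X POST http://localhost:9001/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels": {"alertname": "TestAlert", "severity": "warning"}, "annotations": {"summary": "Manual test alert"}}]'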

Congratulations! You have just set up a minimal end-to-end alerting pipeline with Prometheus and Alertmanager. From here, we can do many more things:

  • Set up advanced routing rules for Alertmanager, including routing to different teams on different channels, routing a subset of one team's alerts to another, and routing alerts of a certain severity to a different endpoint like PagerDuty.
  • Add basic authentication and Google SSO to our Prometheus UI.
  • Configure our own custom email templates for Alertmanager emails.
  • Store secrets securely.

Stay tuned for updates on these topics.

Troubleshooting

If you don't see any alerts in your Slack channel, follow this step-by-step guide to troubleshoot:

  • Check all the config files including the docker-compose.yml and verify that they match what we have been walking through. You might want to run a YAML linter on all your files.
  • Check the container logs for any errors. Run docker compose logs --follow
  • Are all your containers running? Run docker ps to check.
  • Is the alert firing in Prometheus? Check the Prometheus UI at http://localhost:9000/alerts.
  • If yes, is the alert reaching Alertmanager? Check the Alertmanager UI at http://localhost:9001/#/alerts.
  • If yes, is your Slack webhook URL correct? Is the channel correct? Note that the channel name should have a leading '#'. You can also test the webhook directly with curl, as shown below.
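
A quick way to rule out the webhook itself is to post a test message to it (replace <webhook_url> with your actual URL, and keep it out of version control):

# Send a test message to the Slack incoming webhook
curl -X POST -H 'Content-type: application/json' \
  --data '{"text": "Test message while troubleshooting Alertmanager"}' \
  '<webhook_url>'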

If all of these things are correct, turn on debug logging for both Prometheus and Alertmanager in docker-compose.yml:

volumes:
  prometheus:
    external: true
  alertmanager:
    external: true

services:
  prometheus:
    image: prom/prometheus:v3.0.0
    volumes:
      - ./config:/etc/prometheus
      - prometheus:/prometheus
    ports:
      - 9000:9090
    restart: always
    links:
      - alertmanager:alertmanager
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--log.level=debug'

  alertmanager:
    image: prom/alertmanager:v0.27.0
    ports:
      - 9001:9093
    volumes:
      - ./alertmanager-config/:/etc/alertmanager/
      - alertmanager:/alertmanager
    restart: always
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
      - '--log.level=debug'

This results in more verbose output, which should help you debug. If you are still stuck, reach out to me on Twitter and I'll try my best to help.

Useful Tip

If you enable debug, don't pass just the log.level flag; you have to pass the config and storage flags as well. This is because command in the compose file replaces the image's default command, which is what normally points Prometheus at /etc/prometheus/prometheus.yml for configuration and /prometheus for storage (and Alertmanager at /etc/alertmanager/alertmanager.yml and /alertmanager). These are the paths we mount in our compose file. If you override command with only the log level, the binaries fall back on their built-in defaults for the config and storage locations, which are different.
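
If you want to see exactly which flags an image passes by default, you can inspect its baked-in command:

# Show the default command for each image
docker image inspect prom/prometheus:v3.0.0 --format '{{.Config.Cmd}}'
docker image inspect prom/alertmanager:v0.27.0 --format '{{.Config.Cmd}}'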

Conclusion

Prometheus and Alertmanager together form a powerful monitoring and alerting stack. It's easy to set up for basic cases, but can require significant work for more sophisticated use cases. This article is a guide to a basic setup of these tools.

References

Image credits: Elisa from Pixabay