Sending Alerts Using Prometheus and Alertmanager
Introduction
Continuing our series on setting up Prometheus in a container, this article is a step-by-step guide to configuring alerts in Prometheus. We will add alerting rules and deploy Prometheus Alertmanager with Slack integration.
If you follow the steps in this article, you will end up with a containerized setup for:
- A Prometheus instance with alerting rules.
- An Alertmanager instance which can send alerts originating from those rules to a Slack channel.
Let's get started.
- Introduction
- Using Docker Compose
- Adding Alerting Rules to the Prometheus container
- Setting up Prometheus Alertmanager
- Troubleshooting
- Conclusion
- References
Using Docker Compose
Moving the Prometheus Container to Docker Compose
In a previous article, we looked at how to set up Prometheus in a Docker container. We will now deploy Prometheus Alertmanager in a separate container. To manage multiple containers with a single command, we will move our entire deployment to Docker Compose.
In the prometheus directory (refer to the previous article), create a file called docker-compose.yml and add the following content to it:
volumes:
  prometheus:
    external: true

services:
  prometheus:
    image: prom/prometheus:v3.0.0
    volumes:
      - ./config:/etc/prometheus
      - prometheus:/prometheus
    ports:
      - 9000:9090
    restart: always
This keeps the previous configuration file as well as the Docker volume. To run this we need Docker Compose installed; we won't go into the details of installing it, as that's beyond the scope of this article. Note that we used external: true so that we can reuse the volume we created previously.
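If you are starting on a fresh machine (or skipped the earlier article), the external volume must already exist before docker compose up will start. Assuming the volume name prometheus from above, it can be created with
docker volume create prometheus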
Bring up the container by executing
docker compose up --detach
Access the Prometheus UI by visiting http://localhost:9000 (the compose file maps container port 9090 to host port 9000). The --detach flag runs the containers in the background. To stop the container(s), execute
docker compose down
You can tail the logs of the running containers by using
docker compose logs --follow
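You can also list the services Compose has started, along with their port mappings, by running
docker compose ps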
Adding Alerting Rules to the Prometheus container
Prometheus can evaluate alerting rules written in PromQL. Let's create a basic rule file that alerts when CPU usage exceeds 20% or 80% for 1 minute, with different severity levels. In the config directory, create a file called alert_rules.yml with the following content:
groups:
  - name: default
    labels:
      tier: infra
    rules:
      - alert: ModeratelyHighCPU
        expr: rate(process_cpu_seconds_total[1m]) > 0.2
        for: 1m
        keep_firing_for: 1m
        labels:
          severity: warning
        annotations:
          summary: Moderately High CPU
      - alert: VeryHighCPU
        expr: rate(process_cpu_seconds_total[1m]) > 0.8
        for: 1m
        keep_firing_for: 1m
        labels:
          severity: critical
        annotations:
          summary: Very High CPU
This defines two alerts on the process_cpu_seconds_total metric that fire when CPU usage (as a fraction of one core) stays above 20% or 80%, respectively, for 1 minute.
The thresholds are just for illustration.
Note that we have assigned labels with different values for the severity
key to the alerts. We will see how they are used to route alerts later in the Alertmanager configuration.
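Before wiring the rule file into Prometheus, you can sanity-check the expression itself in the Prometheus expression browser, or query it over the HTTP API. A rough sketch, assuming the container from the earlier section is still running on host port 9000:
curl -G http://localhost:9000/api/v1/query --data-urlencode 'query=rate(process_cpu_seconds_total[1m])'
A non-empty result with values between 0 and 1 means the expression is returning data that the thresholds can be compared against.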
Now add this rule file's name to the prometheus.yml
file so that the latter looks like this:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
The path to the rules file is relative to the location of prometheus.yml; you can also give an absolute path if the file lives somewhere else. Now restart the container and you should be able to see the rules in the UI at http://localhost:9000/rules
docker compose down
docker compose up --detach
These alerts won't fire until CPU usage actually exceeds the thresholds, and that depends on how busy your machine is. If you want to make sure everything works, you can temporarily change a threshold to a very low number like 0.0002 and test it. You can see the alerts at http://localhost:9000/alerts.
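It's also worth linting the rule file whenever you change it. The Prometheus image ships with promtool, so a quick check can be run without installing anything locally. A sketch, assuming you run it from the prometheus directory (the --entrypoint override is needed because the image's entrypoint is the prometheus binary):
docker run --rm -v "$(pwd)/config:/config" --entrypoint promtool prom/prometheus:v3.0.0 check rules /config/alert_rules.yml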
Now that we have a rule engine that can generate alerts, we need a way to send them to an endpoint like Slack, email, or PagerDuty so that we actually receive notifications. To do this, we will use Prometheus Alertmanager.
Setting up Prometheus Alertmanager
Alertmanager is a Prometheus project that acts as a gateway between your Prometheus rule engine and external tools that can process alerts. You can configure routing rules as well as silence alerts in Alertmanager. At the end of this section you will have your Prometheus container sending alerts to your Alertmanager container.
Setting up the Alertmanager Container
First, to keep things consistent, let us create a volume for alertmanager's data:
docker volume create alertmanager
Alertmanager will use this volume to store alert silences among other things. So if you silence an alert, and your Alertmanager container restarts, the silence will still be active.
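Once the Alertmanager container is up (later in this section), silences can be created from its UI or with amtool, which ships in the Alertmanager image. A rough sketch, where the matcher, comment, and author values are just placeholders:
docker compose exec alertmanager amtool silence add alertname=ModeratelyHighCPU --alertmanager.url=http://localhost:9093 --duration=1h --comment="planned maintenance" --author=ops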
Next, create a directory called alertmanager-config at the same level as the prometheus directory. This will be our Alertmanager configuration directory. Inside it, create a basic Alertmanager config in a file called alertmanager.yml:
global:

# The directory from which notification templates are read.
# We won't be using this for now.
templates:
  - '/etc/alertmanager/template/*.tmpl'

route:
  group_by: ['service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: team-slack

inhibit_rules:
  - source_matchers: [severity="critical"]
    target_matchers: [severity="warning"]

receivers:
  - name: 'team-slack'
    slack_configs:
      - api_url: '<webhook_url>'
        channel: '#<channel_name>'
        send_resolved: true
Replace the channel_name
with the Slack channel where you want to receive alerts, and the webhook_url
with a Slack webhook URL. Be careful not to commit this file with the webhook URL in it.
This configuration just routes all alerts into a Slack channel irrespective of their severity or other labels. In a later article we will see how to route alerts to different teams and channels based on labels.
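Before wiring this into the stack, you can validate the file with amtool's config checker, again using the image so nothing needs to be installed locally. A sketch, assuming you run it from the directory that contains alertmanager-config:
docker run --rm -v "$(pwd)/alertmanager-config:/config" --entrypoint amtool prom/alertmanager:v0.27.0 check-config /config/alertmanager.yml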
Link Alertmanager to Prometheus
Now add the Alertmanager configuration to the docker-compose.yml so that it becomes part of our Prometheus setup:
volumes:
  prometheus:
    external: true
  alertmanager:
    external: true

services:
  prometheus:
    image: prom/prometheus:v3.0.0
    volumes:
      - ./config:/etc/prometheus
      - prometheus:/prometheus
    ports:
      - 9000:9090
    restart: always
    links:
      - alertmanager:alertmanager

  alertmanager:
    image: prom/alertmanager:v0.27.0
    ports:
      - 9001:9093
    volumes:
      - ./alertmanager-config/:/etc/alertmanager/
      - alertmanager:/alertmanager
    restart: always
We've added a link to the prometheus container so that it can find the alertmanager container.
Now there's one more piece to the puzzle: Prometheus needs to be configured with the Alertmanager endpoint to send alerts to. Think of the link in docker-compose.yml as an internal DNS name that is reachable from the Prometheus container. Modify the prometheus.yml file to add the Alertmanager target under the alertmanagers key:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
We are referring to Alertmanager using the internal name and its default port. If you change the link name in docker-compose.yml, you have to change it here too.
Once you restart your containers, you should be able to access the Alertmanager UI at http://localhost:9001
.
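If you want to confirm that the prometheus container can actually resolve and reach Alertmanager by that internal name, a quick check is to call Alertmanager's health endpoint from inside the Prometheus container (the image is BusyBox-based, so wget is available):
docker compose exec prometheus wget -qO- http://alertmanager:9093/-/healthy
This should print a short healthy/OK response.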
To test this end-to-end, modify your alert_rules.yml to lower the threshold. Within about a minute (the value of for in alert_rules.yml, plus the 30-second group_wait we configured in Alertmanager) you should see the alert message in your Slack channel.
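If you'd rather not wait for a real CPU spike, you can also exercise just the Alertmanager-to-Slack leg by posting a synthetic alert straight to Alertmanager's v2 API; the alert name and labels below are arbitrary placeholders:
curl -XPOST -H 'Content-Type: application/json' http://localhost:9001/api/v2/alerts -d '[{"labels":{"alertname":"TestAlert","severity":"warning","service":"demo"}}]'
After roughly the 30-second group_wait, a notification for TestAlert should show up in the Slack channel.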
Congratulations! You have just set up a minimal end-to-end alerting pipeline with Prometheus and Alertmanager. From here, we can do many more things:
- Set up advanced routing rules for Alertmanager, including routing to different teams on different channels, routing a subset of one team's alerts to another, and routing alerts of a certain severity to a different endpoint like PagerDuty.
- Add basic authentication and Google SSO to our Prometheus UI.
- Configure our own custom email templates for Alertmanager emails.
- Store secrets securely.
Stay tuned for updates on these topics.
Troubleshooting
If you don't see any alerts in your Slack channel, follow this step-by-step guide to troubleshoot:
- Check all the config files, including the docker-compose.yml, and verify that they match what we have been walking through. You might want to run a YAML linter on all your files.
- Check the container logs for any errors. Run docker compose logs --follow
- Are all your containers running? Run docker ps to check.
- Is the alert firing in Prometheus? Check the Prometheus UI at http://localhost:9000/alerts.
- If yes, is the alert reaching Alertmanager? Check the Alertmanager UI at http://localhost:9001/#/alerts. (There is also a quick API check just after this list.)
- If yes, is your Slack webhook URL correct? Is the channel correct? Note that the channel name should have a leading '#'.
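One more check that often helps: ask Prometheus which Alertmanagers it has discovered. If the list comes back empty, the alerting: block in prometheus.yml is not being picked up:
curl http://localhost:9000/api/v1/alertmanagers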
If all of these things are correct, turn on debug logging for both Prometheus and Alertmanager in docker-compose.yml:
volumes:
  prometheus:
    external: true
  alertmanager:
    external: true

services:
  prometheus:
    image: prom/prometheus:v3.0.0
    volumes:
      - ./config:/etc/prometheus
      - prometheus:/prometheus
    ports:
      - 9000:9090
    restart: always
    links:
      - alertmanager:alertmanager
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--log.level=debug'

  alertmanager:
    image: prom/alertmanager:v0.27.0
    ports:
      - 9001:9093
    volumes:
      - ./alertmanager-config/:/etc/alertmanager/
      - alertmanager:/alertmanager
    restart: always
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
      - '--log.level=debug'
This should result in more verbose output which should help you debug. If you are still stuck, reach out to me on Twitter and I'll try my best to help.
Useful Tip
If you enable debug logging, don't add just the log.level setting; you have to carry over the other two options as well. The Prometheus image's default command sets the config file to /etc/prometheus/prometheus.yml and the storage path to /prometheus, and the Alertmanager image's default command sets them to /etc/alertmanager/alertmanager.yml and /alertmanager. Those are exactly the locations we mount in our compose file. If you override command with only the log level, those defaults are dropped and each binary falls back on its built-in (relative) paths, which are different, so it will no longer see the mounted config or volume.
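If you are unsure what an image's default command actually is, you can inspect it directly and copy those flags into the command: section before appending --log.level=debug:
docker image inspect prom/prometheus:v3.0.0 --format '{{.Config.Cmd}}'
docker image inspect prom/alertmanager:v0.27.0 --format '{{.Config.Cmd}}'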
Conclusion
Prometheus and Alertmanager together form a powerful monitoring and alerting stack. It's easy to set up for basic cases, but can require significant work for more sophisticated use cases. This article is a guide to a basic setup of these tools.