A Beginner's Guide To Service Discovery in Prometheus
Introduction
This article is part of a series on setting up an end-to-end monitoring and alerting stack using Prometheus.
Service discovery (SD) is a mechanism by which the Prometheus monitoring tool can discover monitorable targets automatically. Instead of requiring every scrape target to be listed explicitly in the Prometheus configuration, service discovery acts as a source of targets that Prometheus can query at runtime.
Service discovery becomes crucial when hosts change dynamically, especially in microservices architectures and environments like Kubernetes. In Prometheus parlance, service discovery is a way of discovering "scrape targets".
For example, pods are created dynamically in Kubernetes as new services are deployed and undeployed, as autoscaling events occur, and as errors cause pods to crash and go away. If you are using Prometheus to scrape pods in such an environment, Prometheus has to know which pods are running and scrapable at any given point in time. The Kubernetes service discovery plugin enables this. Similarly, there are SD plugins for other common environments.
You can use service discovery in Prometheus with the predefined plugins, or write your own custom mechanism using file-based or HTTP-based discovery, depending on the situation.
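To make the contrast concrete, here is a minimal scrape configuration with statically listed targets; the job name and hosts are illustrative. Service discovery replaces this hard-coded list with one that Prometheus fetches at runtime.
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['10.0.1.10:9100', '10.0.1.11:9100']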
- Introduction
- Types of Prometheus Service Discovery
- Configuring Service Discovery in Prometheus
- Combining Multiple Service Discovery Mechanisms
- Troubleshooting Service Discovery
- Conclusion
- FAQs
Types of Prometheus Service Discovery
Predefined Mechanisms in Prometheus
Prometheus has out-of-the-box support for discovering scrape targets in many popular environments, including:
- Amazon Web Services (EC2 instances)
- Azure (Azure VMs)
- Consul
- DigitalOcean
- DNS
- Google Cloud Platform (Google Compute Engine VMs)
- Hetzner
- Kubernetes
- Linode
- OpenStack
This list is not exhaustive. For the full list, see the Prometheus GitHub repository.
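As a taste of what these look like, here is a minimal sketch of a Kubernetes scrape configuration that discovers pods; the job name is illustrative:
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod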
Custom Service Discovery in Prometheus, or Writing Your Own
You may have infrastructure or application endpoints that cannot be discovered by the standard mechanisms. In such cases, you can write your own. There are two options available.
HTTP-based service discovery
You can write an HTTP-based mechanism and return the scrape target information in response to Prometheus' GET requests. Prometheus will perform a GET request periodically, by default every minute, so that it always has the latest list of targets. This refresh interval appears as a configurable parameter in the standard SD configurations (AWS and others), and you can also set it in your own SD configuration with the "refresh_interval" key. Note that this interval is different from the scrape_interval, which determines how often Prometheus scrapes the targets themselves.
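To make the distinction concrete, here is a minimal sketch showing where the two intervals live; the job name and URL are placeholders:
global:
  scrape_interval: 30s        # how often Prometheus scrapes each discovered target
scrape_configs:
  - job_name: 'custom-http-sd'
    http_sd_configs:
      - url: 'http://example.internal/targets'
        refresh_interval: 60s # how often Prometheus re-fetches the list of targets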
There are a few basic requirements for HTTP service discovery:
- The response must be valid JSON, served with the correct HTTP Content-Type header (application/json).
- The content must be UTF-8 encoded.
- Authentication, if required, can be Basic (using the Authorization header) or OAuth 2.0. You would typically not need authentication if the endpoint is on your internal network or part of your own applications.
- If there are no scrape targets, the endpoint should return an empty list.
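The response itself is a JSON list of target groups, each with a "targets" array and an optional "labels" object. A response with hypothetical hosts and labels might look like this:
[
  {
    "targets": ["10.0.1.12:9100", "10.0.1.13:9100"],
    "labels": {
      "env": "prod",
      "team": "platform"
    }
  }
]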
A sample configuration for an HTTP service discovery mechanism, placed under a job in scrape_configs, can look like this:
http_sd_configs:
  - url: 'http://192.168.2.34/api/internal/hosts'
    refresh_interval: 600s
    http_headers:
      "Purpose":
        values: ["Prometheus-scraper"]
Internally, your HTTP endpoint would query a database or inventory to fetch the list of targets and return them.
File-based service discovery
File-based service discovery is another alternative if you need to provide a custom list of scrape targets. To use it, you create a file and list your scrape targets in it. Like HTTP service discovery, this is a dynamic mechanism: Prometheus detects changes to the files via filesystem watches and applies them immediately, and it also re-reads the files at a periodic interval as a fallback. That interval is configured with the "refresh_interval" key, just as with the other mechanisms; the default is 5 minutes.
Requirements for file-based service discovery:
- Files can be in JSON or YAML.
- You can specify a pattern to match multiple files. This is helpful if you wish to keep your scrape targets grouped logically across separate files.
- Malformed JSON or YAML files are ignored, so ensure that they conform to the required format.
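Each file contains a list of target groups in the same shape as the HTTP response above. A YAML targets file with hypothetical hosts and labels might look like this:
- targets:
    - '10.0.2.21:9100'
    - '10.0.2.22:9100'
  labels:
    env: 'prod'
    datacenter: 'dc1'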
In the Prometheus configuration, you can specify it as follows:
file_sd_configs:
  - files:
      - "/etc/prometheus/external/targets/*.yml"
      - "/opt/monitoring/targets/prod-*.yml"
      - "/data/dynamic-targets-[0-9]*.yaml"
    refresh_interval: 120s
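Whatever process generates these files should ideally update them atomically, for example by writing the new list to a temporary file and renaming it into place, so that Prometheus never reads a half-written, malformed file.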