How to Configure a Remote Data Store for Prometheus

November 13, 2024 · 10 min read

Founder @IncidentHub.cloud

Introduction

This article is part of a series on setting up an end-to-end monitoring and alerting stack using Prometheus.

The Prometheus monitoring tool can store its metrics either locally or remotely. You can configure a remote data store using the remote_write configuration. This article describes the various data store options available as well as how to set up a remote store.

Overview of Remote Storage

By default, Prometheus stores data locally wherever it is installed. The data directory can be configured by using the --storage.tsdb.path command line option when starting Prometheus. In practice you can use a separate disk for higher performance attached to the machine where Prometheus is running.

However, this may not be possible or optimal in all situations as you might want a data store that is more suited for time series data, and has larger storage capabilities for higher data retention. Prometheus would usually run in a standalone VM or a Kubernetes pod or a Docker container, and it would not have access to such data stores by default.

A remote store can add these capabilities to Prometheus. The remote storage option can be set by using the remote_write key in the Prometheus configuration YAML file.

Introduction
Overview of Remote Storage
Remote Store Architecture
Remote Store Configuration
Remote Storage Options
Troubleshooting
- Prometheus failing to write to the remote storage
Best Practices
Conclusion
FAQ

Remote Store Architecture

Remote Store Configuration

Basic Syntax

A very simple configuration for a remote store that accepts unauthenticated connections would look like this:

remote_write:
- url: "http://192.168.23.4/api/v1/write"
  name: "production-metrics"

You can have multiple remote_write sections in the same Prometheus configuration.

Based on your requirements and the features supported by the remote write server you can configure other options. Let us look at them one by one.

Security and Authentication

To protect your metrics data in transit whether it is traveling via your internal network or through the internet, you can enable both TLS as well as authentication. The remote store server should support these options.

# Remote write configuration for Prometheus
remote_write:
- url: "https://prometheus-data-store.mydb.io/api/v1/write"
  name: "production-metrics"

  headers:
    Authorization: "Bearer <token>"

  basic_auth:
    username: "prometheus"
    password: "secret-password"

  tls_config:
    insecure_skip_verify: false
    ca_file: "/path/to/ca.pem"
    cert_file: "/path/to/cert.pem"
    key_file: "/path/to/key.pem"

This sample configuration does the following:

Adds a Bearer token for authentication, as well as basic auth options. In practice you would use only one of these.
Adds a tls_config assuming you have a custom CA which has issued the certificates for the remote store's server. If it's a certificate issued by a well-known CA, you would not have to configure this. This option would come in handy when you have a private CA.

You can also create a separate authorization section for more options while setting the Authorization header. Note that the options below are mutually exclusive - the example is only for illustration.

# Example 1: Default Bearer type with direct credentials
authorization:
  type: Bearer
  credentials: "eyJhbGciOiJIPoI1NiIsInR5cCI6IkpXVCJ9..."

# Example 2: Bearer type with credentials from file. This is mutually exclusive with credentials_file
authorization:
  type: Bearer
  credentials_file: "/etc/prometheus/token.txt"

# Example 3: Custom type with direct credentials
authorization:
  type: CustomAuth
  credentials: "secret-token-123"

Remote Write Protocol Configuration

As of this writing, the remote write specification is undergoing a change. You probably don't have to worry about this section unless you are optimizing for very specific cases. You can configure the protobuf_message object that Prometheus uses when sending metrics. This depends on what your remote store server supports.

remote_write:
- url: "http://192.168.23.4/api/v1/write"
  name: "production-metrics"

  protobuf_message: prometheus.WriteRequest

Network Configuration

Based on the properties of your remote store server, you can tune some functional settings.

The remote_timeout key sets the timeout for requests to the remote write endpoint. The default value is 30s. You would not need to set this unless you have a noisy network, or there are shorter timeouts in the network path between your Prometheus server and the remote store server.

If your remote store is behind a proxy server, you can configure the proxy details in the YAML.

remote_write:
- url: "http://192.168.23.4/api/v1/write"
  remote_timeout: 45s
  name: "production-metrics"

  # Proxy configuration
  proxy_url: "http://proxy.internal:4200"
  proxy_connect_header:
    "Proxy-Authorization": ["Basic xxxxxxxxxxxxxxxxxxxx"]
    "X-Custom-Proxy-Header": ["app1", "app2"]
  proxy_from_environment: false

  follow_redirects: true
  enable_http2: true

Metrics Configuration

You can use a relabel_config key to modify or drop specific metrics before they are written to the remote store. The relabel syntax is identical to that used in the scrape_config section. You might want to do this if:

You have multiple remote stores and want specific metrics to go to specific stores to avoid unnecessary storage costs.
You have one remote store but don't want certain metrics to be written there but let them remain with Prometheus' local storage.

remote_write:
  write_relabel_configs:
    - source_labels: [__name__]
      regex: 'test_metric.*'
      action: drop
    - source_labels: [environment]
      regex: 'staging'
      action: drop

Queue Configuration

The queue_config has settings to fine tune the queue that is used to write to remote storage. Prometheus creates an internal queue for each remote write server. As it collects metrics, Prometheus maintains a write-ahead log (WAL) that it can replay if there's a crash. Each remote destination queue picks up metrics data from the WAL and sends it to the remote store server. Each queue can also have multiple shards, which can be used to configure the amount of parallelism for each queue.

You will have to to tune the queue settings only if you have a very high volume of data and/or are facing issues with the remote store struggling to keep up with your Prometheus server.

You can check out these great writeups on tuning the queue settings for remote_write.

Remote Storage Options

A non-exhaustive list of software that supports the Prometheus remote write protocol includes:

Thanos
VictoriaMetrics
Splunk
OpenTSDB
Kafka
InfluxDB
Google BigQuery

Troubleshooting

Prometheus failing to write to the remote storage

This can be caused by a number of issues:

Network connectivity between Prometheus and the remote store

Check if you can reach the remote store using ping or curl.

If there is a proxy in between, it might be dropping packets or might not be running

Check if the proxy is running. Verify that the proxy configuration as well as the Prometheus remote_write proxy settings are correct. Check the proxy server's logs for any errors. The proxy might be blocking large packets.

Requests are timing out due to network issues

Run a traceroute from your Prometheus server to the remote store to see if packets are being dropped.

Requests are timing out due to the remote store not being able to keep up

Tune the queue configuration. If this happens suddenly, it's important to find out the root cause.

The number of metrics might have increased due to autoscaling events or an increase in cardinality.
The remote store might have disk issues.

Best Practices

Backup your data in the remote store.
Add security and authentication between your Prometheus and the remote store server. If your remote store does not support this natively, you can add a proxy like nginx in between and configure it to have TLS and authentication.
Monitor your remote store metrics for indications of trouble.
If you are in a regulated industry, ensure that your remote store is compliant with your requirements. E.g. if it's managed by a cloud vendor, ascertain that their security credentials are sufficient for your needs.

Conclusion

The remote store functionality in Prometheus offers a scalable and flexible way of adding a dedicated storage backend for Prometheus metrics. You can use the remote store for increased data retention, durability of data, and offline data analysis.

FAQ

What is remote storage in Prometheus?

An external system to store Prometheus metrics in addition to using local storage.

Why would I need Prometheus remote storage?

For longer data retention, larger storage capacity, increased durability, and the ability to use dedicated time-series storage systems.

How do I configure Prometheus remote storage?

Add a remote_write configuration to your Prometheus configuration YAML. For more details, see the sections above.

What Prometheus remote storage options are available?

There are many options including but not limited to Thanos, VictoriaMetrics, Splunk, OpenTSDB, Kafka, InfluxDB, and Google BigQuery.

How can I secure my Prometheus remote storage connection?

Use TLS encryption, basic auth, or bearer tokens.

Can I send different metrics to different Prometheus remote stores?

Yes, you can use write_relabel_configs in individual remote_write sections to filter metrics.

What should I do if Prometheus remote writes are failing?

Some things to check are network connectivity, proxy settings, remote store performance settings, and queue configuration.

What are the best practices for Prometheus remote storage?

Backup your data, implement security between Prometheus and the remote store server, monitor the remote store's key metrics, and ensure compliance with your regulatory requirements.

Q: Can I have multiple Prometheus remote storage destinations?

Yes, you can configure multiple remote_write sections.

Introduction​

Overview of Remote Storage​

Remote Store Architecture​

Remote Store Configuration​

Basic Syntax​

Security and Authentication​

Remote Write Protocol Configuration​

Network Configuration​

Metrics Configuration​

Queue Configuration​

Remote Storage Options​

Troubleshooting​

Prometheus failing to write to the remote storage​

Network connectivity between Prometheus and the remote store​

If there is a proxy in between, it might be dropping packets or might not be running​

Requests are timing out due to network issues​

Requests are timing out due to the remote store not being able to keep up​

Best Practices​

Conclusion​

FAQ​