#prometheus #monitoring

Using Labels and Relabeling in Prometheus for Clean, Scalable Metrics

I’ll admit it: the first time I looked at a raw Prometheus metrics endpoint, I felt a bit overwhelmed. It was like staring at a tangled ball of yarn, with metric names and labels sprawling in every direction. I was collecting thousands of time series, but finding the specific signal in the noise felt nearly impossible.

That’s when I truly understood the power of labels and relabeling in Prometheus. They aren’t just decorative tags; they’re the fundamental tools for organizing your metrics universe. Getting them right transforms your monitoring from a chaotic mess into a clean, queryable, and scalable system.

In this guide, I’ll walk you through how I learned to wield labels and relabeling rules effectively, turning my Prometheus instance from a data hoarder into a well-organized observability powerhouse.

What You’ll Need

Before we dive into the configuration, let’s make sure you have the basics covered. You don’t need much to follow along, just:

  • A Running Prometheus Server: This could be on a local machine, a VM, or in a container. I’m running mine in a Docker container for easy experimentation.
  • A Few Sample Applications or Exporters: To have something to scrape. The Node Exporter is a classic choice for system metrics, but any application exposing a /metrics endpoint will work.
  • A Text Editor: For editing your prometheus.yml file. I’m a fan of VS Code, but use whatever you’re comfortable with.
  • Basic YAML Knowledge: Prometheus configuration is YAML-based, so knowing how indentation and lists work is key.

Understanding the Core Concepts: Labels vs. Relabeling

Let’s clear up the terminology first, as I found this confusing in the beginning.

  • Labels: These are key-value pairs attached to a time series. Think of them like tags or attributes. For example, a metric http_requests_total might have labels like method="POST", handler="/api/users", and status_code="200". They are the output—the final form of your metric that you query in PromQL.
  • Relabeling: This is the process of manipulating these labels before they become part of the final time series. It happens during service discovery, scraping, and even when sending alerts. Relabeling is your chance to add, modify, drop, or filter metrics based on their labels.

The magic happens in the relabel_configs section of your scrape jobs. I like to visualize it as a filter and enrichment pipeline for your metrics data.
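
Here is the general shape of a single rule in that pipeline. This is a placeholder sketch to show the fields rather than a working configuration; the values are purely illustrative:

relabel_configs:
  - source_labels: [__address__, __scheme__]  # the values of these labels are read...
    separator: ';'            # ...and joined with this separator (';' is the default)
    regex: '(.*);(.*)'        # RE2 regex matched against the joined value (fully anchored)
    target_label: 'example'   # label to write to (for the 'replace' action)
    replacement: '$1-$2'      # value to write; may reference capture groups
    action: replace           # or keep, drop, labelmap, labeldrop, labelkeep, ...

Most real rules only set source_labels, regex, and either target_label or action; everything else has a sensible default (the action defaults to replace).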

Step 1: Laying the Groundwork with Static Labels

The simplest way to add labels is statically in your prometheus.yml. This is perfect for adding context that applies to an entire job.

Let’s look at a basic scrape configuration. Open your prometheus.yml file.

# prometheus.yml

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
        # These labels are attached to every time series scraped from these targets
        labels:
          region: 'home-lab'
          environment: 'development'
          job: 'node'  # Overrides the job label derived from job_name

In this example, every single metric coming from the Node Exporter will now have the labels region="home-lab", environment="development", and job="node". Note that the labels block lives inside the static_configs entry, right next to targets, not at the job level. This is incredibly useful for grouping and filtering metrics in Grafana. For instance, you can easily write a query like up{environment="development"} to check the status of all your dev instances.

Pro Tip: You can override the `job` label this way, but be mindful that Prometheus normally derives it from `job_name`, and the `honor_labels` setting changes how conflicts with labels exposed by the scraped data are resolved. Overriding built-in labels is rarely necessary; being explicit with your own label names is usually safer.

Step 2: Dynamic Labeling with Relabeling Rules

Static labels are great, but the real power comes from dynamic relabeling in Prometheus. This is where you can create labels based on information from the scrape target itself.

Example 1: Extracting Environment from a Hostname

Imagine your instances have hostnames like web-server-prod-01 and db-server-staging-02. You can pick these hostnames apart with a regex and turn the pieces into meaningful labels.

scrape_configs:
  - job_name: 'node_exporter_by_hostname'
    static_configs:
      - targets:
        - 'web-server-prod-01:9100'
        - 'db-server-staging-02:9100'
    relabel_configs:
      # Relabeling regexes are fully anchored and use RE2 syntax.
      # Each rule re-parses __address__ (hostname:port) and writes one
      # capture group into its own label.
      - source_labels: [__address__]
        regex: '(.+)-(prod|staging|dev)-([^:]+):\d+'
        target_label: 'service'
        replacement: '$1'   # The first group, e.g. 'web-server' or 'db-server'
      - source_labels: [__address__]
        regex: '(.+)-(prod|staging|dev)-([^:]+):\d+'
        target_label: 'environment'
        replacement: '$2'   # The second group, e.g. 'prod' or 'staging'
      - source_labels: [__address__]
        regex: '(.+)-(prod|staging|dev)-([^:]+):\d+'
        target_label: 'instance_id'
        replacement: '$3'   # The third group, e.g. '01' or '02'

Now, instead of having to parse a messy instance label carrying the full hostname and port, you have clean, separate labels for service, environment, and instance_id alongside it. Your Grafana dashboards just got a whole lot more powerful.
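
Relabeling also decides what gets scraped in the first place. Assuming the same hypothetical hostname scheme, a keep rule in the same job's relabel_configs drops non-production targets before Prometheus ever contacts them:

    relabel_configs:
      # Only scrape targets whose address contains '-prod-';
      # staging and dev targets are dropped from this job entirely
      - source_labels: [__address__]
        regex: '.+-prod-.+'
        action: keep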

Example 2: Filtering Metrics with metric_relabel_configs

Sometimes, you don’t want all the metrics. Some might be too noisy, expensive to store, or just irrelevant. This is where metric_relabel_configs comes in. It applies relabeling rules after the scrape, on the metrics themselves.

Let’s say you want the Node Exporter’s node_network_receive_bytes_total metric, which has one time series per network interface, but you only care about eth0 and wlan0.

scrape_configs:
  - job_name: 'node_exporter_filtered'
    static_configs:
      - targets: ['localhost:9100']
    metric_relabel_configs:
      # Keep ONLY node_network_receive_bytes_total series for eth0 or wlan0.
      # Careful: 'keep' discards every series that does not match, including
      # all other metric names scraped by this job.
      - source_labels: [__name__, device]
        regex: 'node_network_receive_bytes_total;(eth0|wlan0)'
        action: keep
      # Usually safer: DROP just the interfaces you don't want (like the docker bridge)
      # and leave every other metric alone.
      # - source_labels: [device]
      #   regex: 'docker0'
      #   action: drop

The action: keep tells Prometheus to keep only the time series where the combination of the metric name (__name__) and the device label matches node_network_receive_bytes_total on eth0 or wlan0. Every other series from this scrape is discarded, including metrics that have nothing to do with networking. That makes keep a blunt instrument: it is great for a job that should expose only a handful of series, but for general cleanup the commented drop rule is usually the better fit, since it removes only the noisy interfaces and leaves the rest of the metrics alone. Either way, you save storage and reduce clutter.
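
Dropping by metric name alone is often the gentler tool. As a sketch, assuming you never query the Go runtime metrics that Go-based exporters expose about themselves, a single drop rule clears them out without touching anything else:

    metric_relabel_configs:
      # Drop the exporter's own Go runtime metrics (go_gc_*, go_memstats_*, ...)
      - source_labels: [__name__]
        regex: 'go_.*'
        action: drop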

Step 3: Advanced Relabeling with Service Discovery

This is where relabeling becomes absolutely essential. When using service discovery (like file-based SD, Consul, or Kubernetes), your targets come with a set of metadata labels. Relabeling is how you map that metadata into useful metric labels.

File-Based Service Discovery Example

Create a file called targets.json:

[
  {
    "targets": [ "192.168.1.10:9100" ],
    "labels": {
      "environment": "production",
      "role": "webserver",
      "dc": "home-dc-1"
    }
  },
  {
    "targets": [ "192.168.1.11:9100" ],
    "labels": {
      "environment": "staging",
      "role": "database",
      "dc": "home-dc-1"
    }
  }
]

Now, configure Prometheus to read this file. The labels from the JSON are attached to the targets automatically; relabeling comes in when you want to rename or reshape them, or derive new labels from the target address.

scrape_configs:
  - job_name: 'file_sd_nodes'
    file_sd_configs:
      - files:
          - 'targets.json'
        refresh_interval: 5m  # Re-read the file every 5 minutes
    relabel_configs:
      # The labels from the JSON file (environment, role, dc) already arrive as
      # target labels, so no relabeling is needed just to keep them.
      # Relabeling is still handy for renaming or reshaping them:
      - source_labels: [dc]
        target_label: 'datacenter'
      # A common pattern: strip the port so the 'instance' label is just the host or IP
      - source_labels: [__address__]
        target_label: instance
        regex: '(.+):.+'
        replacement: '${1}'
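
With richer service discovery backends such as Kubernetes or Consul, targets arrive with a much larger set of __meta_* labels, and the labelmap action lets you copy whole groups of them at once. A minimal sketch for Kubernetes pod discovery, assuming a kubernetes_sd_configs job with role: pod:

    relabel_configs:
      # Copy every Kubernetes pod label onto the scraped series,
      # stripping the __meta_kubernetes_pod_label_ prefix from the label name
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)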

Troubleshooting and Common Pitfalls

My journey with relabeling wasn’t without its bumps. Here are the main issues I ran into and how to solve them.

  1. “My labels aren’t showing up!” This is the most common problem. Use the Prometheus UI: Status > Targets shows each target’s labels after relabeling, and Status > Service Discovery shows the discovered (pre-relabeling) metadata alongside the final target labels. These pages are your best friend for debugging.

  2. Regex Catastrophes. Prometheus uses RE2 syntax, which doesn’t support all the fancy features of PCRE (no lookaheads, for example). Relabeling regexes are also fully anchored, as if wrapped in ^ and $, which catches a lot of people out. Test your regexes in a tool like Regex101 with the “Golang” flavor selected.

  3. Dropping the Wrong Metrics. Be very careful with action: drop. It’s easy to be too aggressive and drop metrics you need. Always test new drop rules in a staging environment first and monitor your scraped metric counts.

  4. Cardinality Explosion. This is the big one. Never, ever create labels from unbounded sets of values. A classic mistake is putting a user ID, session ID, or entire log line into a label. Each unique combination of label values creates a new time series. A few hundred is fine; a few million will kill your Prometheus server. Stick to low-cardinality values like status codes, environment names, endpoint names, and server roles.
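
If a misbehaving application does sneak a high-cardinality label past you, metric_relabel_configs can strip it at ingestion time. A minimal sketch, assuming a hypothetical session_id label; note that dropping a label is only safe if the remaining labels still identify each series uniquely:

    metric_relabel_configs:
      # Remove a hypothetical high-cardinality 'session_id' label before storage
      - action: labeldrop
        regex: 'session_id'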

Taking It Further

Once you’re comfortable with the basics, you can explore more advanced patterns:

  • Cross-Service Label Unification: Standardize label names (e.g., always use env instead of a mix of env, environment, deployment) across all your jobs and exporters. This makes Grafana dashboards and alerting rules much simpler.
  • Alert Relabeling: Use alert_relabel_configs in the alerting section of your prometheus.yml to add context to or normalize labels on alerts before they are sent to Alertmanager (and on to channels like Slack or PagerDuty), similar to how I detailed in my guide on Grafana Alerting with Custom Messages.
  • Remote Write Relabeling: If you’re sending data to long-term storage like Grafana Mimir or Thanos, you can use relabeling to filter what gets shipped and keep costs down; see the sketch below.
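
For that last point, the hook on the Prometheus side is write_relabel_configs under remote_write. A minimal sketch, assuming a hypothetical Mimir endpoint and that only node_exporter metrics and the up series should go to long-term storage:

remote_write:
  - url: 'https://mimir.example.com/api/v1/push'  # hypothetical endpoint
    write_relabel_configs:
      # Only forward node_* metrics and the 'up' series; everything else stays local
      - source_labels: [__name__]
        regex: 'node_.*|up'
        action: keep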

Wrapping Up

Mastering labels and relabeling in Prometheus was a game-changer for my monitoring setup. It transformed a sprawling collection of data points into a well-organized, queryable, and actionable system. Start simple with static labels, then gradually introduce dynamic relabeling to filter and enrich your metrics. Always keep an eye on cardinality, and use the Prometheus UI to debug.

Happy monitoring!


FAQ

Q: What is the difference between relabel_configs and metric_relabel_configs? A: relabel_configs happens before the scrape, acting on the target metadata. It’s used for deciding what to scrape and adding labels from service discovery. metric_relabel_configs happens after the scrape, acting on the metrics that were returned. It’s used for filtering out unwanted metrics or modifying labels on the metrics themselves.
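
A compact way to see the difference is a single job that uses both; the job name and regexes here are purely illustrative:

scrape_configs:
  - job_name: 'example'
    static_configs:
      - targets: ['localhost:9100']
    relabel_configs:              # before the scrape: acts on target metadata
      - source_labels: [__address__]
        regex: '.+:9100'
        action: keep              # only scrape targets listening on port 9100
    metric_relabel_configs:       # after the scrape: acts on the returned series
      - source_labels: [__name__]
        regex: 'node_scrape_collector_.+'
        action: drop              # discard the collector bookkeeping series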

Q: How can I prevent high cardinality in my Prometheus labels? A: Strictly avoid using any user-specific IDs, email addresses, or other unbounded, unique values as label values. Limit labels to known, finite sets of values like environment (prod, staging), HTTP method (GET, POST), status code (200, 404, 500), or predefined resource names.

Q: Can I modify the metric name itself with relabeling? A: Yes, you can! The metric name is stored in the special __name__ label. You can create a relabeling rule with target_label: __name__ to rename a metric. For example, you could standardize metric names coming from different exporters.
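
A minimal sketch, with hypothetical metric names:

    metric_relabel_configs:
      # Rename a hypothetical vendor-specific metric to match your own convention
      - source_labels: [__name__]
        regex: 'legacy_exporter_requests_total'
        target_label: __name__
        replacement: 'http_requests_total'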

Q: Where is the best place to see the effect of my relabeling rules? A: The Prometheus web UI under Status > Targets is the definitive source. It shows you the complete list of labels for each target both before and after your relabel_configs have been applied. For metric_relabel_configs, you can query the metrics directly or check the /metrics endpoint of your target.