#prometheus #victoriametrics

High-Cardinality Metrics: Detection and Optimization in Prometheus and VictoriaMetrics

I remember the first time my Prometheus instance crashed spectacularly after I added a new exporter. The logs screamed about “out of memory” errors, and my Grafana dashboards turned into ghost towns. After some frantic debugging, I discovered the culprit: high-cardinality metrics.

In this guide, I’ll share practical techniques I’ve learned for identifying and optimizing these metric monsters in both Prometheus and VictoriaMetrics.

Why High-Cardinality Metrics Matter

High-cardinality metrics occur when a metric has too many unique label combinations. Common offenders include:

  • HTTP request metrics with full URLs as labels
  • User-specific metrics with user IDs
  • Container metrics with randomly generated pod names

These can cause:

  1. Memory explosions in Prometheus
  2. Slow queries in both Prometheus and VictoriaMetrics
  3. Storage bloat from excessive time series

Tools You’ll Need

  • A running Prometheus instance (installation guide)
  • VictoriaMetrics (installation docs)
  • PromQL and MetricsQL knowledge
  • promtool (comes with Prometheus)
  • A terminal with curl and jq installed

Step 1: Identifying High-Cardinality Metrics

Using Prometheus UI

  1. Run this query to find metrics with the most series:
topk(10, count by (__name__)({__name__=~".+"}))
  2. For a specific metric, check its cardinality:
count(count by (le, method, path, status)(http_request_duration_seconds_bucket))
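
If you prefer the command line, Prometheus also exposes head-block cardinality statistics via its TSDB status API. A minimal sketch, assuming Prometheus listens on localhost:9090:

# Top metric names by series count in the current head block
curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data.seriesCountByMetricName'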

Using VictoriaMetrics

VictoriaMetrics provides special tools for this:

# Get the total number of time series stored in the cluster
curl http://vmselect:8481/select/0/prometheus/api/v1/series/count | jq
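
To see which metric names contribute the most series, VictoriaMetrics also implements the same TSDB status endpoint as Prometheus. A rough sketch against the same vmselect address (for single-node VictoriaMetrics, drop the /select/0 prefix and use port 8428):

# Top 10 metric names by series count
curl -s 'http://vmselect:8481/select/0/prometheus/api/v1/status/tsdb?topN=10' | jq '.data.seriesCountByMetricName'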

Using Promtool

# Analyze the on-disk TSDB; the report includes "Highest cardinality" sections
promtool tsdb analyze /path/to/prometheus/data | grep -A10 "Highest cardinality"

Step 2: Optimizing High-Cardinality Metrics

Strategy 1: Reduce Label Cardinality

Instead of:

- name: http_requests_total
  labels:
    user_id: "12345"
    path: "/api/v1/users/12345/profile"

Use:

- name: http_requests_total
  labels:
    user_type: "registered" # Instead of user_id
    path_pattern: "/api/v1/users/:id/profile" # Parameterized path
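
If you can't change the exporter itself, the same reduction can be applied at scrape time with Prometheus metric relabeling. A minimal sketch, assuming a hypothetical job named my-app and a raw path label called path:

scrape_configs:
  - job_name: my-app                     # hypothetical job name
    static_configs:
      - targets: ["my-app:8080"]         # placeholder target
    metric_relabel_configs:
      # Drop the user_id label before samples are stored
      - action: labeldrop
        regex: user_id
      # Collapse concrete user paths into one parameterized pattern
      - source_labels: [path]
        regex: "/api/v1/users/[0-9]+/profile"
        target_label: path
        replacement: "/api/v1/users/:id/profile"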

Strategy 2: Use Recording Rules

Create aggregation rules in prometheus.rules.yml:

groups:
  - name: http_aggregated
    rules:
      - record: http_request_duration_seconds_bucket:rate5m
        # Keep the le label so histogram_quantile() still works on the aggregated series
        expr: |
          sum by (service, status_code, method, le) (
            rate(http_request_duration_seconds_bucket[5m])
          )
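
Recording rules only take effect once the file is referenced from prometheus.yml and the config is reloaded. A quick sketch, assuming the rules file sits next to the main config:

# In prometheus.yml
rule_files:
  - prometheus.rules.yml

# Validate the rules file before reloading Prometheus
promtool check rules prometheus.rules.yml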

Strategy 3: VictoriaMetrics Specific Optimizations

VictoriaMetrics offers several helpful features:

  1. Deduplication:
# Keep at most one raw sample per 30s per series (set on vmstorage and vmselect, or on single-node VictoriaMetrics)
-dedup.minScrapeInterval=30s
  2. Limiting series creation:
-storage.maxHourlySeries=1000000 -storage.maxDailySeries=10000000
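
Put together, a single-node launch could look like the sketch below; the binary path and limit values are placeholders to tune for your workload:

/path/to/victoria-metrics-prod \
  -dedup.minScrapeInterval=30s \
  -storage.maxHourlySeries=1000000 \
  -storage.maxDailySeries=10000000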

Step 3: Monitoring Cardinality

Create a dashboard panel with these queries:

Prometheus:

sum by (job) (scrape_series_added)

VictoriaMetrics:

vm_metrics_with_highest_cardinality
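
To get warned before things fall over, you can also alert on the active series count. A minimal sketch, with the 2,000,000 threshold as an arbitrary placeholder to tune for your instance size:

groups:
  - name: cardinality_alerts
    rules:
      - alert: TooManyActiveSeries
        # prometheus_tsdb_head_series = active series in the head block
        expr: prometheus_tsdb_head_series > 2000000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus is tracking {{ $value }} active series"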

Troubleshooting Common Issues

Problem: Prometheus crashes with “out of memory” errors
Solution:

  1. Increase the scrape interval for high-cardinality jobs (see the sketch below)
  2. Set --storage.tsdb.retention.time=7d to limit storage impact
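
For the first point, a per-job override is enough. A minimal sketch, with kube-state-metrics as a hypothetical high-cardinality job:

scrape_configs:
  - job_name: kube-state-metrics       # hypothetical high-cardinality job
    scrape_interval: 60s               # scrape less often than the global default
    static_configs:
      - targets: ["kube-state-metrics:8080"]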

Problem: Queries timeout
Solution:

  1. Use more specific time ranges
  2. Create pre-aggregated recording rules

Going Further

For additional optimizations:

  1. Consider Prometheus relabeling to drop unnecessary labels
  2. Explore VictoriaMetrics' downsampling
  3. Implement per-scrape cardinality limits (sample_limit, label_limit) in recent Prometheus versions, as sketched below
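
A rough sketch of such scrape-time limits, with my-app and the numeric values as placeholders (exceeding sample_limit or label_limit fails the whole scrape, so leave headroom):

scrape_configs:
  - job_name: my-app                 # hypothetical job name
    sample_limit: 50000              # fail the scrape if it returns more samples than this
    label_limit: 30                  # fail the scrape if any series carries more than 30 labels
    label_value_length_limit: 200    # fail the scrape on overly long label values
    static_configs:
      - targets: ["my-app:8080"]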

FAQ

Q: How many labels is too many?
A: There’s no hard rule; what matters is how many series a metric produces, and a single metric with more than ~10,000 unique series often causes problems.

Q: Should I use VictoriaMetrics instead of Prometheus?
A: VictoriaMetrics handles high-cardinality better, but both benefit from optimization.

Q: Can I fix this without modifying my exporters?
A: Yes! Use metric relabeling in your Prometheus config to drop or rewrite high-cardinality labels.

For more on monitoring fundamentals, check out my guide on Grafana alerting with custom messages.