How to Monitor API Health with Blackbox Exporter and Prometheus
Ever had an API go down silently, only to realize it after users started complaining? I’ve been there—more times than I’d like to admit. That’s why I now rely on Prometheus and Blackbox Exporter to proactively monitor API health. In this guide, I’ll walk you through setting up Blackbox Exporter to probe endpoints, track latency, and alert you the moment something goes sideways.
Why Blackbox Exporter?
Blackbox Exporter is like having a dedicated API watchdog. It sends HTTP, TCP, or ICMP probes to your endpoints and reports back metrics like:
- Uptime/downtime
- Response latency
- SSL certificate expiry
Pair it with Prometheus for scraping and Grafana for visualization, and you’ve got a robust monitoring system.
What You’ll Need
- Prometheus (already installed and running)
- Blackbox Exporter (grab the latest release from the Prometheus GitHub releases page)
- A target API endpoint to monitor (e.g., https://api.example.com/health)
- Basic familiarity with YAML configs (don’t worry, I’ll guide you)
Step 1: Install and Configure Blackbox Exporter
Installation
Run the following to download and extract Blackbox Exporter:
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.24.0/blackbox_exporter-0.24.0.linux-amd64.tar.gz
tar -xvf blackbox_exporter-0.24.0.linux-amd64.tar.gz
cd blackbox_exporter-0.24.0.linux-amd64
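Before touching any config, it’s worth a quick sanity check that the binary runs on your machine:
./blackbox_exporter --version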
Configuration
Edit blackbox.yml to define your probes. Here’s a minimal HTTP check:
modules:
  http_2xx:
    prober: http
    http:
      preferred_ip_protocol: "ipv4"
      valid_status_codes: [200]
      follow_redirects: true  # replaces the deprecated no_follow_redirects option
Start the exporter:
./blackbox_exporter --config.file=blackbox.yml
Verify it’s running by visiting http://localhost:9115 (the default port).
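You can also fire a probe by hand before wiring up Prometheus. The exporter’s /probe endpoint takes the target and module as query parameters; assuming the http_2xx module above and your own endpoint, something like this should report probe_success 1 for a healthy API:
curl 'http://localhost:9115/probe?target=https://api.example.com/health&module=http_2xx'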
Step 2: Configure Prometheus to Scrape Blackbox
Add a job to prometheus.yml
to scrape Blackbox’s metrics and probe your API:
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]  # Use the module defined earlier
    static_configs:
      - targets:
          - https://api.example.com/health  # Your API endpoint
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115  # Blackbox Exporter address
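Before restarting, you can validate the file with promtool, which ships with Prometheus and catches indentation or relabeling typos early (adjust the path to wherever your prometheus.yml lives):
promtool check config /etc/prometheus/prometheus.yml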
Restart Prometheus:
systemctl restart prometheus
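Once Prometheus is back up, a quick way to confirm probes are flowing is to query probe_success in the expression browser, or via the HTTP API if Prometheus is on the default localhost:9090; a value of 1 means the last probe succeeded:
curl -s 'http://localhost:9090/api/v1/query?query=probe_success'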
Step 3: Set Up Alerts for Downtime or High Latency
Now, let’s alert on two critical scenarios:
- API is down (non-200 status).
- Latency exceeds 500ms.
Add these rules to a Prometheus rule file (alert.rules.yml); Prometheus evaluates them and forwards firing alerts to Alertmanager:
groups:
  - name: api_health
    rules:
      - alert: APIUnavailable
        expr: probe_success{job="blackbox"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "API is down (instance: {{ $labels.instance }})"
      - alert: HighLatency
        expr: probe_duration_seconds{job="blackbox"} > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High API latency ({{ $value }}s on {{ $labels.instance }})"
Step 4: Visualize Metrics in Grafana
Create a dashboard to track:
- Uptime (probe_success)
- Latency (probe_duration_seconds)
- SSL expiry (probe_ssl_earliest_cert_expiry)
Here’s a sample Grafana query for latency:
avg(probe_duration_seconds{instance=~"$instance"}) by (instance)
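If you also want a panel for certificate expiry, a query along these lines gives days until the earliest certificate expires, per instance:
(probe_ssl_earliest_cert_expiry{instance=~"$instance"} - time()) / 86400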
Use a Grafana template variable ($instance) to make the dashboard dynamic across endpoints.
Troubleshooting
Problem: Blackbox Exporter crashes on startup.
Fix: Check the YAML syntax (indentation matters!) and use yamllint to validate.
Problem: Prometheus isn’t scraping metrics.
Fix: Verify the targets and relabel_configs in prometheus.yml.
Problem: Alerts aren’t firing.
Fix: Ensure Alertmanager is correctly configured to route alerts (e.g., to Slack or email).
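If you’re not sure what a working setup looks like, here’s a minimal sketch: prometheus.yml needs an alerting block pointing at Alertmanager, and alertmanager.yml needs a route and a receiver. The Slack webhook URL and channel below are placeholders you’d swap for your own:
# prometheus.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

# alertmanager.yml
route:
  receiver: slack-notifications
receivers:
  - name: slack-notifications
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'  # placeholder webhook
        channel: '#alerts'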
FAQ
Q: Can Blackbox monitor non-HTTP endpoints?
A: Yes! It supports TCP, ICMP, and DNS probes; just tweak the modules in blackbox.yml.
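For example, a bare-bones TCP connect check (handy for things like database ports) would look like this in blackbox.yml, and you’d then reference tcp_connect as the module in your Prometheus job:
modules:
  tcp_connect:
    prober: tcp
    timeout: 5s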
Q: How do I monitor multiple APIs?
A: Add more targets under static_configs in Prometheus, or use service discovery.
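For a handful of APIs, listing them is enough; the extra hostnames here are just placeholders for your own endpoints:
static_configs:
  - targets:
      - https://api.example.com/health
      - https://payments.example.com/health  # placeholder
      - https://auth.example.com/health      # placeholder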
Q: What’s the overhead of running Blackbox?
A: Minimal. A single instance can handle hundreds of probes.
Next Steps
- Monitor internal services (e.g., databases, MQTT brokers).
- Set up synthetic checks for user journeys.
- Integrate with Grafana alerts for richer notifications.
For more on observability, check out my posts on Grafana alerting and Zigbee2MQTT monitoring.
Now go forth and never be blindsided by API downtime again! 🚀