Monitoring Disk Space Across Servers Using Node Exporter and Prometheus

Ever had that sinking feeling when your server runs out of disk space at 3 AM? I have—more times than I’d like to admit. After one too many midnight emergencies, I decided to automate disk space monitoring using Prometheus, Node Exporter, and Grafana. Here’s how you can set it up too, complete with alerts and pretty dashboards to keep your sanity intact.

Why Monitor Disk Space?

Disk space is one of those silent killers. It creeps up on you, and by the time you notice, your database is refusing writes, or worse—your application crashes. With Prometheus scraping metrics from Node Exporter, you can:

  • Track disk usage in real time.
  • Set up alerts before you hit critical thresholds.
  • Visualize trends to predict future capacity needs.

What You’ll Need

  1. Servers to monitor: Linux-based (Ubuntu/Debian/CentOS).
  2. Node Exporter: Installed on each server to expose system metrics.
  3. Prometheus: To scrape and store metrics.
  4. Grafana: For visualization and dashboards.
  5. Basic familiarity with the command line and YAML configurations.

Step 1: Install Node Exporter on Each Server

Node Exporter collects system metrics (CPU, memory, disk, etc.) and exposes them for Prometheus to scrape.

Installation (Ubuntu/Debian example; v1.6.1 is shown here, so check the Node Exporter releases page for the current version):

wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar xvf node_exporter-*.tar.gz
sudo mv node_exporter-*/node_exporter /usr/local/bin/

Run as a Service (Systemd):

Create a service file (/etc/systemd/system/node_exporter.service):

[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

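The unit file runs Node Exporter as a dedicated node_exporter user, which does not exist yet. Create it, then enable and start the service:

sudo useradd --system --no-create-home --shell /bin/false node_exporter
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter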

Verify it’s running:

curl http://localhost:9100/metrics

You should see a wall of metrics—this means it’s working!
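
Since this guide is about disk space, it can help to narrow that wall down to the filesystem metrics the later steps rely on. A minimal sketch, assuming the default port 9100; the function name fs_metrics is made up for this example:

```shell
# Keep only the filesystem size/available samples from Node Exporter output.
# Reads the text exposition format on stdin; "# HELP" and "# TYPE" lines are dropped
# because they do not start with the metric name.
fs_metrics() {
  grep -E '^node_filesystem_(avail|size)_bytes[{]'
}

# Usage on a monitored server:
#   curl -s http://localhost:9100/metrics | fs_metrics
```

Each remaining line carries device, fstype, and mountpoint labels, which are the labels the alert rule in Step 3 filters on.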


Step 2: Configure Prometheus to Scrape Node Exporter

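Prometheus needs to know where to find your Node Exporters. Edit /etc/prometheus/prometheus.yml and add a job under scrape_configs (if the key already exists, extend it rather than adding a second one):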

scrape_configs:
  - job_name: "node_exporter"
    static_configs:
      - targets: ["server1:9100", "server2:9100"]  # Replace with your servers' IPs/hostnames

Restart Prometheus:

sudo systemctl restart prometheus

Check the Prometheus UI (http://your-prometheus-server:9090/targets). Your Node Exporters should show as “UP.”
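
If you'd rather check from a terminal than click through the UI, the same information is available from the Prometheus HTTP API. Here's a rough sketch, assuming Prometheus listens on localhost:9090 and curl is installed; prom_query and the PROM_URL variable are names made up for this example:

```shell
# Run an instant PromQL query against the Prometheus HTTP API and print the raw JSON.
# PROM_URL is a hypothetical override; it defaults to a local Prometheus.
prom_query() {
  curl -sfG --max-time 5 "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Usage: every healthy target reports a value of "1" for the built-in up metric.
#   prom_query 'up{job="node_exporter"}'
```

A target that shows "0" here is one Prometheus knows about but cannot scrape, which usually points at a firewall or hostname problem rather than a config typo.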


Step 3: Create Alerts for Disk Usage

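Let's set up an alert for when disk space is running low. Add this to your Prometheus alert rules file (e.g., /etc/prometheus/rules.yml), and make sure that file is listed under rule_files in prometheus.yml, otherwise Prometheus never loads it: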

groups:
- name: disk_alerts
  rules:
  - alert: HighDiskUsage
    expr: 100 - (node_filesystem_avail_bytes{mountpoint="/"} * 100 / node_filesystem_size_bytes{mountpoint="/"}) > 85
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High disk usage on {{ $labels.instance }}"
      description: 'Disk usage is at {{ printf "%.1f" $value }}% on {{ $labels.instance }}'
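
To see what the expression computes, here is the same arithmetic as a tiny shell function. This is only an illustration with made-up numbers; disk_used_pct is a hypothetical helper, not part of Prometheus. Note that node_filesystem_avail_bytes counts space available to non-root users, so the result can be slightly higher than the used space implied by the free-bytes figure.

```shell
# Mirror the alert expression: 100 - (avail * 100 / size).
# Arguments: available bytes, total bytes. Prints the used percentage with one decimal.
disk_used_pct() {
  awk -v avail="$1" -v size="$2" 'BEGIN { printf "%.1f\n", 100 - (avail * 100 / size) }'
}

# Example: 12 GiB available on a 100 GiB filesystem
disk_used_pct 12884901888 107374182400   # prints 88.0
```

At 88% this filesystem is above the 85 threshold, so the alert would fire once the condition has held for the full 10 minutes set by the for clause.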

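Reload Prometheus to apply the rules. The HTTP reload endpoint only works if Prometheus was started with the --web.enable-lifecycle flag; without it, run sudo systemctl restart prometheus instead: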

curl -X POST http://localhost:9090/-/reload
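
A typo in the rules file makes the reload fail, so it can be worth validating first. The sketch below assumes promtool (it ships alongside Prometheus) is on your PATH and that the reload endpoint is enabled with --web.enable-lifecycle; reload_rules and PROM_URL are illustrative names:

```shell
# Validate a rules file and only then ask Prometheus to reload its configuration.
# Argument: path to the rules file (defaults to the example path used in this guide).
reload_rules() {
  rules_file="${1:-/etc/prometheus/rules.yml}"
  # promtool exits non-zero on syntax errors or invalid expressions.
  promtool check rules "$rules_file" || return 1
  curl -sf -X POST "${PROM_URL:-http://localhost:9090}/-/reload"
}

# Usage:
#   reload_rules /etc/prometheus/rules.yml && echo "rules reloaded"
```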

Step 4: Visualize Metrics in Grafana

Now for the fun part—dashboards!

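  1. Add Prometheus as a data source in Grafana (Configuration > Data Sources, or Connections > Data sources in newer Grafana versions) and point it at your Prometheus URL, e.g. http://your-prometheus-server:9090.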
  2. Import a dashboard: Use the Node Exporter Full dashboard (ID: 1860).

Pro Tip: Customize the dashboard to highlight disk usage panels or create your own for a cleaner view.

Troubleshooting Common Issues

  1. Node Exporter not showing up in Prometheus:

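    • Check firewall rules (sudo ufw allow 9100/tcp on Ubuntu/Debian; CentOS typically uses firewalld, where the equivalent is sudo firewall-cmd --permanent --add-port=9100/tcp followed by sudo firewall-cmd --reload).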
    • Verify the targets in prometheus.yml are correct.
  2. Alerts not firing:

    • Ensure Prometheus is loading the rules file (check /etc/prometheus/prometheus.yml for rule_files).
    • Test the alert expression in the Prometheus UI first.
  3. Grafana shows “No Data”:

    • Confirm the data source URL is correct.
    • Check if Prometheus is scraping the targets successfully.
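
The connectivity checks above can be scripted. A small sketch, assuming curl is installed on the Prometheus host; check_exporter is a hypothetical helper name, and server1 and server2 are the placeholder hosts from Step 2:

```shell
# Return 0 if a Node Exporter answers on host:port (default 9100), non-zero otherwise.
check_exporter() {
  curl -sf --max-time 5 -o /dev/null "http://$1:${2:-9100}/metrics"
}

# Usage, run from the Prometheus server so firewall rules are exercised too:
#   for host in server1 server2; do
#     check_exporter "$host" && echo "$host OK" || echo "$host UNREACHABLE"
#   done
```

If a host answers locally (curl http://localhost:9100/metrics on the server itself) but not from the Prometheus host, the firewall is the likely culprit.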

Going Further

  • Monitor multiple mount points: Adjust the alert rule to include /var, /home, etc.
  • Use Alertmanager: Route alerts to Slack, email, or PagerDuty.
  • Automate remediation: Trigger scripts to clean up logs or archive old files when alerts fire.
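
For the first idea, one option is to swap the exact mountpoint="/" match for a regex match. The sketch below builds such an expression for any list of mount points; disk_alert_expr is a made-up helper, and the output is meant to be pasted into the expr field of the rule from Step 3:

```shell
# Build the disk usage alert expression for several mount points.
# Arguments: threshold percent, then one or more mount points.
disk_alert_expr() {
  threshold="$1"; shift
  # Join the mount points with "|" to form a regex alternation.
  mounts=$(printf '%s|' "$@")
  mounts="${mounts%|}"
  printf '100 - (node_filesystem_avail_bytes{mountpoint=~"%s"} * 100 / node_filesystem_size_bytes{mountpoint=~"%s"}) > %s\n' \
    "$mounts" "$mounts" "$threshold"
}

disk_alert_expr 85 / /var /home
```

PromQL regex matchers are anchored at both ends, so "/|/var|/home" matches exactly those three mount points and not, say, /var/lib/docker. With more than one mount point in play, adding {{ $labels.mountpoint }} to the alert annotations tells you which filesystem is filling up.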

FAQ

Q: Can I monitor Windows servers with Node Exporter?

A: No, Node Exporter is for Linux/Unix. For Windows, use Windows Exporter.

Q: How often does Prometheus scrape metrics?

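A: The built-in default for scrape_interval is 1m, but the example prometheus.yml that ships with Prometheus sets it to 15s, which is what most installs end up with. You can change scrape_interval globally or per job in prometheus.yml.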

Q: Can I monitor disk I/O as well?

A: Yes! Node Exporter exposes node_disk_* metrics for I/O stats, such as node_disk_read_bytes_total and node_disk_io_time_seconds_total.


Need more monitoring tricks? Check out my guide on setting up Grafana alerts with custom messages. Happy monitoring!