obsidian/wiki/concepts/old-gpu-sysfs-metrics.md
2026-04-24 11:19:08 +01:00


title: Old GPU Sysfs Metrics — AMD GCN 1.0 and Intel iGPU Limitations
aliases: gpu-sysfs-metrics, amd-oland-metrics, gcn1-metrics, intel-igpu-metrics, gpu-busy-percent
tags: gpu, amd, intel, sysfs, prometheus, grafana, homelab, metrics, monitoring
sources: daily/2026-04-21.md
created: 2026-04-21
updated: 2026-04-21

Old GPU Sysfs Metrics — AMD GCN 1.0 and Intel iGPU Limitations

AMD GCN 1.0 GPUs (codenamed Oland, Cape Verde, Pitcairn, etc., circa 2012-2013) and Intel HD 600-series integrated GPUs do not expose gpu_busy_percent through the Linux sysfs interface: the attribute is not provided for GCN 1.0 silicon, and Intel's i915 driver has no equivalent file. Temperature and fan metrics are still available via hwmon, but GPU utilization percentage cannot be collected without specialized tools.

Key Points

  • AMD Oland (GCN 1.0) does not expose gpu_busy_percent via /sys/class/drm/card*/device/gpu_busy_percent — the sysfs file does not exist
  • Intel HD 630 and similar iGPUs also lack sysfs utilization exposure; Intel's metrics are only accessible via intel_gpu_top (requires root) or vendor-specific interfaces
  • Temperatures are still available via hwmon: CPU cores, NVMe, board temperature are exposed regardless of GPU generation
  • Textfile collector is the correct long-term approach: write a shell script that collects GPU metrics with whatever tools are available (e.g., amdgpu_top, radeontop) and exposes them as Prometheus metrics through node_exporter's --collector.textfile.directory flag
  • At the time of the 2026-04-21 session, the textfile collector infrastructure was set up but the cron job was not activated (pending confirmation)

Details

Why gpu_busy_percent Is Missing on GCN 1.0

The gpu_busy_percent sysfs attribute is provided by the amdgpu kernel driver, and is generally present from GCN 2.0 hardware (Bonaire/Hawaii, 2013+) onward. GCN 1.0 cards (Oland, Pitcairn, Tahiti) are a special case: kernels have traditionally bound them to the older radeon driver by default, which has no such attribute, and amdgpu support for this generation has been opt-in. On the Oland card checked on 2026-04-21 the file simply doesn't exist in sysfs.

Checking:

# This path does not exist on GCN 1.0
cat /sys/class/drm/card0/device/gpu_busy_percent
# cat: /sys/class/drm/card0/device/gpu_busy_percent: No such file or directory

# This DOES work — temperature via hwmon
cat /sys/class/hwmon/hwmon*/temp*_input
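
To see which kernel driver is bound to each card and whether the attribute exists, a small helper along these lines may be useful. This is a sketch: the function name report_cards and the DRM_ROOT override are illustrative choices, not part of any existing tool.

```shell
#!/bin/bash
# Print, for each DRM card, the bound kernel driver and whether
# gpu_busy_percent is present. DRM_ROOT is a hypothetical override
# that makes the function easy to try against a fake directory tree.
DRM_ROOT="${DRM_ROOT:-/sys/class/drm}"

report_cards() {
    root="$1"
    for card in "$root"/card[0-9]*; do
        name="$(basename "$card")"
        # Skip connector entries such as card0-HDMI-A-1 and unmatched globs
        case "$name" in
            *[!a-z0-9]*) continue ;;
        esac
        [ -d "$card/device" ] || continue
        driver="unknown"
        if [ -e "$card/device/driver" ]; then
            # The driver entry is a symlink; its target's basename is the driver name
            driver="$(basename "$(readlink -f "$card/device/driver")")"
        fi
        if [ -r "$card/device/gpu_busy_percent" ]; then
            busy="present"
        else
            busy="missing"
        fi
        echo "$name driver=$driver gpu_busy_percent=$busy"
    done
}

report_cards "$DRM_ROOT"
```

A line reporting driver=radeon together with gpu_busy_percent=missing would match the behaviour described above for GCN 1.0 cards.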

Intel iGPU (HD 600 Series)

Intel HD 630 (Kaby Lake) and similar integrated GPUs expose minimal sysfs data. The options for Intel GPU utilization are:

  • intel_gpu_top: requires root and is interactive by default; newer versions can emit JSON (-J), but wiring that into Prometheus takes extra scripting
  • /sys/class/drm/card*/gt/gt0/rc6_residency_ms: an RC6 (idle-state) residency counter, not a utilization percentage
  • Intel GVT-g GPU virtualization layer: complex setup, not appropriate for simple monitoring

For a homelab, accepting that Intel iGPU utilization won't be in dashboards is the pragmatic choice.
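
If a rough figure is still wanted, the RC6 residency counter can be turned into an approximate busy percentage by sampling it twice and treating all non-RC6 time as busy. This is a sketch under that assumption, not a hardware utilization counter: the function name rc6_busy_pct is illustrative, and the RC6_FILE path (card index, gt layout) is assumed and varies by kernel version.

```shell
#!/bin/bash
# Approximate busy % from two RC6 residency samples (all values in ms).
# busy ~= 100 * (1 - rc6_delta / elapsed). Time spent awake but idle is
# counted as busy, so this tends to overstate load.
rc6_busy_pct() {
    prev_ms="$1"; cur_ms="$2"; elapsed_ms="$3"
    idle_ms=$(( $cur_ms - $prev_ms ))
    # Clamp: the counter can advance slightly more than the measured wall time
    if [ "$idle_ms" -gt "$elapsed_ms" ]; then idle_ms="$elapsed_ms"; fi
    if [ "$idle_ms" -lt 0 ]; then idle_ms=0; fi
    echo $(( ($elapsed_ms - $idle_ms) * 100 / $elapsed_ms ))
}

# Assumed path; adjust the card index for the system at hand.
RC6_FILE="${RC6_FILE:-/sys/class/drm/card0/gt/gt0/rc6_residency_ms}"
if [ -r "$RC6_FILE" ]; then
    a="$(cat "$RC6_FILE")"
    sleep 1
    b="$(cat "$RC6_FILE")"
    echo "approx busy: $(rc6_busy_pct "$a" "$b" 1000)%"
fi
```

For example, a counter that advances by 750 ms over a 1000 ms window gives an estimate of 25% busy.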

What IS Available (All GPU Generations)

Even on old hardware, these metrics are typically available:

# GPU temperature (amdgpu hwmon)
/sys/class/hwmon/hwmon*/temp1_input  # GPU die temp (millidegrees C)

# Fan speed (if applicable)
/sys/class/hwmon/hwmon*/fan1_input   # RPM

# Clock frequencies (may be available)
/sys/class/hwmon/hwmon*/freq1_input  # GPU clock Hz

In Grafana: CPU core temps, NVMe temp, board temp, and fan RPM can all be scraped via node_exporter and displayed in a System Overview dashboard even without GPU utilization %.
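
Because the hwmon* glob matches every sensor device (CPU, NVMe, GPU) and the numbering can change between boots, it can help to pick the directory by its name file rather than by index. The sketch below assumes the GPU's hwmon device reports its name as amdgpu or radeon; find_hwmon is an illustrative helper, not an existing command.

```shell
#!/bin/bash
# Print the first hwmon directory whose "name" file matches the given
# driver name. The second argument overrides the search root (for testing).
find_hwmon() {
    want="$1"; root="${2:-/sys/class/hwmon}"
    for d in "$root"/hwmon*; do
        [ -r "$d/name" ] || continue
        if [ "$(cat "$d/name")" = "$want" ]; then
            echo "$d"
            return 0
        fi
    done
    return 1
}

# Example: print the GPU temperature in degrees C if a matching device exists.
for drv in amdgpu radeon; do
    if dir="$(find_hwmon "$drv")" && [ -r "$dir/temp1_input" ]; then
        # temp1_input is in millidegrees C
        awk '{ printf "%s %.1f C\n", name, $1 / 1000 }' name="$drv" "$dir/temp1_input"
        break
    fi
done
```

The same lookup could replace a hard-coded hwmon index in any collection script.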

Textfile Collector Pattern for Future GPUs

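When the homelab gets a newer AMD GPU (GCN 2.0+), use the textfile collector to expose metrics. The script reads amdgpu's sysfs files, so an NVIDIA card would need a different data source such as nvidia-smi: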

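#!/bin/bash
# /usr/local/bin/gpu-metrics.sh
# Reads AMD GPU utilization and temperature from sysfs and writes
# Prometheus-format metrics for node_exporter's textfile collector.

OUT=/var/lib/node_exporter/textfile_collector/gpu.prom
# hwmon numbering can change between boots; adjust hwmon2 as needed.
GPU_BUSY=$(cat /sys/class/drm/card0/device/gpu_busy_percent 2>/dev/null)
GPU_TEMP=$(awk '{printf "%.1f", $1/1000}' /sys/class/hwmon/hwmon2/temp1_input 2>/dev/null)

# Write to a temp file and rename it, so node_exporter never reads a partial file.
# A metric whose source file is missing is omitted rather than reported as 0.
{
  if [ -n "$GPU_BUSY" ]; then
    echo '# HELP gpu_busy_percent GPU utilization percentage'
    echo '# TYPE gpu_busy_percent gauge'
    echo "gpu_busy_percent{gpu=\"card0\"} ${GPU_BUSY}"
  fi
  if [ -n "$GPU_TEMP" ]; then
    echo '# HELP gpu_temperature_celsius GPU temperature'
    echo '# TYPE gpu_temperature_celsius gauge'
    echo "gpu_temperature_celsius{gpu=\"card0\"} ${GPU_TEMP}"
  fi
} > "${OUT}.tmp" && mv "${OUT}.tmp" "$OUT"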

Activate with a cron job:

echo '* * * * * root /usr/local/bin/gpu-metrics.sh' >> /etc/cron.d/gpu-metrics

Start node_exporter with --collector.textfile.directory=/var/lib/node_exporter/textfile_collector/.
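
Once the cron job is active, one way to confirm that the file is being picked up is to query the exporter and filter for the GPU metrics and the textfile collector's own status metrics. This sketch assumes node_exporter is listening on its default port 9100 on localhost; filter_gpu_metrics is an illustrative helper name.

```shell
#!/bin/bash
# Keep only the GPU metrics and the textfile collector's status metrics
# from exporter output supplied on stdin.
filter_gpu_metrics() {
    grep -E '^(gpu_|node_textfile_scrape_error|node_textfile_mtime_seconds)'
}

# Assumption: node_exporter on localhost:9100 (its default port).
if command -v curl >/dev/null 2>&1; then
    curl -s --max-time 5 http://localhost:9100/metrics | filter_gpu_metrics \
        || echo "no GPU metrics found (exporter down, or gpu.prom not written yet)"
fi
```

A node_textfile_scrape_error value of 1 would suggest that a .prom file in the directory could not be parsed.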

Grafana System Overview Dashboard (GCN 1.0 Compatible)

A useful System Overview dashboard for older hardware with limited GPU metrics:

  • CPU utilization % per core
  • RAM usage (total/used/available)
  • Disk usage by mount point
  • CPU core temperatures (°C)
  • NVMe temperature (°C)
  • Board temperature (°C)
  • Fan speeds (RPM, PWM %)
  • Network I/O

GPU utilization panel: show a text annotation "GPU utilization not supported on AMD GCN 1.0" rather than leaving the panel empty.

Sources

  • daily/2026-04-21.md — Grafana System Overview dashboard creation; AMD Oland (GCN 1.0) confirmed not to support gpu_busy_percent via sysfs; Intel HD 630 similarly limited; textfile collector infrastructure set up for future GPU upgrades; cron activation left pending