| title | aliases | tags | sources | created | updated |
|---|---|---|---|---|---|
| Old GPU Sysfs Metrics — AMD GCN 1.0 and Intel iGPU Limitations | | | | 2026-04-21 | 2026-04-21 |
Old GPU Sysfs Metrics — AMD GCN 1.0 and Intel iGPU Limitations
AMD GCN 1.0 generation GPUs (codenamed Oland, Cape Verde, Pitcairn, etc., circa 2012–2013) and Intel HD Graphics 600-series integrated GPUs do not expose gpu_busy_percent through the Linux sysfs interface. The attribute is specific to the amdgpu driver, which does not implement it for GCN 1.0 silicon, and Intel's i915 driver has no equivalent sysfs file. Temperature and fan metrics are still available via hwmon, but GPU utilization percentage cannot be collected without specialized tools.
Key Points
- AMD Oland (GCN 1.0) does not expose gpu_busy_percent via /sys/class/drm/card*/device/gpu_busy_percent; the sysfs file does not exist
- Intel HD 630 and similar iGPUs also lack sysfs utilization exposure; Intel's metrics are only accessible via intel_gpu_top (requires root) or vendor-specific interfaces
- Temperatures are still available via hwmon: CPU core, NVMe, and board temperatures are exposed regardless of GPU generation
- Textfile collector is the correct long-term approach: write a shell script that collects GPU metrics via available tools (e.g., amdgpu_top, radeontop) and exposes them as Prometheus metrics via node_exporter's --collector.textfile.directory flag
- At the time of the 2026-04-21 session, the textfile collector infrastructure was set up but the cron job was not activated (pending confirmation)
Details
Why gpu_busy_percent Is Missing on GCN 1.0
The gpu_busy_percent sysfs attribute is provided by the amdgpu kernel driver starting with GCN 2.0 hardware (Bonaire/Hawaii, 2013+). GCN 1.0 cards (Oland, Pitcairn, Tahiti) can run under the same driver (many kernels bind them to the older radeon driver by default, which has no such attribute), but the utilization sensor is not implemented for that silicon generation. The file simply doesn't exist in sysfs.
Checking:
# This path does not exist on GCN 1.0
cat /sys/class/drm/card0/device/gpu_busy_percent
# cat: /sys/class/drm/card0/device/gpu_busy_percent: No such file or directory
# This DOES work — temperature via hwmon
cat /sys/class/hwmon/hwmon*/temp*_input
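The manual check above can be generalized to every card in the machine. The following is a minimal sketch, not a definitive tool: the function name check_gpu_busy and its optional root-directory argument are hypothetical and exist only so the logic can be pointed at a test directory. It reports which kernel driver is bound to each card (via the device/driver symlink) and whether gpu_busy_percent is present.

```shell
#!/bin/bash
# check_gpu_busy [DRM_ROOT]
# Hypothetical helper: for each DRM card, print the bound kernel driver and
# either the current gpu_busy_percent value or "missing".
check_gpu_busy() {
  local root="${1:-/sys/class/drm}"
  local card name driver
  for card in "$root"/card[0-9]*; do
    name=$(basename "$card")
    # Skip connector entries such as card0-HDMI-A-1
    case "$name" in *-*) continue ;; esac
    [ -d "$card/device" ] || continue
    driver=unknown
    if [ -L "$card/device/driver" ]; then
      # The symlink target's last path component is the driver name
      driver=$(basename "$(readlink "$card/device/driver")")
    fi
    if [ -r "$card/device/gpu_busy_percent" ]; then
      echo "$name driver=$driver gpu_busy_percent=$(cat "$card/device/gpu_busy_percent")"
    else
      echo "$name driver=$driver gpu_busy_percent=missing"
    fi
  done
}
```

Calling check_gpu_busy with no arguments inspects /sys/class/drm. On a GCN 1.0 card the expectation from this note is a line ending in gpu_busy_percent=missing.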
Intel iGPU (HD 600 Series)
Intel HD 630 (Kaby Lake) and similar integrated GPUs expose minimal sysfs data. Intel GPU utilization requires one of:
- intel_gpu_top: requires root, outputs to terminal, not easily scriptable for Prometheus
- /sys/class/drm/card*/gt/gt0/rc6_residency_ms: residency counter, not utilization percentage
- Intel GVT-g GPU virtualization layer: complex setup, not appropriate for simple monitoring
For a homelab, accepting that Intel iGPU utilization won't be in dashboards is the pragmatic choice.
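To make concrete why the RC6 residency counter is not a utilization percentage: it counts milliseconds spent in the RC6 power-saving state, so the most one can derive from two samples is a rough "not in RC6" share of the interval. The sketch below shows that arithmetic. The function names are hypothetical, the default path is the one listed above and may differ per kernel, and counter wraparound is not handled.

```shell
#!/bin/bash
# rc6_active_percent RC6_START_MS RC6_END_MS ELAPSED_MS
# Hypothetical helper: percentage of the interval NOT spent in RC6.
# This is a coarse proxy for activity, not a true GPU busy figure.
rc6_active_percent() {
  awk -v a="$1" -v b="$2" -v t="$3" 'BEGIN {
    if (t <= 0) exit 1
    pct = 100 - (b - a) * 100 / t
    # Clamp against timing jitter between the two reads
    if (pct < 0) pct = 0
    if (pct > 100) pct = 100
    printf "%.1f\n", pct
  }'
}

# sample_rc6_active [FILE] [SECONDS]
# Reads the counter twice, SECONDS apart, and prints the derived percentage.
sample_rc6_active() {
  local file="${1:-/sys/class/drm/card0/gt/gt0/rc6_residency_ms}" secs="${2:-1}"
  local a b
  a=$(cat "$file") || return 1
  sleep "$secs"
  b=$(cat "$file") || return 1
  rc6_active_percent "$a" "$b" $((secs * 1000))
}
```

For example, if the counter advances by 750 ms over a 1000 ms window, rc6_active_percent 1000 1750 1000 prints 25.0. A GPU can be out of RC6 while doing very little work, which is why this number should not be labeled as utilization in a dashboard.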
What IS Available (All GPU Generations)
Even on old hardware, these metrics are typically available:
# GPU temperature (amdgpu hwmon)
/sys/class/hwmon/hwmon*/temp1_input # GPU die temp (millidegrees C)
# Fan speed (if applicable)
/sys/class/hwmon/hwmon*/fan1_input # RPM
# Clock frequencies (may be available)
/sys/class/hwmon/hwmon*/freq1_input # GPU clock Hz
In Grafana: CPU core temps, NVMe temp, board temp, and fan RPM can all be scraped via node_exporter and displayed in a System Overview dashboard even without GPU utilization %.
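Because hwmon numbering (hwmon0, hwmon1, ...) is not guaranteed to stay the same across boots or kernel updates, it can help to look a sensor up by the contents of its name file instead of by index. The following is a sketch under that assumption. The function name hwmon_temp_celsius and its optional root argument are hypothetical, and the chip name to pass (for example amdgpu, radeon, nvme, or coretemp) depends on which driver owns the sensor on a given machine.

```shell
#!/bin/bash
# hwmon_temp_celsius CHIP_NAME [HWMON_ROOT]
# Hypothetical helper: print temp1_input in degrees C for the first hwmon
# device whose "name" file equals CHIP_NAME. Returns 1 if none is found.
hwmon_temp_celsius() {
  local want="$1" root="${2:-/sys/class/hwmon}" dir
  for dir in "$root"/hwmon*; do
    [ -r "$dir/name" ] || continue
    if [ "$(cat "$dir/name")" = "$want" ] && [ -r "$dir/temp1_input" ]; then
      # temp*_input values are in millidegrees Celsius
      awk '{printf "%.1f\n", $1/1000}' "$dir/temp1_input"
      return 0
    fi
  done
  return 1
}
```

A call such as hwmon_temp_celsius radeon would then replace a hardcoded hwmon index in a collection script.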
Textfile Collector Pattern for Future GPUs
When the homelab gets a newer GPU (GCN 2.0+ or NVIDIA), use the textfile collector to expose metrics:
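#!/bin/bash
# /usr/local/bin/gpu-metrics.sh
# Scrapes AMD GPU utilization and temperature and writes Prometheus-format metrics
OUT=/var/lib/node_exporter/textfile_collector/gpu.prom
TMP="${OUT}.$$"
# hwmon numbering is not stable across boots; adjust hwmon2 for your system
GPU_BUSY=$(cat /sys/class/drm/card0/device/gpu_busy_percent 2>/dev/null)
GPU_TEMP=$(awk '{printf "%.1f", $1/1000}' /sys/class/hwmon/hwmon2/temp1_input 2>/dev/null)
{
  # Emit a metric only when its source was readable, so a missing sensor is absent rather than reported as 0
  if [ -n "$GPU_BUSY" ]; then
    echo "# HELP gpu_busy_percent GPU utilization percentage"
    echo "# TYPE gpu_busy_percent gauge"
    echo "gpu_busy_percent{gpu=\"card0\"} ${GPU_BUSY}"
  fi
  if [ -n "$GPU_TEMP" ]; then
    echo "# HELP gpu_temperature_celsius GPU temperature"
    echo "# TYPE gpu_temperature_celsius gauge"
    echo "gpu_temperature_celsius{gpu=\"card0\"} ${GPU_TEMP}"
  fi
} > "$TMP"
# Rename is atomic, so node_exporter never reads a half-written file
mv "$TMP" "$OUT"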
Activate with a cron job (run as root; the script must be executable):
chmod +x /usr/local/bin/gpu-metrics.sh
echo '* * * * * root /usr/local/bin/gpu-metrics.sh' > /etc/cron.d/gpu-metrics
Start node_exporter with --collector.textfile.directory=/var/lib/node_exporter/textfile_collector/.
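Before enabling the cron job, it may be worth checking that the generated file parses, since a malformed line in a .prom file causes the textfile collector to report a scrape error for it. The sketch below is a deliberately simplistic sanity check. The function name prom_file_ok is hypothetical, and the pattern only accepts plain name{labels} number lines, not timestamps, exponents, or NaN.

```shell
#!/bin/bash
# prom_file_ok FILE
# Hypothetical helper: succeeds if FILE is non-empty and every line that is
# not a comment or blank looks like: metric_name{labels} number
prom_file_ok() {
  local file="$1"
  [ -s "$file" ] || return 1
  awk '
    /^#/ || /^$/ { next }
    !/^[a-zA-Z_:][a-zA-Z0-9_:]*(\{[^}]*\})? -?[0-9]+(\.[0-9]+)?$/ { bad = 1 }
    END { exit bad }
  ' "$file"
}
```

Running prom_file_ok /var/lib/node_exporter/textfile_collector/gpu.prom && echo ok after a manual run of the script gives a quick pass/fail. A sample line with an empty value, which can happen when a sensor file is unreadable, fails the check.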
Grafana System Overview Dashboard (GCN 1.0 Compatible)
A useful System Overview dashboard for older hardware with limited GPU metrics:
- CPU utilization % per core
- RAM usage (total/used/available)
- Disk usage by mount point
- CPU core temperatures (°C)
- NVMe temperature (°C)
- Board temperature (°C)
- Fan speeds (RPM, PWM %)
- Network I/O
GPU utilization panel: show a text annotation "GPU utilization not supported on AMD GCN 1.0" rather than leaving the panel empty.
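One possible way to drive that panel from data rather than hardcoding the text is to publish a small flag metric through the textfile collector. This is only a sketch of the idea: the metric name gpu_busy_supported and the function write_gpu_support_flag are hypothetical and not part of node_exporter or the setup described in this note.

```shell
#!/bin/bash
# write_gpu_support_flag CARD_DIR OUT_FILE
# Hypothetical helper: writes gpu_busy_supported{gpu="cardN"} 1 or 0 depending
# on whether CARD_DIR/device/gpu_busy_percent is readable.
write_gpu_support_flag() {
  local card_dir="$1" out="$2" card supported=0
  card=$(basename "$card_dir")
  if [ -r "$card_dir/device/gpu_busy_percent" ]; then
    supported=1
  fi
  local tmp="${out}.$$"
  {
    echo "# HELP gpu_busy_supported 1 if the driver exposes gpu_busy_percent, else 0"
    echo "# TYPE gpu_busy_supported gauge"
    echo "gpu_busy_supported{gpu=\"${card}\"} ${supported}"
  } > "$tmp" && mv "$tmp" "$out"   # write then rename so readers never see a partial file
}
```

For example, write_gpu_support_flag /sys/class/drm/card0 /var/lib/node_exporter/textfile_collector/gpu_support.prom would let a stat panel map the value 0 to the "not supported" message, and the same dashboard would keep working after a GPU upgrade.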
Related Concepts
- wiki/concepts/prometheus-joules-watts-gotcha — related Prometheus/Grafana metric gotcha for power dashboards
- wiki/concepts/beszel-monitoring-deployment — Beszel as a simpler monitoring layer that doesn't require GPU-specific metric setup
- wiki/homelab/_index — homelab hardware and Grafana/Prometheus infrastructure
Sources
- daily/2026-04-21.md: Grafana System Overview dashboard creation; AMD Oland (GCN 1.0) confirmed not to support gpu_busy_percent via sysfs; Intel HD 630 similarly limited; textfile collector infrastructure set up for future GPU upgrades; cron activation left pending