obsidian/wiki/concepts/gpu-device-node-lxc-docker.md
2026-05-09 17:44:30 +01:00

---
title: "GPU Device Node Mismatch in LXC Docker — Exit Code 255"
aliases: [gpu-lxc-docker, renderD-mismatch, gpu-bind-mount-lxc, jellyfin-gpu-lxc]
tags: [gpu, lxc, proxmox, docker, jellyfin, passthrough, debugging]
sources:
- "daily/2026-05-03.md"
created: 2026-05-03
updated: 2026-05-03
---
# GPU Device Node Mismatch in LXC Docker — Exit Code 255
When passing a GPU device node (e.g., `/dev/dri/renderD128`) into a Docker container inside a Proxmox LXC, the node name must match a device that actually exists at every layer: the Proxmox host, the LXC config, and the Docker compose file. If a config references `/dev/dri/renderD129` but only `/dev/dri/renderD128` exists, the container ends up with an empty regular file at that path instead of a character device. The likely mechanism is the `bind,optional,create=file` mount entry in the LXC config: `create=file` creates the placeholder file, and `optional` lets the failed mount pass without an error. The Docker container then exits with code 255 and no meaningful error message.
## Key Points
- **Device node numbers are not arbitrary.** `/dev/dri/renderD128` and `/dev/dri/renderD129` are different device nodes (DRM render nodes are numbered from 128), and neither LXC nor Docker falls back to whichever one is available.
- **A bind mount onto a missing source leaves an empty file.** When the source path does not exist as a character device, the container sees a regular file where a GPU device is expected.
- **Exit code 255 with no error message is the symptom.** Check that the GPU device path in the compose file and in the LXC config matches the actual device.
- **Verify with `ls -la /dev/dri/`** at the layer that supplies the device (the Proxmox host for the LXC config, the LXC for the compose file) before writing either config.
- LXC containers need GPU device access explicitly enabled in the container config (an `lxc.cgroup2.devices.allow` rule plus an `lxc.mount.entry` for each node) before Docker inside can use it.
## Details
### How to Check Available Device Nodes
```bash
# Check GPU device nodes available in the LXC container
ls -la /dev/dri/
# crw-rw---- 1 root video 226, 0 May 3 17:12 card0
# crw-rw---- 1 root render 226, 128 May 3 17:12 renderD128
# ← renderD128 exists, renderD129 does NOT exist
```
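The same check can be scripted so that a wrong node number is caught before any config is written. This is a minimal sketch, assuming bash; `check_device_nodes` is a hypothetical helper name, not part of Docker or Proxmox tooling. It relies on the shell's `-c` test, which is true only for character devices, so an empty placeholder file fails the check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether each given path is a character device.
# Returns non-zero if any path is missing or is not a character device.
check_device_nodes() {
    local rc=0 path
    for path in "$@"; do
        if [ -c "$path" ]; then
            echo "OK      $path"
        elif [ -e "$path" ]; then
            echo "NOTCHR  $path (exists, but is not a character device)"
            rc=1
        else
            echo "MISSING $path"
            rc=1
        fi
    done
    return "$rc"
}

# Example: check_device_nodes /dev/dri/card0 /dev/dri/renderD128
```

Running it inside the LXC against the paths planned for the compose file separates a missing node from a placeholder file left by a failed bind mount.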
### The Failing Docker Compose
```yaml
# ❌ WRONG: renderD129 does not exist as a character device
services:
  jellyfin:
    devices:
      - /dev/dri/renderD129:/dev/dri/renderD129
```
### The Correct Config
```yaml
# ✅ CORRECT: matches the actual device node
services:
  jellyfin:
    group_add:
      - "render"
      - "video"
    devices:
      - /dev/dri/renderD128:/dev/dri/renderD128
      - /dev/dri/card0:/dev/dri/card0
```
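To check a compose file against the nodes that really exist, the host side of each device mapping can be pulled out and tested. The sketch below assumes bash and unquoted `- /dev/...:/dev/...` entries; `compose_host_devices` and `verify_compose_devices` are hypothetical names. It matches lines as text rather than parsing YAML, so it will also pick up `/dev/...` entries listed under `volumes:`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the host side of every "- /dev/...:/dev/..."
# list entry in a compose file. Line-based match, not a YAML parser.
compose_host_devices() {
    sed -n 's|^[[:space:]]*-[[:space:]]*\(/dev/[^:]*\):.*$|\1|p' "$1"
}

# Print every host path from the compose file that is not a character
# device; return non-zero if any such path is found.
verify_compose_devices() {
    local rc=0 dev
    while IFS= read -r dev; do
        if [ ! -c "$dev" ]; then
            echo "not a character device: $dev"
            rc=1
        fi
    done < <(compose_host_devices "$1")
    return "$rc"
}

# Example (run inside the LXC): verify_compose_devices docker-compose.yml
```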
### Diagnosing Exit Code 255
```bash
# Check container logs (may be empty for this failure)
docker compose logs jellyfin
# Confirm the exit code of the stopped container
docker compose ps -a jellyfin
# Check the node type inside the container (exec only works while it is running)
docker compose exec jellyfin ls -la /dev/dri/
# If you see: -rw-r--r-- (regular file) instead of crw (char device) → bind mount failed
# Check what device nodes the LXC actually has (run inside the LXC)
ls -la /dev/dri/
# Verify the LXC has GPU passthrough configured in Proxmox
# (on Proxmox host)
pct config <CTID> | grep dev
```
### LXC Configuration for GPU Passthrough
For the LXC to expose GPU devices to Docker, the container config needs:
```
# /etc/pve/lxc/<CTID>.conf
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/card0 dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
```
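After editing, restart the container from the Proxmox host so the new entries take effect: `pct reboot <CTID>` (or `pct stop <CTID>` followed by `pct start <CTID>`).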
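The major and minor numbers in the `lxc.cgroup2.devices.allow` lines have to match what `ls -la /dev/dri/` shows on the Proxmox host, so the lines can be derived from the device node itself rather than typed by hand. A minimal sketch, assuming bash and a `stat` that supports `-c` with `%t`/`%T` (GNU coreutils); `lxc_lines_for_device` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the two LXC config lines for one device node,
# using the major/minor numbers the kernel actually assigned to it.
lxc_lines_for_device() {
    local dev=$1 major minor
    [ -c "$dev" ] || { echo "not a character device: $dev" >&2; return 1; }
    # stat prints the major (%t) and minor (%T) numbers in hexadecimal,
    # so convert them to decimal for the cgroup rule.
    major=$((16#$(stat -L -c '%t' "$dev")))
    minor=$((16#$(stat -L -c '%T' "$dev")))
    echo "lxc.cgroup2.devices.allow: c ${major}:${minor} rwm"
    # The mount target is relative to the container root (no leading slash).
    echo "lxc.mount.entry: ${dev} ${dev#/} none bind,optional,create=file"
}

# Example (run on the Proxmox host): lxc_lines_for_device /dev/dri/renderD128
```

Because the helper refuses paths that are not character devices, a mistyped node such as `renderD129` produces an error instead of a config line that would silently create an empty file.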
## Related Concepts
- [[wiki/concepts/proxmox-container-502-misdiagnosis]] — another case where diagnosing the wrong layer costs time
- [[wiki/concepts/docker-compose-restart-no-code-reload]] — Docker container debugging patterns
- [[wiki/homelab/_index]] — Proxmox and LXC infrastructure context
## Sources
- [[daily/2026-05-03.md]] — Jellyfin CT111 down with exit code 255; root cause was LXC config referencing `/dev/dri/renderD129` but host only has `/dev/dri/renderD128`; bind mount created empty file instead of char device; fixed by correcting device node number