obsidian/wiki/infrastructure/_index.md
2026-05-03 17:58:02 +01:00

tags: infrastructure, index
updated: 2026-04-27

Infrastructure — Index

Server inventory for all SSH-accessible machines. Last audited: 2026-04-24. Update this section whenever you SSH in and notice changes.

Oliver Agency Servers (GCP)

| Article | Server | IP | Role |
|---|---|---|---|
| wiki/infrastructure/server-optical | optical-web-1 | 10.220.168.5 | Main AI prod — 35+ apps, systemd |
| wiki/infrastructure/server-optical-dev | optical-dev | 10.220.168.9 | Docker staging — ppt-tool, cc-dashboard, semblance, 15+ apps |
| wiki/infrastructure/server-optical-prod | optical-prod | 10.220.168.8 | Minimal / secondary prod |
| wiki/infrastructure/server-librechat | librechat-dev + prod | 10.220.168.2 / .4 | LibreChat AI chat platform (both envs) |
| wiki/infrastructure/server-modocmms | modcomms-01 | 10.220.168.6 | ModoCMMS staging + prod (Apache) |
| wiki/infrastructure/server-baic | web-03 | 10.220.72.13 | Main web host — 40+ domains, oliver.agency |
| wiki/infrastructure/server-box-cli | box-cli-01 | 10.220.176.3 | Ford/L'Oréal hotfolder, CentOS 7, 1 TB NFS |

Personal / Aimpress

| Article | Server | IP | Role |
|---|---|---|---|
| wiki/infrastructure/server-aimpress | c2-15-uk1 | 57.128.160.249 | Aimpress VPS — Mailcow, n8n, Traefik |
| wiki/infrastructure/server-pve | pve | 192.168.1.48 | Proxmox homelab — 8 containers + Kali VM |

Quick Reference

| Article | Purpose |
|---|---|
| wiki/infrastructure/ssh-aliases | All aliases, IPs, keys, health-check one-liner |
| wiki/infrastructure/network-topology | Internet→router→NPM→services flow, LAN subnet map, DNS paths, Tailscale overlay |

⚠ Known Issues

Add date when you discover an issue. Move to Resolved when fixed, then delete after 2 weeks.
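
The two-week cleanup rule above can be checked mechanically. The following is a minimal sketch, assuming that the first ISO date in a Resolved entry is the date it was resolved (not all entries in this file follow that pattern, so treat the result as a prompt for manual review). The function name `stale_resolved` and the 14-day default are illustrative choices, not part of the wiki.

```python
import re
from datetime import date, timedelta

# Matches dates written as YYYY-MM-DD, the format used throughout this index.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def stale_resolved(entries, today, max_age_days=14):
    """Return entries whose first ISO date is more than max_age_days before today.

    Assumption: the first date on the line is the resolution date.
    """
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for entry in entries:
        match = ISO_DATE.search(entry)
        if match is None:
            continue  # undated entries are left for manual review
        if date.fromisoformat(match.group()) < cutoff:
            stale.append(entry)
    return stale
```

Calling `stale_resolved(lines, date.today())` on the lines of the Resolved section would list the candidates for deletion; nothing is removed automatically.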

🔴 Critical

  • optical 2026-04-24 DISK 99% FULL — 5.9 GB free of 533 GB. Top offenders: /opt/ferrero-opentext 12 GB, /opt/backups 8.9 GB, /opt/sandbox-notebookllamalm-nextjs 8.5 GB — action needed
  • optical 2026-04-24 SSL cert expires May 8 2026 — ai-sandbox.oliver.solutions — renew before May 8
  • optical 2026-04-24 notebookllama-backend.service FAILED — crashed, taking 8.5 GB of disk
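
A disk-full condition like the one on optical can be caught earlier with a small threshold check. This is a sketch only: the 80% and 95% thresholds and the function names are assumptions, not values taken from this wiki. Note that `df` may report a slightly different percentage on filesystems with reserved blocks.

```python
import shutil


def percent_used(total, free):
    """Percentage of capacity in use, given total and free space in the same unit."""
    return 100.0 * (total - free) / total


def classify(used_pct, warn_pct=80.0, crit_pct=95.0):
    """Map a usage percentage to a severity label (thresholds are hypothetical)."""
    if used_pct >= crit_pct:
        return "critical"
    if used_pct >= warn_pct:
        return "warning"
    return "ok"


def check_path(path="/"):
    """Return (percent used, severity) for the filesystem that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = percent_used(usage.total, usage.free)
    return pct, classify(pct)
```

With the figures recorded above, `percent_used(533, 5.9)` comes to roughly 98.9, which `classify` labels as critical under the assumed thresholds.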

🟡 Capacity

  • librechat-prod 2026-04-24 — data directory 197 GB (484 GB total, 65%) — monitor growth
  • pve usb-backup 2026-05-03 37.58% (345 GB/916 GB) — was 12% — growing fast, check vzdump retention
  • pve vm-102-disk-0 2026-05-03 — thin-pool 99.39% allocated — run fstrim in CT102 (df shows 36% — not urgent but should be cleaned)
  • aimpress 2026-04-24 — 26.58 GB reclaimable Docker images (docker image prune -a)
  • baic 2026-04-24 — large vhosts: ustudio.global 22 GB, ustudiostaging2 19 GB, ie.oliver.agency 13 GB

🟠 Security

  • optical 2026-04-24 — All databases bound to 0.0.0.0: Redis ×3 (:6379/:6380/:6399), PostgreSQL ×3 (:5432/:5433/:5437), MongoDB ×3 (:27017/:27019/:27021), Neo4j (:7474/:7475/:7687/:7688)
  • librechat-prod 2026-04-24 — MongoDB :27017 on 0.0.0.0 — publicly exposed, no auth config found
  • baic 2026-04-24 — PostgreSQL :5432 + rpcbind :111 on 0.0.0.0
  • optical-dev 2026-04-24 — PostgreSQL :5436/:5491/:5493 + olivas :8000 + cc-dashboard :8800 on 0.0.0.0
  • baic 2026-04-21 — Grafana default admin:admin password unchanged
  • pve CT102 2026-05-03 docker-socket-proxy on 0.0.0.0:2376 — Docker API accessible on LAN (should be 127.0.0.1)
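
Findings like the ones above come from looking for listeners bound to a wildcard address. A minimal sketch of that check follows, assuming the column layout printed by `ss -tln` on Linux (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port). The function names are illustrative and the parser has not been run against these servers.

```python
import subprocess

# Addresses that mean "listen on every interface" in ss output.
WILDCARD_HOSTS = {"0.0.0.0", "*", "[::]", "::"}


def wildcard_listeners(ss_output):
    """Return the sorted TCP ports that listen on a wildcard address."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue  # skips the header row and any non-listening sockets
        host, _, port = fields[3].rpartition(":")
        if host in WILDCARD_HOSTS and port.isdigit():
            ports.add(int(port))
    return sorted(ports)


def scan_local():
    """Run ss on the current machine and report its wildcard listeners."""
    result = subprocess.run(
        ["ss", "-tln"], capture_output=True, text=True, check=True
    )
    return wildcard_listeners(result.stdout)
```

A wildcard bind is not automatically a public exposure, since a host or cloud firewall may still block the port, so each reported port still needs a manual look.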

🔵 Maintenance

  • optical-dev 2026-04-24 — hp-prod-tracker + dow-prod-tracker containers unhealthy (healthcheck misconfigured, apps running fine)
  • box-cli 2026-04-24 — CentOS 7 EOL since Jun 2024 — needs OS migration
  • pve CT105 2026-05-03 Immich STOPPED — fix: pct set 105 --delete dev1 && pct set 105 --delete dev2 && pct start 105
  • pve CT101 2026-05-03 Legacy AdGuard still running — router DHCP DNS still points to 192.168.1.62, needs update to 192.168.1.225
  • pve CT102 2026-05-03 Stirling-PDF broken — OIDC points to deleted Authentik — fix: set SECURITY_OAUTH2_ENABLED=false
  • pve CT102 2026-05-03 Loki without Promtail — logs not flowing
  • pve CT102 2026-05-03 CrowdSec without bouncer — IPS observing but not blocking
  • pve CT102 2026-05-03 5 dead NPM proxy hosts — id=5,6,8,12 (delete), id=10 (change to CT102 AdGuard :8053)
  • pve host 2026-05-03 — rpcbind :111 open on 0.0.0.0 — disable if no NFS: systemctl disable --now rpcbind rpcbind.socket
  • pve 2026-05-03 — Tailscale has no subnet router — LAN not accessible remotely without port forwarding

Resolved

  • pve local-lvm 2026-05-03 — improved to 58.85% (was 71%) — old stale LXCs (CT103/104/107/109/110) destroyed
  • pve CT 102 (docker) — resolved 2026-04-24 — Docker data-root moved to /mnt/data/docker, now 51%
  • pve CT 105 (immich) — resolved 2026-04-24 — PostgreSQL + cache moved to data-hdd, now 62%
  • pve — resolved 2026-04-24 — Proxmox security updates applied (libngtcp2, cluster libs)
  • optical 2026-04-24 — SSL cert ai-sandbox.oliver.solutions — track separately (check if renewed)
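
Whether the ai-sandbox.oliver.solutions certificate has been renewed can be checked by reading its expiry date over TLS. The sketch below uses only the Python standard library; the function names are illustrative, and the time of day in the usage example is a placeholder because the wiki records only the date.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_left(not_after, now):
    """Whole days between now and a certificate's notAfter timestamp.

    not_after uses the format returned by SSLSocket.getpeercert(),
    for example "May  8 00:00:00 2026 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to host and return the notAfter field of its certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_left(fetch_not_after("ai-sandbox.oliver.solutions"), datetime.now(timezone.utc))` would give the remaining days. Because the default context verifies the certificate, an already expired certificate raises `ssl.SSLCertVerificationError` instead of returning a date.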