diff --git a/.obsidian/plugins/hoarder-sync/data.json b/.obsidian/plugins/hoarder-sync/data.json
index 69f1e66..3daae9f 100644
--- a/.obsidian/plugins/hoarder-sync/data.json
+++ b/.obsidian/plugins/hoarder-sync/data.json
@@ -4,7 +4,7 @@
   "syncFolder": "Hoarder",
   "attachmentsFolder": "Hoarder/attachments",
   "syncIntervalMinutes": 60,
-  "lastSyncTimestamp": 1777555641238,
+  "lastSyncTimestamp": 1777556035137,
   "updateExistingFiles": false,
   "excludeArchived": true,
   "onlyFavorites": false,
diff --git a/99 Daily/2026-04-30.md b/99 Daily/2026-04-30.md
index c448c5f..70279da 100644
--- a/99 Daily/2026-04-30.md
+++ b/99 Daily/2026-04-30.md
@@ -149,3 +149,48 @@ tags: [daily]
 - 14:27 (<1min) | `memory-compiler`
   - **Asked:** Compile a new article about LM Studio embeddings into the structured wiki knowledge base.
   - **Done:** Filed article as `wiki/claude-code/lmstudio-embeddings.md` and updated master index with wikilinks to related LM Studio topics and RAG pattern.
+- 14:28 (1min) | `memory-compiler`
+  - **Asked:** Compile a new HP EliteDesk 800G3 teardown/upgrade article into the wiki knowledge base.
+  - **Done:** Filed article as `wiki/homelab/hp-elitedesk-800g3-teardown-upgrade.md` with full disassembly procedures, motherboard specs, and upgrade benchmarks.
+- 14:29 | `video-accessibility`
+  - **Asked:** Asked for code review skills checklist from project instructions | Reviewed project completion and committed code changes | No files specified
+  - **Done:** —
+- 14:30 (<1min) | `memory-compiler`
+  - **Asked:** Compile a new article on LM Studio messages API into the wiki knowledge base.
+  - **Done:** Created structured wiki article with cURL examples and updated topic and master indices.
+- 14:31 (<1min) | `memory-compiler`
+  - **Asked:** Compile a new article about LM Studio's OpenAI-compatible endpoints into the wiki knowledge base.
+  - **Done:** Created the article, updated the claude-code index to 21 articles, and bumped the master index count.
+- 14:32 | `video-accessibility`
+  - **Asked:** What skills should be checked for code review according to the instructions?
+  - **Done:** Reviewed project completion and identified environment configuration changes needed for optical-dev deployment.
+- 14:33 (<1min) | `memory-compiler`
+  - **Asked:** Compile a new article about LM Studio headless service into the knowledge base wiki.
+  - **Done:** Filed article as `claude-code/lmstudio-headless-service.md` and updated master index to reflect 23 total claude-code articles.
+- 14:33 (<1min) | `memory-compiler`
+  - **Asked:** Compile a new article about LM Studio network serving into the knowledge base and update the master index.
+  - **Done:** Created new LM Studio article, updated claude-code topic index, and incremented master index article count from 23 to 24.
+- 14:34 | `video-accessibility`
+  - **Asked:** Check the project instructions for code review skills requirements.
+  - **Done:** Identified OOM issue in whisper-worker memory configuration and pushed hotfix to restore original memory limits while keeping Cloud Run URLs.
+- 14:35 (<1min) | `memory-compiler`
+  - **Asked:** Compile a raw article about LM Studio systemd configuration into the structured wiki knowledge base.
+  - **Done:** Filed the article as a systemd unit configuration guide with systemd service setup details, unit file ordering, and PATH requirements.
+- 14:36 (<1min) | `memory-compiler`
+  - **Asked:** File a new article about LM Studio structured output into the knowledge base.
+  - **Done:** Created wiki article and updated both index files to register the new entry.
+- 14:37 (1min) | `memory-compiler`
+  - **Asked:** Compile a new article on tool use into the knowledge base wiki structure.
+  - **Done:** Processed raw article into `claude-code/lmstudio-tool-use.md` and updated both topic and master indexes.
+- 14:38 | `video-accessibility`
+  - **Asked:** Check the instructions for code review skills to verify the completed project.
+  - **Done:** Reviewed deployment fix that restored memory limits and confirmed all 7 containers started successfully with API health checks passing.
+- 14:39 (<1min) | `memory-compiler`
+  - **Asked:** Compile a new LM Studio CLI article into the knowledge base wiki.
+  - **Done:** Created structured wiki article with command reference and cross-links, updated master index from 29 to 30 claude-code articles.
+- 14:41 | `video-accessibility`
+  - **Asked:** Check the project instructions for code review skills that need to be verified.
+  - **Done:** Reviewed deployment status and identified CORS configuration and ffmpeg logging checks needed.
+- 14:41 | `video-accessibility`
+  - **Asked:** Check project completion and review code quality assessment skills from instructions.
+  - **Done:** Identified server authorization limitations and provided gsutil CORS configuration command for local execution.
diff --git a/raw/HP EliteDesk 800 G3 SFF - Teardown, re-assembly and upgrade.md b/raw/_processed/HP EliteDesk 800 G3 SFF - Teardown, re-assembly and upgrade.md
similarity index 100%
rename from raw/HP EliteDesk 800 G3 SFF - Teardown, re-assembly and upgrade.md
rename to raw/_processed/HP EliteDesk 800 G3 SFF - Teardown, re-assembly and upgrade.md
diff --git a/raw/Idle TTL and Auto-Evict.md b/raw/_processed/Idle TTL and Auto-Evict.md
similarity index 100%
rename from raw/Idle TTL and Auto-Evict.md
rename to raw/_processed/Idle TTL and Auto-Evict.md
diff --git a/raw/LM Studio API.md b/raw/_processed/LM Studio API.md
similarity index 100%
rename from raw/LM Studio API.md
rename to raw/_processed/LM Studio API.md
diff --git a/raw/Messages.md b/raw/_processed/Messages.md
similarity index 100%
rename from raw/Messages.md
rename to raw/_processed/Messages.md
diff --git a/raw/OpenAI Compatibility Endpoints.md b/raw/_processed/OpenAI Compatibility Endpoints.md
similarity index 100%
rename from raw/OpenAI Compatibility Endpoints.md
rename to raw/_processed/OpenAI Compatibility Endpoints.md
diff --git a/raw/Responses.md b/raw/_processed/Responses.md
similarity index 100%
rename from raw/Responses.md
rename to raw/_processed/Responses.md
diff --git a/raw/Run LM Studio as a service (headless).md b/raw/_processed/Run LM Studio as a service (headless).md
similarity index 100%
rename from raw/Run LM Studio as a service (headless).md
rename to raw/_processed/Run LM Studio as a service (headless).md
diff --git a/raw/Serve on Local Network.md b/raw/_processed/Serve on Local Network.md
similarity index 100%
rename from raw/Serve on Local Network.md
rename to raw/_processed/Serve on Local Network.md
diff --git a/raw/Server Settings.md b/raw/_processed/Server Settings.md
similarity index 100%
rename from raw/Server Settings.md
rename to raw/_processed/Server Settings.md
diff --git a/raw/Setup llmster as a Startup Task on Linux.md b/raw/_processed/Setup llmster as a Startup Task on Linux.md
similarity index 100%
rename from raw/Setup llmster as a Startup Task on Linux.md
rename to raw/_processed/Setup llmster as a Startup Task on Linux.md
diff --git a/raw/Structured Output.md b/raw/_processed/Structured Output.md
similarity index 100%
rename from raw/Structured Output.md
rename to raw/_processed/Structured Output.md
diff --git a/raw/Tool Use.md b/raw/_processed/Tool Use.md
similarity index 100%
rename from raw/Tool Use.md
rename to raw/_processed/Tool Use.md
diff --git a/raw/Using MCP via API.md b/raw/_processed/Using MCP via API.md
similarity index 100%
rename from raw/Using MCP via API.md
rename to raw/_processed/Using MCP via API.md
diff --git a/raw/lms — LM Studio's CLI.md b/raw/_processed/lms — LM Studio's CLI.md
similarity index 100%
rename from raw/lms — LM Studio's CLI.md
rename to raw/_processed/lms — LM Studio's CLI.md
diff --git a/wiki/_master-index.md b/wiki/_master-index.md
index a014493..b618f0b 100644
--- a/wiki/_master-index.md
+++ b/wiki/_master-index.md
@@ -26,12 +26,12 @@ This 3-hop pattern works for hundreds of articles without vector search.
 | [[wiki/concepts/_index\|concepts/]] | Atomic knowledge extracted from Claude Code sessions | 75 |
 | [[wiki/connections/_index\|connections/]] | Cross-cutting insights linking 2+ concepts: FastAPI+Azure AD+Docker trinity, AI→cost-tracker, Apache+Vite basePath, GCP→REST polling, Box+hotfolder, Docker DNS+AdGuard | 9 |
 | [[wiki/qa/_index\|qa/]] | Filed answers to queries (saved with `--file-back`) | 0 |
-| [[wiki/homelab/_index\|homelab/]] | Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, budget builds, HP Elitedesk G3, Homarr API + Apps + Boards + Certificates + Integrations + Settings + Tasks + AdGuard + Clock + Docker Stats + Docker Integration + Download Client + Firewall + Proxmox Integration + Radarr + Readarr + Sonarr + Bookmarks + Calendar + Icons + App Widget + Weather + GitHub + Nextcloud + qBittorrent + RSS Feed + Speedtest Tracker + System Health Monitoring + System Resources + Services Map + Media Stack | 39 |
+| [[wiki/homelab/_index\|homelab/]] | Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, budget builds, HP Elitedesk G3, Homarr API + Apps + Boards + Certificates + Integrations + Settings + Tasks + AdGuard + Clock + Docker Stats + Docker Integration + Download Client + Firewall + Proxmox Integration + Radarr + Readarr + Sonarr + Bookmarks + Calendar + Icons + App Widget + Weather + GitHub + Nextcloud + qBittorrent + RSS Feed + Speedtest Tracker + System Health Monitoring + System Resources + Services Map + Media Stack | 40 |
 | [[wiki/web-agency/_index\|web-agency/]] | AI-assisted website building & selling: Claude Code, Nanobanana 2, Kling, LaunchPath MCP | 9 |
 | [[wiki/dotfiles/_index\|dotfiles/]] | Linux terminal ricing: Kitty, Fish, WezTerm CLI, modern Rust CLI tools, LazyVim, unified themes, Tabby | 21 |
 | [[wiki/agent-sdk/_index\|agent-sdk/]] | Claude Agent SDK (formerly Claude Code SDK) — build autonomous AI agents in Python and TypeScript | 30 |
 | [[wiki/llm-models/_index\|llm-models/]] | LLM model catalogs — OpenAI and Claude/Anthropic models, IDs, context, pricing | 2 |
-| [[wiki/claude-code/_index\|claude-code/]] | Claude Code product docs — install, capabilities, surfaces, MCP, hooks, scheduling, multi-agent, plugins, skills, channels, error recovery, LM Studio local | 17 |
+| [[wiki/claude-code/_index\|claude-code/]] | Claude Code product docs — install, capabilities, surfaces, MCP, hooks, scheduling, multi-agent, plugins, skills, channels, error recovery, LM Studio local | 30 |
 | [[wiki/reports/_index\|reports/]] | Weekly and monthly summaries — generate: `uv run python scripts/report-generator.py --weekly` | 1 |
 | [[wiki/infrastructure/_index\|infrastructure/]] | Server inventory: all 10 SSH hosts — optical, optical-dev, optical-prod, baic, librechat, modocmms, box-cli, aimpress, pve | 10 |
diff --git a/wiki/claude-code/_index.md b/wiki/claude-code/_index.md
index 3748b8c..d7a8dec 100644
--- a/wiki/claude-code/_index.md
+++ b/wiki/claude-code/_index.md
@@ -31,3 +31,16 @@ Claude Code is Anthropic's agentic coding assistant. Works across terminal, IDE,
 | [[wiki/claude-code/lmstudio-anthropic-compat\|lmstudio-anthropic-compat]] | Redirect Claude Code and the Anthropic SDK to a local LM Studio server via two env vars; `/v1/messages` drop-in, auth options, cURL + Python examples | raw/Anthropic Compatibility Endpoints.md | 2026-04-30 |
 | [[wiki/claude-code/lmstudio-chat-completions\|lmstudio-chat-completions]] | LM Studio OpenAI-compatible `/v1/chat/completions`: Python example, all supported params (incl. top_k, repeat_penalty), `lms log stream` debugging | raw/Chat Completions.md | 2026-04-30 |
 | [[wiki/claude-code/lmstudio-embeddings\|lmstudio-embeddings]] | LM Studio `/v1/embeddings`: OpenAI-compat drop-in, Python example, newline stripping, batch inputs, use with FAISS/Chroma for local RAG | raw/Embeddings.md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-idle-ttl-auto-evict\|lmstudio-idle-ttl-auto-evict]] | Idle TTL (per-request `ttl` field, `lms load --ttl`) and Auto-Evict (1 JIT model at a time) for LM Studio memory management | raw/Idle TTL and Auto-Evict.md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-rest-api\|lmstudio-rest-api]] | LM Studio native v1 REST API: all endpoints, endpoint feature comparison (native vs OAI vs Anthropic compat), model lifecycle management | raw/LM Studio API.md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-messages-api\|lmstudio-messages-api]] | LM Studio `/v1/messages` drop-in: basic, streaming (SSE events), and tool-use cURL examples; auth options | raw/Messages.md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-openai-compat-endpoints\|lmstudio-openai-compat-endpoints]] | LM Studio OpenAI-compat overview: 5 endpoints, base_url swap pattern, Python/TS/cURL examples, Codex support | raw/OpenAI Compatibility Endpoints.md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-responses-api\|lmstudio-responses-api]] | LM Studio `/v1/responses`: streaming SSE, stateful follow-up via `previous_response_id`, reasoning effort, Remote MCP tools | raw/Responses.md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-headless-service\|lmstudio-headless-service]] | Run LM Studio without GUI: llmster daemon (recommended) or desktop tray mode; JIT model loading and auto-evict | raw/Run LM Studio as a service (headless).md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-serve-on-network\|lmstudio-serve-on-network]] | Bind LM Studio server to LAN IP so other devices (thin clients, IoT, team members) can call the API over the local network | raw/Serve on Local Network.md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-server-settings\|lmstudio-server-settings]] | All LM Studio API server toggles: port, auth, CORS, LAN access, per-request MCPs, mcp.json access, JIT loading + auto-evict | raw/Server Settings.md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-llmster-systemd\|lmstudio-llmster-systemd]] | systemd unit file for llmster: install daemon, load model at boot, ExecStartPre ordering, oneshot+RemainAfterExit pattern, service management commands | raw/Setup llmster as a Startup Task on Linux.md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-structured-output\|lmstudio-structured-output]] | Enforce JSON schema on LLM responses via response_format; GGUF uses llama.cpp grammar, MLX uses Outlines; models <7B often unsupported | raw/Structured Output.md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-tool-use\|lmstudio-tool-use]] | LM Studio function calling: tool definition format, multi-turn flow, native vs default support, streaming accumulation, Python examples | raw/Tool Use.md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-mcp-via-api\|lmstudio-mcp-via-api]] | MCP servers via LM Studio `/api/v1/chat`: ephemeral (inline) vs mcp.json (pre-configured), allowed_tools, custom auth headers | raw/Using MCP via API.md | 2026-04-30 |
+| [[wiki/claude-code/lmstudio-lms-cli\|lmstudio-lms-cli]] | `lms` CLI: model download/load/unload/list, server start/stop, log streaming, GPU offload flags, --identifier alias, daemon management | raw/lms — LM Studio's CLI.md | 2026-04-30 |
diff --git a/wiki/claude-code/lmstudio-headless-service.md b/wiki/claude-code/lmstudio-headless-service.md
new file mode 100644
index 0000000..2d94a19
--- /dev/null
+++ b/wiki/claude-code/lmstudio-headless-service.md
@@ -0,0 +1,104 @@
+---
+title: "LM Studio Headless / Service Mode"
+aliases: [lmstudio-daemon, llmster, lmstudio-background-service]
+tags: [lmstudio, local-llm, headless, daemon, jit-loading]
+sources: [raw/Run LM Studio as a service (headless).md]
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# LM Studio Headless / Service Mode
+
+GUI-less operation of LM Studio: run as a background daemon, start on machine login, and load models on demand via JIT.
+
+## Two Approaches
+
+| Approach | Best For | GUI Required? |
+|----------|----------|---------------|
+| **llmster** (recommended) | Linux servers, cloud, GPU rigs, headless machines | No |
+| **Desktop app headless mode** | Machines with a GUI where app is already installed | Yes (hidden to tray) |
+
+---
+
+## Option 1: llmster (Recommended)
+
+`llmster` is the core of the LM Studio desktop app, repackaged as a server-native daemon. No GUI dependency.
+
+### Install
+
+```bash
+# Linux / Mac
+curl -fsSL https://lmstudio.ai/install.sh | bash
+
+# Windows (PowerShell)
+irm https://lmstudio.ai/install.ps1 | iex
+```
+
+### Start the daemon
+
+```bash
+lms daemon up
+```
+
+- To auto-start on Linux boot, configure it as a **Linux Startup Task** (see LM Studio docs).
+- Full CLI reference: `lms daemon --help`
+
+---
+
+## Option 2: Desktop App in Headless Mode
+
+Works on Mac, Windows, Linux (with GUI). Useful if the desktop app is already installed.
+
+### Run server on login
+
+1. Open app settings (`Cmd/Ctrl` + `,`)
+2. Enable **"Run LLM server on login"**
+3. Exiting the app minimizes to tray — server keeps running
+
+### Start server programmatically
+
+```bash
+lms server start
+```
+
+Last server state is saved and restored automatically on launch.
+
+---
+
+## Just-In-Time (JIT) Model Loading
+
+Applies to **both** options. Useful when using LM Studio as a backend for other tools (Open WebUI, Claude Code, custom apps).
+
+| JIT State | `/v1/models` returns | Inference behavior |
+|-----------|---------------------|--------------------|
+| **ON** | All downloaded models | Auto-loads model into VRAM on first call |
+| **OFF** | Only models in VRAM | Must manually load model first |
+
+### Auto-Unload
+
+JIT-loaded models are **auto-evicted** after a period of inactivity — see [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] for TTL settings and per-request `ttl` field.
+
+---
+
+## Key Takeaways
+
+- **llmster** is the preferred headless path — works on servers and CI without any GUI
+- Desktop headless mode is a quick option for developer machines already running the app
+- JIT loading eliminates manual `lms load` calls; models are loaded on first inference request
+- JIT-loaded models auto-unload after inactivity (configurable TTL)
+- Use `lms server start` to programmatically control the REST server state
+- The OpenAI-compatible REST API (`/v1/...`) is available in both modes — see [[wiki/claude-code/lmstudio-openai-compat-endpoints|OpenAI Compat Endpoints]] and [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]]
+
+---
+
+## Related
+
+- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — all endpoints and lifecycle management
+- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] — memory management for JIT-loaded models
+- [[wiki/claude-code/lmstudio-openai-compat-endpoints|OpenAI Compat Endpoints]] — drop-in base_url swap for any OpenAI client
+- [[wiki/claude-code/lmstudio-anthropic-compat|Anthropic Compat Endpoints]] — redirect Claude Code / Anthropic SDK to local LM Studio
+
+## Sources
+
+- `raw/Run LM Studio as a service (headless).md`
+- LM Studio docs: https://lmstudio.ai/docs/developer/core/headless
diff --git a/wiki/claude-code/lmstudio-idle-ttl-auto-evict.md b/wiki/claude-code/lmstudio-idle-ttl-auto-evict.md
new file mode 100644
index 0000000..56fbac3
--- /dev/null
+++ b/wiki/claude-code/lmstudio-idle-ttl-auto-evict.md
@@ -0,0 +1,90 @@
+---
+title: "LM Studio — Idle TTL and Auto-Evict"
+aliases: [lmstudio-ttl, lmstudio-auto-evict, idle-ttl]
+tags: [lmstudio, memory-management, jit-loading, ttl, api]
+sources: [raw/Idle TTL and Auto-Evict.md]
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# LM Studio — Idle TTL and Auto-Evict
+
+Memory management features for LM Studio's JIT-loaded models. Prevents idle models from occupying VRAM and enables seamless model switching from external apps.
+
+## Background
+
+| Feature | Default | Purpose |
+|---------|---------|---------|
+| **JIT Loading** | enabled | Loads model on first API request — no manual preload needed |
+| **Idle TTL** | 60 min | Unloads a model after it has been idle for N seconds/minutes |
+| **Auto-Evict** | enabled | Unloads previous JIT model before loading a new one |
+
+## Idle TTL
+
+**Problem:** JIT-loaded models stay in VRAM even when idle (e.g. after you stop using Cline, Zed, or Continue.dev).
+
+**Solution:** TTL starts a countdown when the model goes idle. The timer resets on every new request. When it expires, the model unloads automatically.
+
+### Setting TTL
+
+**App-wide default** — configure in Developer tab → Server Settings.
+
+**Per-request (API)** — pass `ttl` in seconds in the request body:
+
+```bash
+curl http://localhost:1234/api/v0/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "deepseek-r1-distill-qwen-7b",
+    "ttl": 300,
+    "messages": [...]
+  }'
+```
+
+Works on both the OpenAI-compat (`/v1/`) and LM Studio REST (`/api/v0/`) endpoints.
+
+**`lms` CLI** — set TTL at load time:
+
+```bash
+lms load --ttl 3600 # 1 hour
+```
+
+Models loaded with `lms load` have **no TTL by default** (persist until manual unload).
+
+**Server tab** — TTL field visible when loading a model through the GUI.
+
+## Auto-Evict
+
+Controls how many JIT-loaded models can coexist in memory.
+
+| State | Behaviour |
+|-------|-----------|
+| **ON** (default) | At most 1 JIT model in memory at a time; old model evicted before new one loads |
+| **OFF** | Models accumulate in memory; only unloaded by TTL expiry or manual action |
+
+- Non-JIT (manually loaded) models are **never** affected by Auto-Evict.
+- Toggle in: Developer tab → Server Settings.
+
+## TTL + Auto-Evict Together
+
+- **Auto-Evict** handles immediate switching — keeps 1 active model.
+- **TTL** handles the "forgot to switch" case — cleans up if you just stop using an app.
+- Both can be active simultaneously for full memory hygiene.
+
+## Key Takeaways
+
+- Set `"ttl": 300` in any API request to cap a model's idle lifetime to 5 minutes.
+- `lms load --ttl 3600` is the CLI equivalent for persistent sessions.
+- Auto-Evict (default ON) ensures only 1 JIT model lives in VRAM at a time — great for low-VRAM machines.
+- `lms load` bypasses TTL defaults; always pass `--ttl` explicitly if you want auto-cleanup.
+- These features are irrelevant for models loaded via the GUI Models tab (non-JIT path).
+
+## Related
+
+- [[wiki/claude-code/lmstudio-anthropic-compat|LM Studio Anthropic Compat]] — redirect Claude Code to local LM Studio
+- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — full parameter reference incl. `top_k`, `repeat_penalty`
+- [[wiki/claude-code/lmstudio-embeddings|LM Studio Embeddings]] — local RAG with FAISS/Chroma
+
+## Sources
+
+- [LM Studio Docs — Idle TTL and Auto-Evict](https://lmstudio.ai/docs/developer/core/ttl-and-auto-evict)
diff --git a/wiki/claude-code/lmstudio-llmster-systemd.md b/wiki/claude-code/lmstudio-llmster-systemd.md
new file mode 100644
index 0000000..0e37c19
--- /dev/null
+++ b/wiki/claude-code/lmstudio-llmster-systemd.md
@@ -0,0 +1,96 @@
+---
+title: "LM Studio — llmster Startup Service (systemd)"
+aliases: [llmster-systemd, lmstudio-startup, lmstudio-daemon-linux]
+tags: [lmstudio, llmster, systemd, linux, headless, local-llm]
+sources: [raw/Setup llmster as a Startup Task on Linux.md]
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# LM Studio — llmster Startup Service (systemd)
+
+Configure `llmster` (LM Studio's headless daemon) to launch automatically at boot, load a model, and start the HTTP API server — all via a systemd unit file.
+
+## Install llmster
+
+```bash
+curl -fsSL https://lmstudio.ai/install.sh | bash
+lms --help # verify
+```
+
+## Download a Model
+
+```bash
+lms get openai/gpt-oss-20b
+# note the model path printed — used in service config
+```
+
+## Manual Test (before systemd)
+
+```bash
+lms load openai/gpt-oss-20b
+lms server start
+curl http://localhost:1234/v1/models # should return model list
+lms server stop
+```
+
+## systemd Unit File
+
+Create `/etc/systemd/system/lmstudio.service` (replace `YOUR_USERNAME`):
+
+```ini
+[Unit]
+Description=LM Studio Server
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+User=YOUR_USERNAME
+Environment="HOME=/home/YOUR_USERNAME"
+ExecStartPre=/home/YOUR_USERNAME/.lmstudio/bin/lms daemon up
+ExecStartPre=/home/YOUR_USERNAME/.lmstudio/bin/lms load openai/gpt-oss-20b --yes
+ExecStart=/home/YOUR_USERNAME/.lmstudio/bin/lms server start
+ExecStop=/home/YOUR_USERNAME/.lmstudio/bin/lms daemon down
+
+[Install]
+WantedBy=multi-user.target
+```
+
+- `Type=oneshot` + `RemainAfterExit=yes` — service is considered "active" after `ExecStart` exits
+- `ExecStartPre` runs sequentially before `ExecStart`
+- Skip the `lms load` line to rely on [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|JIT loading + auto-evict]] instead
+
+## Enable and Start
+
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable lmstudio.service
+sudo systemctl start lmstudio.service
+```
+
+## Verify
+
+```bash
+systemctl status lmstudio
+curl http://localhost:1234/v1/models
+```
+
+## Service Management
+
+```bash
+sudo systemctl stop lmstudio # stop
+sudo systemctl restart lmstudio # restart
+sudo systemctl disable lmstudio # remove from boot
+```
+
+## Key Takeaways
+
+- Use `lms daemon up` in `ExecStartPre` — the daemon must be running before `lms load` or `lms server start`
+- Binary path is `~/.lmstudio/bin/lms` — use the absolute path in the unit file (systemd has a minimal `$PATH`)
+- `Type=oneshot` + `RemainAfterExit=yes` keeps the service "active" so `ExecStop` runs on shutdown
+- Omit the `lms load` step and use [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|JIT loading]] to avoid pinning a model at boot
+- API is served on `http://localhost:1234` — see [[wiki/claude-code/lmstudio-headless-service|headless service overview]] for non-systemd options and [[wiki/claude-code/lmstudio-serve-on-network|LAN serving]] to expose to other devices
+
+## Sources
+
+- [LM Studio Headless llmster Docs](https://lmstudio.ai/docs/developer/core/headless_llmster)
diff --git a/wiki/claude-code/lmstudio-lms-cli.md b/wiki/claude-code/lmstudio-lms-cli.md
new file mode 100644
index 0000000..9fb4af9
--- /dev/null
+++ b/wiki/claude-code/lmstudio-lms-cli.md
@@ -0,0 +1,108 @@
+---
+title: "lms — LM Studio CLI"
+aliases: [lms-cli, lmstudio-cli]
+tags: [lmstudio, cli, local-llm, inference, server]
+sources: [raw/lms — LM Studio's CLI.md]
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# lms — LM Studio CLI
+
+`lms` is LM Studio's built-in CLI utility for managing models, the inference server, and the runtime. Ships with LM Studio — no separate install needed. MIT licensed, open source on GitHub.
+
+## Installation & Verification
+
+```bash
+# Already installed with LM Studio — just verify:
+lms --help
+```
+
+Current version: `v0.0.47`
+
+## Command Reference
+
+| Command | What it does |
+|---------|-------------|
+| `lms chat` | Start interactive chat with a model in the terminal |
+| `lms get` | Search and download models |
+| `lms ls` | List models available on disk |
+| `lms ps` | List models currently loaded in memory |
+| `lms load` | Load a model (with GPU/context options) |
+| `lms unload` | Unload a model |
+| `lms import` | Import a model file into LM Studio |
+| `lms server start/stop` | Control the local API server |
+| `lms log` | Stream incoming/outgoing messages for debugging |
+| `lms runtime` | Manage and update the inference runtime |
+| `lms daemon` | Manage the headless llmster daemon |
+| `lms link` | Manage LM Link |
+| `lms clone` | Clone an artifact from LM Studio Hub |
+| `lms push` | Upload artifact to LM Studio Hub |
+| `lms login` | Authenticate with LM Studio |
+
+## Common Workflows
+
+### Server control
+
+```bash
+lms server start
+lms server stop
+```
+
+### List & inspect models
+
+```bash
+lms ls # models on disk (reflects My Models directory)
+lms ps # models currently loaded in memory
+```
+
+### Load a model
+
+```bash
+# With GPU offload and context size:
+lms load [--gpu=max|auto|0.0-1.0] [--context-length=1-N]
+
+# --gpu=1.0 → 100% GPU offload
+# With a stable identifier alias:
+lms load openai/gpt-oss-20b --identifier="my-model-name"
+```
+
+Using `--identifier` keeps the model ID stable across loads — useful when client code hardcodes a model name.
+
+### Unload a model
+
+```bash
+lms unload # unload specific model
+lms unload --all # unload everything
+```
+
+### Debug message flow
+
+```bash
+lms log stream # tail all incoming/outgoing API messages live
+```
+
+Pairs with [[wiki/claude-code/lmstudio-chat-completions|lmstudio-chat-completions]] for debugging request/response cycles.
+
+## Key Takeaways
+
+- `lms` ships with LM Studio — zero extra install steps
+- `lms ps` vs `lms ls`: loaded-in-memory vs on-disk — two different commands
+- `--gpu=1.0` forces full GPU offload; `--gpu=auto` lets LM Studio decide
+- `--identifier` flag on `lms load` decouples client model names from actual model paths
+- `lms log stream` is the fastest way to debug what's hitting the server
+- `lms daemon` manages [[wiki/claude-code/lmstudio-headless-service|llmster]] for headless/service deployments
+- MIT licensed: safe to embed in scripts and automation
+
+## Related Articles
+
+- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — all API endpoints
+- [[wiki/claude-code/lmstudio-headless-service|Headless Service (llmster)]] — daemon mode for servers
+- [[wiki/claude-code/lmstudio-server-settings|Server Settings]] — port, auth, CORS, JIT loading
+- [[wiki/claude-code/lmstudio-chat-completions|Chat Completions]] — OpenAI-compat `/v1/chat/completions`
+- [[wiki/claude-code/lmstudio-llmster-systemd|llmster systemd unit]] — run llmster at boot on Linux
+- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] — memory management
+
+## Sources
+
+- lmstudio.ai/docs/cli
diff --git a/wiki/claude-code/lmstudio-mcp-via-api.md b/wiki/claude-code/lmstudio-mcp-via-api.md
new file mode 100644
index 0000000..9d4d7fd
--- /dev/null
+++ b/wiki/claude-code/lmstudio-mcp-via-api.md
@@ -0,0 +1,115 @@
+---
+title: "LM Studio — MCP via API"
+aliases: [lmstudio-mcp-api, mcp-lmstudio, lm-studio-mcp]
+tags: [lmstudio, mcp, api, tool-use, integration]
+sources: [raw/Using MCP via API.md]
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# LM Studio — MCP via API
+
+Requires LM Studio 0.4.0+. MCP servers provide tools that models can call during chat requests via `/api/v1/chat`.
+
+## Two Server Modes
+
+| Feature | Ephemeral | mcp.json |
+|---------|-----------|----------|
+| Specified via | `integrations` → `"type": "ephemeral_mcp"` | `integrations` → `"type": "plugin"` |
+| Config | Per-request only | Pre-configured in `mcp.json` |
+| Use case | One-off / remote tools | Frequent use, tools needing `command` (local processes) |
+| Server ID | `server_label` in integration | `id` (e.g. `mcp/playwright`) |
+| Custom headers | `headers` field | Configured in `mcp.json` |
+
+## Ephemeral MCP Servers
+
+Defined inline per-request — no pre-configuration needed.
+
+```bash
+curl http://localhost:1234/api/v1/chat \
+  -H "Authorization: Bearer $LM_API_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "ibm/granite-4-micro",
+    "input": "What is the top trending model on hugging face?",
+    "integrations": [
+      {
+        "type": "ephemeral_mcp",
+        "server_label": "huggingface",
+        "server_url": "https://huggingface.co/mcp",
+        "allowed_tools": ["model_search"]
+      }
+    ],
+    "context_length": 8000
+  }'
+```
+
+Response output contains typed entries: `reasoning`, `message`, and `tool_call` objects. Each `tool_call` includes the tool name, arguments, output, and `provider_info` identifying the server.
+
+## mcp.json Pre-configured Servers
+
+Recommended for servers that run local commands (e.g. `microsoft/playwright-mcp`) or are used frequently.
+
+```bash
+curl http://localhost:1234/api/v1/chat \
+  -H "Authorization: Bearer $LM_API_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "ibm/granite-4-micro",
+    "input": "Open lmstudio.ai",
+    "integrations": ["mcp/playwright"],
+    "context_length": 8000,
+    "temperature": 0
+  }'
+```
+
+- `integrations` can be a plain string array when referencing pre-configured servers
+- `provider_info.type` will be `"plugin"` (vs `"ephemeral_mcp"` for inline)
+
+## Restricting Tool Access
+
+Use `allowed_tools` on either integration type:
+
+```json
+"allowed_tools": ["model_search"]
+```
+
+- Limits which tools the model can call from that server
+- Speeds up prompt processing — fewer tool definitions in context
+- If omitted, all server tools are available
+
+## Custom Headers (Ephemeral)
+
+For authenticated remote MCP endpoints:
+
+```json
+{
+  "type": "ephemeral_mcp",
+  "server_label": "huggingface",
+  "server_url": "https://huggingface.co/mcp",
+  "allowed_tools": ["model_search"],
+  "headers": {
+    "Authorization": "Bearer "
+  }
+}
+```
+
+## Key Takeaways
+
+- LM Studio exposes MCP tool calling through its native `/api/v1/chat` endpoint (not the OpenAI-compat route)
+- Two modes: **ephemeral** (inline, per-request) vs **mcp.json** (pre-configured, recommended for local/frequent servers)
+- `allowed_tools` works on both modes — use it to reduce context size and restrict scope
+- Tool call results appear inline in the `output` array alongside `reasoning` and `message` entries
+- Auth headers for remote MCP servers go in the `headers` field on ephemeral integrations
+- The [[wiki/claude-code/lmstudio-responses-api|Responses API]] also supports Remote MCP via `tools` — different endpoint, same concept
+
+## Related
+
+- [[wiki/claude-code/lmstudio-responses-api|LM Studio Responses API]] — `/v1/responses` endpoint also supports Remote MCP tools
+- [[wiki/claude-code/lmstudio-tool-use|LM Studio Tool Use]] — function calling (non-MCP) patterns
+- [[wiki/claude-code/lmstudio-server-settings|LM Studio Server Settings]] — toggle per-request MCPs and mcp.json access in the UI
+- [[wiki/claude-code/mcp-integration|Claude Code MCP Integration]] — MCP concepts: transports, scopes, OAuth
+
+## Sources
+
+- `raw/Using MCP via API.md` — LM Studio docs, 2026-04-30
diff --git a/wiki/claude-code/lmstudio-messages-api.md b/wiki/claude-code/lmstudio-messages-api.md
new file mode 100644
index 0000000..5e8da48
--- /dev/null
+++ b/wiki/claude-code/lmstudio-messages-api.md
@@ -0,0 +1,120 @@
+---
+title: "LM Studio — Anthropic Messages API"
+aliases: [lmstudio-messages, lm-studio-anthropic-messages]
+tags: [lmstudio, anthropic, api, messages, local-llm, streaming, tools]
+sources: [raw/Messages.md]
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# LM Studio — Anthropic Messages API
+
+The `/v1/messages` endpoint in LM Studio mirrors the Anthropic Messages API exactly — same request shape, same response shape. Use it as a local drop-in for any code already calling Anthropic's cloud API.
+
+## Endpoint
+
+```
+POST http://localhost:1234/v1/messages
+```
+
+Required headers:
+- `Content-Type: application/json`
+- `x-api-key: $LM_API_TOKEN` — optional if **Require Authentication** is disabled in LM Studio
+
+## Basic Request
+
+```bash
+curl http://localhost:1234/v1/messages \
+  -H "Content-Type: application/json" \
+  -H "x-api-key: $LM_API_TOKEN" \
+  -d '{
+    "model": "ibm/granite-4-micro",
+    "max_tokens": 256,
+    "messages": [
+      {"role": "user", "content": "Say hello from LM Studio."}
+    ]
+  }'
+```
+
+## Streaming
+
+Add `"stream": true` to receive Server-Sent Events (SSE):
+
+```bash
+curl http://localhost:1234/v1/messages \
+  -H "Content-Type: application/json" \
+  -H "x-api-key: $LM_API_TOKEN" \
+  -d '{
+    "model": "ibm/granite-4-micro",
+    "messages": [{"role": "user", "content": "Hello"}],
+    "max_tokens": 256,
+    "stream": true
+  }'
+```
+
+SSE event sequence:
+1. `message_start`
+2. `content_block_start`
+3. `content_block_delta` (repeating)
+4. `content_block_stop`
+5. `message_delta`
+6. `message_stop`
+
+## Tool Use
+
+Pass a `tools` array with JSON Schema input definitions and a `tool_choice` policy:
+
+```bash
+curl http://localhost:1234/v1/messages \
+  -H "Content-Type: application/json" \
+  -H "x-api-key: $LM_API_TOKEN" \
+  -d '{
+    "model": "ibm/granite-4-micro",
+    "max_tokens": 1024,
+    "tools": [
+      {
+        "name": "get_weather",
+        "description": "Get the current weather in a given location",
+        "input_schema": {
+          "type": "object",
+          "properties": {
+            "location": {
+              "type": "string",
+              "description": "The city and state, e.g. San Francisco, CA"
+            }
+          },
+          "required": ["location"]
+        }
+      }
+    ],
+    "tool_choice": {"type": "any"},
+    "messages": [
+      {"role": "user", "content": "What is the weather like in San Francisco?"}
+    ]
+  }'
+```
+
+`tool_choice` options (Anthropic-compat): `"auto"`, `"any"`, `{"type": "tool", "name": "…"}`.
+
+## Authentication
+
+| Scenario | Header needed |
+|----------|---------------|
+| Auth disabled in LM Studio | No `x-api-key` required |
+| Auth enabled | `x-api-key: $LM_API_TOKEN` |
+
+## Key Takeaways
+
+- `POST /v1/messages` on `localhost:1234` is a drop-in for `api.anthropic.com/v1/messages`
+- Same request body — swap the base URL and optionally add `x-api-key`
+- Streaming uses standard Anthropic SSE event names — existing stream parsers work unchanged
+- Tool use with `input_schema` / `tool_choice` is supported
+- Auth header is optional when LM Studio's **Require Authentication** is off
+- See [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]] for redirecting the full Anthropic SDK via env vars
+
+## Related
+
+- [[wiki/claude-code/lmstudio-anthropic-compat|LM Studio Anthropic Compat Setup]] — redirect Claude Code / SDK to local server
+- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — OpenAI-compatible `/v1/chat/completions`
+- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST
API]] — native v1 endpoints and feature comparison table +- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] — memory management for loaded models diff --git a/wiki/claude-code/lmstudio-openai-compat-endpoints.md b/wiki/claude-code/lmstudio-openai-compat-endpoints.md new file mode 100644 index 0000000..4fcdf53 --- /dev/null +++ b/wiki/claude-code/lmstudio-openai-compat-endpoints.md @@ -0,0 +1,86 @@ +--- +title: "LM Studio — OpenAI Compatibility Endpoints" +aliases: [lmstudio-openai-compat, lmstudio-oai-endpoints] +tags: [lmstudio, openai, local-llm, api, embeddings, chat-completions] +sources: [raw/OpenAI Compatibility Endpoints.md] +created: 2026-04-30 +updated: 2026-04-30 +--- + +# LM Studio — OpenAI Compatibility Endpoints + +LM Studio exposes an OpenAI-compatible HTTP server. Any existing OpenAI client (Python, TypeScript, cURL, C#, etc.) works against it by changing only the **base URL**. + +Default port: `1234`. + +## Supported Endpoints + +| Endpoint | Method | Purpose | +|----------|--------|---------| +| `/v1/models` | GET | List loaded/available models | +| `/v1/responses` | POST | Responses API (Codex-compatible) | +| `/v1/chat/completions` | POST | Chat with text and images | +| `/v1/embeddings` | POST | Generate text embeddings | +| `/v1/completions` | POST | Legacy completions | + +## Switching Base URL + +Only one line changes: the `base_url` (Python) or `baseURL` (TypeScript) option. 
+ +### Python + +```python +from openai import OpenAI + +client = OpenAI( + base_url="http://localhost:1234/v1" +) +# rest of your code unchanged +``` + +### TypeScript + +```typescript +import OpenAI from 'openai'; + +const client = new OpenAI({ + baseURL: "http://localhost:1234/v1" +}); +``` + +### cURL + +```bash +curl http://localhost:1234/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "", + "messages": [{"role": "user", "content": "Say this is a test!"}], + "temperature": 0.7 + }' +``` + +## Codex Support + +LM Studio supports OpenAI Codex via the `POST /v1/responses` endpoint — the same one Codex targets. + +## Key Takeaways + +- **Drop-in replacement** — swap `base_url` to `http://localhost:1234/v1`; no other code changes needed +- **Five endpoints** — models, responses, chat/completions, embeddings, legacy completions +- **No API key required** by default (LM Studio runs locally) +- **Codex works** because LM Studio implements `/v1/responses` +- **Model IDs differ** — use the model identifier shown in LM Studio, not OpenAI slugs like `gpt-4o` +- For richer stats (token/s, TTFT, model lifecycle) use the [[wiki/claude-code/lmstudio-rest-api|native LM Studio REST API]] instead + +## Related Articles + +- [[wiki/claude-code/lmstudio-anthropic-compat|Anthropic Compat Endpoints]] — `/v1/messages` drop-in for Claude SDK +- [[wiki/claude-code/lmstudio-chat-completions|Chat Completions]] — full param reference for `/v1/chat/completions` +- [[wiki/claude-code/lmstudio-embeddings|Embeddings]] — `/v1/embeddings` for local RAG pipelines +- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — native v1 API with extended model metadata +- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] — memory management for loaded models + +## Sources + +- [LM Studio OpenAI Compat Docs](https://lmstudio.ai/docs/developer/openai-compat) — raw/OpenAI Compatibility Endpoints.md diff --git 
a/wiki/claude-code/lmstudio-responses-api.md b/wiki/claude-code/lmstudio-responses-api.md new file mode 100644 index 0000000..3297328 --- /dev/null +++ b/wiki/claude-code/lmstudio-responses-api.md @@ -0,0 +1,124 @@ +--- +title: "LM Studio Responses API" +aliases: [lmstudio-responses, lm-studio-openai-responses] +tags: [lm-studio, openai-compat, responses-api, streaming, mcp, reasoning] +sources: [raw/Responses.md] +created: 2026-04-30 +updated: 2026-04-30 +--- + +# LM Studio Responses API + +LM Studio exposes `/v1/responses` — an OpenAI Responses API-compatible endpoint with support for streaming, reasoning effort, stateful multi-turn via `previous_response_id`, and Remote MCP tools. + +Endpoint URL: `http://localhost:1234/v1/responses` + +--- + +## Basic Request (non-streaming) + +```bash +curl http://localhost:1234/v1/responses \ + -H "Content-Type: application/json" \ + -d '{ + "model": "openai/gpt-oss-20b", + "input": "Provide a prime number less than 50", + "reasoning": { "effort": "low" } + }' +``` + +- `input` — plain string prompt (no messages array required) +- `reasoning.effort` — `"low"` | `"medium"` | `"high"` (model-dependent) + +--- + +## Stateful Follow-up + +Carry conversation state across calls using `previous_response_id`: + +```bash +curl http://localhost:1234/v1/responses \ + -H "Content-Type: application/json" \ + -d '{ + "model": "openai/gpt-oss-20b", + "input": "Multiply it by 2", + "previous_response_id": "resp_123" + }' +``` + +- The `id` field from any prior response becomes the `previous_response_id` of the next +- No need to replay the full message history client-side + +--- + +## Streaming + +```bash +curl http://localhost:1234/v1/responses \ + -H "Content-Type: application/json" \ + -d '{ + "model": "openai/gpt-oss-20b", + "input": "Hello", + "stream": true + }' +``` + +SSE events emitted: +| Event | Description | +|-------|-------------| +| `response.created` | Response object initialised | +| `response.output_text.delta` | Incremental 
text chunk | +| `response.completed` | Final event, full response included | + +--- + +## Remote MCP Tools (opt-in) + +Enable in LM Studio: **Developer → Settings → Remote MCP**. + +```bash +curl http://localhost:1234/v1/responses \ + -H "Content-Type: application/json" \ + -d '{ + "model": "ibm/granite-4-micro", + "input": "What is the top trending model on hugging face?", + "tools": [ + { + "type": "mcp", + "server_label": "huggingface", + "server_url": "https://huggingface.co/mcp", + "allowed_tools": ["model_search"] + } + ] + }' +``` + +- `server_label` — arbitrary identifier for this MCP server +- `server_url` — remote MCP server URL +- `allowed_tools` — allowlist of tool names the model may call + +--- + +## Key Takeaways + +- `/v1/responses` is an OpenAI Responses API drop-in; swap base URL only +- `previous_response_id` enables multi-turn without replaying history — simpler than maintaining a messages array +- Streaming uses standard SSE; listen for `response.output_text.delta` for incremental chunks +- Remote MCP tools are per-request and opt-in — must enable the feature in LM Studio settings first +- `reasoning.effort` controls thinking depth; not all models support it + +--- + +## Related + +- [[wiki/claude-code/lmstudio-openai-compat-endpoints|LM Studio OpenAI Compat Endpoints]] — overview of all 5 OAI-compatible endpoints +- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — `/v1/chat/completions` with full param reference +- [[wiki/claude-code/lmstudio-messages-api|LM Studio Messages API]] — `/v1/messages` Anthropic-compat with streaming + tool-use +- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — native endpoint feature comparison table +- [[wiki/claude-code/mcp-integration|MCP Integration]] — Claude Code MCP setup and server patterns + +--- + +## Sources + +- `raw/Responses.md` — LM Studio developer docs: `/v1/responses` endpoint diff --git a/wiki/claude-code/lmstudio-rest-api.md 
b/wiki/claude-code/lmstudio-rest-api.md new file mode 100644 index 0000000..393b956 --- /dev/null +++ b/wiki/claude-code/lmstudio-rest-api.md @@ -0,0 +1,75 @@ +--- +title: "LM Studio REST API (v1)" +aliases: [lmstudio-api, lm-studio-rest, lmstudio-v1] +tags: [lmstudio, rest-api, local-inference, openai-compat, anthropic-compat, mcp] +sources: [raw/LM Studio API.md] +created: 2026-04-30 +updated: 2026-04-30 +--- + +# LM Studio REST API (v1) + +LM Studio 0.4.0 introduced the native **v1 REST API** at `/api/v1/*`. It sits alongside OpenAI-compatible and Anthropic-compatible endpoints and offers the richest feature set for local inference. + +## v1 vs v0 + +The old v0 API (`/api/v0/*`) is superseded. Migrate to `/api/v1/*` for: + +- **Stateful chats** — server keeps conversation context across turns +- **MCP via API** — use MCPs configured in LM Studio directly from requests +- **Authentication** — API token support +- **Model management** — download, load, unload via API + +## Supported Endpoints + +| Endpoint | Method | Purpose | +|---|---|---| +| `/api/v1/chat` | POST | Inference (native) | +| `/api/v1/models` | GET | List loaded models | +| `/api/v1/models/load` | POST | Load a model into VRAM | +| `/api/v1/models/unload` | POST | Unload a model | +| `/api/v1/models/download` | POST | Download a model | +| `/api/v1/models/download/status` | GET | Poll download progress | + +## Inference Endpoint Comparison + +Four endpoints can run inference. 
Pick based on which features you need: + +| Feature | `/api/v1/chat` | `/v1/responses` (OAI) | `/v1/chat/completions` (OAI) | `/v1/messages` (Anthropic) | +|---|:---:|:---:|:---:|:---:| +| Streaming | ✅ | ✅ | ✅ | ✅ | +| Stateful chat | ✅ | ✅ | ❌ | ❌ | +| Remote MCPs | ✅ | ✅ | ❌ | ❌ | +| LM Studio MCPs | ✅ | ✅ | ❌ | ❌ | +| Custom tools | ❌ | ✅ | ✅ | ✅ | +| Assistant messages in request | ❌ | ✅ | ✅ | ✅ | +| Model load streaming events | ✅ | ❌ | ❌ | ❌ | +| Prompt processing events | ✅ | ❌ | ❌ | ❌ | +| Specify context length | ✅ | ❌ | ❌ | ❌ | + +**Decision guide:** +- Need MCP tools + stateful chat → `/api/v1/chat` or `/v1/responses` +- Need custom tool definitions → `/v1/responses`, `/v1/chat/completions`, or `/v1/messages` +- Dropping in existing OpenAI SDK code → `/v1/chat/completions` +- Dropping in existing Anthropic SDK code → `/v1/messages` + +## Key Takeaways + +- The **native `/api/v1/chat`** endpoint is the only one with model-load events, prompt-processing events, and per-request context length; stateful chat and MCP support are shared with `/v1/responses`. +- **`/v1/responses`** (OpenAI Responses API compat) is the best of both worlds — stateful + MCP + custom tools. +- **`/v1/chat/completions`** is the broadest drop-in for existing OpenAI code but loses statefulness and MCP. +- **`/v1/messages`** lets you redirect the Anthropic SDK to a local model with minimal code change (see [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]]). +- Model management endpoints let you fully automate the model lifecycle — download → load → infer → unload — without touching the GUI. +- API token auth is available for securing the local server (useful when exposed on a LAN). 
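The comparison table above can be encoded as a small lookup. This is only a sketch: `FEATURES` and `endpoints_supporting` are hypothetical names introduced for illustration, and the feature flags are transcribed by hand from the table rather than queried from LM Studio.

```python
# Feature matrix transcribed from the comparison table; key names are illustrative.
FEATURES = {
    "/api/v1/chat": {
        "streaming", "stateful_chat", "remote_mcp", "lmstudio_mcp",
        "model_load_events", "prompt_processing_events", "context_length",
    },
    "/v1/responses": {
        "streaming", "stateful_chat", "remote_mcp", "lmstudio_mcp",
        "custom_tools", "assistant_messages",
    },
    "/v1/chat/completions": {"streaming", "custom_tools", "assistant_messages"},
    "/v1/messages": {"streaming", "custom_tools", "assistant_messages"},
}


def endpoints_supporting(*needed):
    """Return every endpoint whose feature set covers all requested features."""
    wanted = set(needed)
    return [endpoint for endpoint, feats in FEATURES.items() if wanted <= feats]


print(endpoints_supporting("stateful_chat", "remote_mcp"))
print(endpoints_supporting("custom_tools", "stateful_chat"))
print(endpoints_supporting("context_length"))
```

Asking for stateful chat plus custom tools narrows the choice to `/v1/responses`, while per-request context length leaves only the native endpoint, which matches the decision guide.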
+ +## Related Articles + +- [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]] — redirect Claude Code / Anthropic SDK to LM Studio via env vars +- [[wiki/claude-code/lmstudio-chat-completions|lmstudio-chat-completions]] — OpenAI `/v1/chat/completions` usage, params, debugging +- [[wiki/claude-code/lmstudio-embeddings|lmstudio-embeddings]] — `/v1/embeddings` for local RAG pipelines +- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|lmstudio-idle-ttl-auto-evict]] — memory management: TTL and auto-evict +- [[wiki/agent-sdk/overview|agent-sdk/overview]] — build multi-agent systems that call local models + +## Sources + +- `raw/LM Studio API.md` — clipped from lmstudio.ai/docs/developer/rest diff --git a/wiki/claude-code/lmstudio-serve-on-network.md b/wiki/claude-code/lmstudio-serve-on-network.md new file mode 100644 index 0000000..3beaa2f --- /dev/null +++ b/wiki/claude-code/lmstudio-serve-on-network.md @@ -0,0 +1,54 @@ +--- +title: "LM Studio — Serve on Local Network" +aliases: [lmstudio-network, lmstudio-lan-server] +tags: [lmstudio, networking, api-server, local-llm, lan] +sources: [raw/Serve on Local Network.md] +created: 2026-04-30 +updated: 2026-04-30 +--- + +# LM Studio — Serve on Local Network + +Enabling **Serve on Local Network** makes the LM Studio API server accessible to other devices on the same LAN — not just `localhost`. + +## How It Works + +- By default the server binds to `127.0.0.1` (localhost only) +- With the option enabled it binds to your machine's **local network IP** (e.g. 
`192.168.x.x`) +- The API access URL shown in LM Studio updates to reflect the new binding +- All existing API endpoints stay the same — only the host changes + +## Use Cases + +| Scenario | Why useful | +|----------|-----------| +| Thin-client devices (laptop, tablet, phone) | Offload inference to a powerful desktop on the same network | +| Shared team access | Multiple people hit one LM Studio instance | +| IoT / edge devices | Raspberry Pi or similar calls the API over LAN | +| Local service mesh | Other self-hosted services (Home Assistant, scripts) consume the LLM | + +## Setup Steps + +1. Open LM Studio → **Local Server** tab +2. Toggle **Serve on Local Network** → ON +3. Note the updated **API access URL** displayed (e.g. `http://192.168.1.x:1234`) +4. On client devices, point `base_url` to that address instead of `http://localhost:1234` + +## Key Takeaways + +- One toggle — no firewall rule changes required on most home routers (LAN-to-LAN is open by default) +- The API surface is identical to localhost; only the bind address differs +- Useful when pairing a powerful homelab machine with weaker clients — see [[wiki/homelab/_index|homelab]] for server options +- Combine with [[wiki/claude-code/lmstudio-headless-service|lmstudio-headless-service]] to run the server without the GUI on a headless machine +- For redirecting Claude Code itself to the local server, see [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]] + +## Related + +- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — full endpoint reference +- [[wiki/claude-code/lmstudio-headless-service|LM Studio Headless Service]] — run without GUI (daemon mode) +- [[wiki/claude-code/lmstudio-anthropic-compat|Anthropic Compat Endpoints]] — point Claude Code at local server +- [[wiki/homelab/_index|Homelab]] — self-hosted hardware for running LM Studio + +## Sources + +- `raw/Serve on Local Network.md` — clipped from lmstudio.ai/docs/developer/core/server/serve-on-network diff 
--git a/wiki/claude-code/lmstudio-server-settings.md b/wiki/claude-code/lmstudio-server-settings.md new file mode 100644 index 0000000..f70984b --- /dev/null +++ b/wiki/claude-code/lmstudio-server-settings.md @@ -0,0 +1,62 @@ +--- +title: "LM Studio Server Settings" +aliases: [lmstudio-server-config, lm-studio-api-server-settings] +tags: [lmstudio, api-server, configuration, mcp, jit, cors, auth] +sources: [raw/Server Settings.md] +created: 2026-04-30 +updated: 2026-04-30 +--- + +# LM Studio Server Settings + +Configuration options for the LM Studio API server — accessible from the LM Studio UI or `lms` CLI. Controls port, auth, network access, MCP permissions, CORS, and JIT model memory management. + +## Network & Access + +| Setting | Type | Description | +|---------|------|-------------| +| **Server Port** | Integer | Port the API server listens on (default `1234`) | +| **Serve on Local Network** | Switch | Binds server to LAN IP so other devices can reach it — see [[wiki/claude-code/lmstudio-serve-on-network\|Serve on Network]] | +| **Enable CORS** | Switch | Allow cross-origin requests (needed for browser-based clients hitting a local server) | + +## Authentication + +| Setting | Type | Description | +|---------|------|-------------| +| **Require Authentication** | Switch | Clients must pass a valid token in `Authorization` header — see [[wiki/claude-code/lmstudio-anthropic-compat\|LM Studio Auth docs]] | + +> Authentication is a prerequisite for enabling MCP server access from `mcp.json`. + +## MCP (Model Context Protocol) + +| Setting | Type | Description | +|---------|------|-------------| +| **Allow per-request MCPs** | Switch | Clients may specify ephemeral remote MCP servers in individual requests (not in `mcp.json`). Only remote MCPs supported. | +| **Allow calling servers from mcp.json** | Switch | Clients may use MCP servers defined in your LM Studio `mcp.json`. **Requires Auth enabled.** Security risk if those servers have filesystem/data access. 
| + +Related: [[wiki/claude-code/mcp-integration\|MCP Integration]] + +## JIT (Just-in-Time) Model Loading + +Saves RAM by loading models on demand rather than pre-loading them. + +| Setting | Type | Description | +|---------|------|-------------| +| **Just in Time Model Loading** | Switch | Load a model at request time if not already loaded | +| **Auto Unload Unused JIT Models** | Switch | Automatically evict JIT models when idle | +| **Only Keep Last JIT Loaded Model** | Switch | Evict all but the most recently used JIT model — minimizes RAM usage | + +> For deeper JIT / TTL / eviction behavior, see [[wiki/claude-code/lmstudio-idle-ttl-auto-evict\|Idle TTL and Auto-Evict]]. + +## Key Takeaways + +- **Port** is the only integer setting; all others are on/off switches. +- **Auth is a gate** — `mcp.json` server access won't work without it enabled. +- **Per-request MCPs** are ephemeral and remote-only; they don't persist after the request. +- **CORS** must be on for any browser app (web UI, local HTML tool) to call the API. +- **JIT trio** (`JIT Load` → `Auto Unload` → `Only Keep Last`) progressively tightens memory: enable all three on low-RAM machines. +- LAN access via [[wiki/claude-code/lmstudio-serve-on-network\|Serve on Network]] is a separate setting from CORS — you may need both. 
+ +## Sources + +- `raw/Server Settings.md` — scraped from [lmstudio.ai/docs/developer/core/server/settings](https://lmstudio.ai/docs/developer/core/server/settings) diff --git a/wiki/claude-code/lmstudio-structured-output.md b/wiki/claude-code/lmstudio-structured-output.md new file mode 100644 index 0000000..bef04a1 --- /dev/null +++ b/wiki/claude-code/lmstudio-structured-output.md @@ -0,0 +1,150 @@ +--- +title: "LM Studio Structured Output" +aliases: [lmstudio-json-schema, structured-output-lmstudio] +tags: [lmstudio, structured-output, json-schema, openai-compat, local-llm] +sources: [raw/Structured Output.md] +created: 2026-04-30 +updated: 2026-04-30 +--- + +# LM Studio Structured Output + +Enforce a specific JSON shape on LLM responses by passing a JSON schema to `/v1/chat/completions`. Compatible with OpenAI's Structured Output API format. + +## How It Works + +- Add a `response_format` field to the chat completions request +- Provide a `json_schema` with a `name`, optional `strict`, and a `schema` object +- The model is constrained to return valid JSON matching that schema +- Response arrives as a string in `choices[0].message.content` — parse it with `json.loads()` + +## Server Setup + +```bash +lms server start +# or enable from Developer tab in LM Studio UI +``` + +Install the CLI first if needed: +```bash +npx lmstudio install-cli +``` + +## response_format Shape + +```json +"response_format": { + "type": "json_schema", + "json_schema": { + "name": "my_schema", + "strict": "true", + "schema": { + "type": "object", + "properties": { + "field": { "type": "string" } + }, + "required": ["field"] + } + } +} +``` + +## cURL Example + +```bash +curl http://localhost:1234/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "{{model}}", + "messages": [ + {"role": "system", "content": "You are a helpful jokester."}, + {"role": "user", "content": "Tell me a joke."} + ], + "response_format": { + "type": "json_schema", + "json_schema": { + 
"name": "joke_response", + "strict": "true", + "schema": { + "type": "object", + "properties": { "joke": {"type": "string"} }, + "required": ["joke"] + } + } + }, + "temperature": 0.7, + "max_tokens": 50, + "stream": false + }' +``` + +## Python Example + +```python +from openai import OpenAI +import json + +client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio") + +character_schema = { + "type": "json_schema", + "json_schema": { + "name": "characters", + "schema": { + "type": "object", + "properties": { + "characters": { + "type": "array", + "items": { + "type": "object", + "properties": { + "name": {"type": "string"}, + "occupation": {"type": "string"}, + "personality": {"type": "string"}, + "background": {"type": "string"} + }, + "required": ["name", "occupation", "personality", "background"] + }, + "minItems": 1 + } + }, + "required": ["characters"] + } + } +} + +response = client.chat.completions.create( + model="your-model", + messages=[ + {"role": "system", "content": "You are a helpful AI assistant."}, + {"role": "user", "content": "Create 1-3 fictional characters"} + ], + response_format=character_schema, +) + +results = json.loads(response.choices[0].message.content) +print(json.dumps(results, indent=2)) +``` + +## Structured Output Engines + +| Model Format | Engine | +|---|---| +| GGUF | `llama.cpp` grammar-based sampling | +| MLX | [Outlines](https://github.com/dottxt-ai/outlines) via [lmstudio-ai/mlx-engine](https://github.com/lmstudio-ai/mlx-engine) | + +## Key Takeaways + +- Use `response_format.type = "json_schema"` — same shape as OpenAI's Structured Outputs API +- Works with any OpenAI-compatible client SDK (Python, TS, etc.) 
just by pointing `base_url` at localhost +- Response is always a **string** in `choices[0].message.content` — always call `json.loads()` on it +- Not all models support this: **models below 7B parameters often cannot do structured output** — check the model card +- GGUF uses grammar sampling; MLX uses Outlines — both constrain tokens at generation time, not post-hoc +- All standard `/v1/chat/completions` params (temperature, max_tokens, stream, etc.) still apply + +## Related + +- [[wiki/claude-code/lmstudio-chat-completions|lmstudio-chat-completions]] — full parameter reference for the completions endpoint +- [[wiki/claude-code/lmstudio-openai-compat-endpoints|lmstudio-openai-compat-endpoints]] — overview of all OpenAI-compat endpoints +- [[wiki/claude-code/lmstudio-responses-api|lmstudio-responses-api]] — stateful responses with streaming and Remote MCP tools +- [[wiki/claude-code/lmstudio-rest-api|lmstudio-rest-api]] — native LM Studio API and endpoint feature comparison diff --git a/wiki/claude-code/lmstudio-tool-use.md b/wiki/claude-code/lmstudio-tool-use.md new file mode 100644 index 0000000..a1cd43f --- /dev/null +++ b/wiki/claude-code/lmstudio-tool-use.md @@ -0,0 +1,158 @@ +--- +title: "LM Studio Tool Use (Function Calling)" +aliases: [lmstudio-function-calling, lmstudio-tools] +tags: [lmstudio, tool-use, function-calling, openai-compat, python, local-llm] +sources: [raw/Tool Use.md] +created: 2026-04-30 +updated: 2026-04-30 +--- + +# LM Studio Tool Use (Function Calling) + +Tool use lets LLMs *request* calls to external functions/APIs via LM Studio's OpenAI-compatible `/v1/chat/completions` and `/v1/responses` endpoints. Your code executes the actual functions and feeds results back. 
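The "your code executes the actual functions" half can be sketched without a running server. The example below assumes a plain dict shaped like an OpenAI-style tool call; `get_delivery_date`, `TOOLS`, `run_tool_call`, and every value in the sample request are hypothetical and only there to show the dispatch step.

```python
import json


def get_delivery_date(order_id: str) -> dict:
    # Hypothetical stand-in for a real lookup; the date is made up.
    return {"order_id": order_id, "delivery_date": "2026-05-04"}


# Registry mapping tool names (as advertised to the model) to local callables.
TOOLS = {"get_delivery_date": get_delivery_date}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one requested tool call and build the `tool` message to send back."""
    name = tool_call["function"]["name"]
    # Arguments arrive as a JSON-encoded string, not as a parsed object.
    args = json.loads(tool_call["function"]["arguments"])
    if name not in TOOLS:
        result = {"error": f"unknown tool: {name}"}
    else:
        result = TOOLS[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


# Sample request in the shape of an OpenAI-style tool call (values are made up).
request = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_delivery_date", "arguments": '{"order_id": "A123"}'},
}
print(run_tool_call(request))
```

Returning an error payload for an unknown tool name, instead of raising, is one possible design choice: it lets the model see the failure in the next turn.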
+ +## Key Takeaways + +- LLMs **cannot execute code** — they output structured text requesting a tool call; your code runs it +- Uses the same format as OpenAI's Function Calling API — any OpenAI SDK works +- Tool definitions are injected into the system prompt via the model's chat template +- Two support tiers: **Native** (model trained for tool use) and **Default** (fallback prompt injection) +- After tool execution, re-prompt the model *without* tools to get a plain-text final answer +- Streaming tool calls arrive in chunks — accumulate `delta.tool_calls` before executing + +## High-Level Flow + +``` +Setup LLM + tool list + → Get user input + → LLM prompted with messages + → Needs tools? + Yes → Tool Response → Execute tools → Add results to messages → re-prompt + No → Normal response → loop back +``` + +## Tool Definition Format + +```json +{ + "type": "function", + "function": { + "name": "get_delivery_date", + "description": "Get the delivery date for a customer's order", + "parameters": { + "type": "object", + "properties": { + "order_id": { "type": "string" } + }, + "required": ["order_id"] + } + } +} +``` + +Pass as the `tools` array in the request body — identical to OpenAI's spec. + +## Response Parsing + +- Tool call detected: `choices[0].message.tool_calls` array is populated; `finish_reason = "tool_calls"` +- No tool call: response lands in `choices[0].message.content` as normal text +- If the model outputs a malformed tool call, LM Studio falls back to `content` — use `lms log stream` to debug + +## Multi-Turn Pattern (Python) + +```python +from openai import OpenAI +import json + +client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio") + +# 1. First call — with tools +response = client.chat.completions.create( + model="lmstudio-community/qwen2.5-7b-instruct", + messages=messages, + tools=tools, +) + +# 2. 
Execute the requested tool +tool_call = response.choices[0].message.tool_calls[0] +args = json.loads(tool_call.function.arguments) +result = my_function(**args) + +# 3. Append both the assistant's tool-call message and the tool result +messages += [ + {"role": "assistant", "tool_calls": [tool_call]}, + {"role": "tool", "content": json.dumps(result), "tool_call_id": tool_call.id}, +] + +# 4. Second call — WITHOUT tools for final plain-text answer +final = client.chat.completions.create(model="lmstudio-community/qwen2.5-7b-instruct", messages=messages) +print(final.choices[0].message.content) +``` + +## Native vs Default Support + +| Level | What it means | Quality | +|-------|---------------|---------| +| **Native** | Model has a tool-use chat template + LM Studio parses its format | Best | +| **Default** | LM Studio injects a custom system prompt + converts `tool` role to `user` | Variable | + +### Models with Native Support (as of 2024-11) + +- **Qwen** — Qwen2.5-7B-Instruct (GGUF / MLX) +- **Llama** — Llama-3.1 / 3.2 8B-Instruct (GGUF / MLX) +- **Mistral** — Ministral-8B-Instruct-2410 (GGUF / MLX) + +Native models show a hammer badge in the LM Studio UI. + +## Streaming Tool Calls + +```python +calls = {} # tool-call index -> fragments; name and arguments arrive in pieces +for chunk in stream: + delta = chunk.choices[0].delta + if delta.tool_calls: + for tc in delta.tool_calls: + calls.setdefault(tc.index, []).append(tc) # collect tc.id, tc.function.name, tc.function.arguments fragments +``` + +Execute only after the stream ends and `tool_calls` is fully assembled. 
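The accumulation step can be sketched on its own, without a server. The fragments below are simulated with `SimpleNamespace` objects standing in for the entries of `chunk.choices[0].delta.tool_calls`; `merge_tool_call_deltas` is a hypothetical helper and all values are made up.

```python
from types import SimpleNamespace as NS


def merge_tool_call_deltas(deltas):
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls = {}
    for tc in deltas:
        call = calls.setdefault(tc.index, {"id": "", "name": "", "arguments": ""})
        # id and name usually arrive once; arguments arrive as string fragments.
        call["id"] += tc.id or ""
        if tc.function is not None:
            call["name"] += tc.function.name or ""
            call["arguments"] += tc.function.arguments or ""
    return [calls[i] for i in sorted(calls)]


# Simulated fragments for a single tool call split across three chunks.
fragments = [
    NS(index=0, id="call_1", function=NS(name="get_delivery_date", arguments="")),
    NS(index=0, id=None, function=NS(name=None, arguments='{"order_')),
    NS(index=0, id=None, function=NS(name=None, arguments='id": "A123"}')),
]
assembled = merge_tool_call_deltas(fragments)
print(assembled)
```

Only once the stream has ended is the `arguments` string guaranteed to be complete JSON, so parsing it earlier would likely fail.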
+
+## Quick Start
+
+```bash
+# Start server
+lms server start
+
+# Load a model
+lms load
+
+# Debug raw prompts (see how tools are injected)
+lms log stream
+```
+
+```bash
+# curl single-turn example
+curl http://localhost:1234/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "lmstudio-community/qwen2.5-7b-instruct",
+       "messages": [{"role": "user", "content": "Search dell products under $50"}],
+       "tools": [...]}'
+```
+
+## Troubleshooting
+
+- **No `tool_calls` in response** — model output was malformed; run `lms log stream` to inspect the raw prompt and output
+- **Smaller models** — may not follow the tool call format reliably; prefer ≥7B models with native support
+- **Default mode weirdness** — check the injected system prompt via `lms log stream`; the format uses `[TOOL_REQUEST]...[END_TOOL_REQUEST]` tags
+
+## Related
+
+- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — full `/v1/chat/completions` param reference
+- [[wiki/claude-code/lmstudio-openai-compat-endpoints|LM Studio OpenAI Compat Endpoints]] — all 5 compatible endpoints
+- [[wiki/claude-code/lmstudio-responses-api|LM Studio Responses API]] — `/v1/responses` with Remote MCP tools
+- [[wiki/claude-code/lmstudio-structured-output|LM Studio Structured Output]] — enforce JSON schema on responses
+- [[wiki/claude-code/lmstudio-messages-api|LM Studio Messages API]] — Anthropic-compat tool use examples
+
+## Sources
+
+- `raw/Tool Use.md` — LM Studio official docs (lmstudio.ai/docs/developer/openai-compat/tools), published 2024-11-19
diff --git a/wiki/homelab/_index.md b/wiki/homelab/_index.md
index c4fbc7a..5c24734 100644
--- a/wiki/homelab/_index.md
+++ b/wiki/homelab/_index.md
@@ -43,3 +43,4 @@ Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, bud
 | [[wiki/homelab/glance-dashboard\|Glance — Self-hosted Dashboard]] | Glance setup replacing Homarr: Docker config, 5-page layout, Prometheus RAPL metrics, key patterns ($include caveat, internal IPs only) | session 2026-04-29 | 2026-04-29 |
 | [[wiki/homelab/homelab-media-stack\|Homelab Media Stack — Jellyfin + *arr + qBittorrent Setup]] | CT111 media LXC: unified /data mount pattern, Intel QuickSync GPU passthrough, step-by-step qBittorrent categories + Sonarr/Radarr/Prowlarr wiring | session 2026-04-26 | 2026-04-26 |
 | [[wiki/homelab/hp-elitedesk-800g3-proxmox\|HP Elitedesk 800 G3 — Proxmox Setup Log]] | Real homelab server setup log: i5-7500, 24 GB RAM, 256 GB NVMe + 6 TB HDD, LXC containers, GPU passthrough (AMD/Intel) | session 2026-04-18 | 2026-04-21 |
+| [[wiki/homelab/hp-elitedesk-800g3-teardown-upgrade\|HP EliteDesk 800 G3 SFF — Teardown, Upgrade & Benchmarks]] | Full disassembly/reassembly guide: proprietary connectors caveat, dual-channel RAM, CPU cooler swap, GTX 1050 Ti, thermal benchmarks (GTA V, Flight Sim) | raw/HP EliteDesk 800 G3 SFF - Teardown, re-assembly and upgrade.md | 2026-04-30 |
diff --git a/wiki/homelab/hp-elitedesk-800g3-teardown-upgrade.md b/wiki/homelab/hp-elitedesk-800g3-teardown-upgrade.md
new file mode 100644
index 0000000..0236817
--- /dev/null
+++ b/wiki/homelab/hp-elitedesk-800g3-teardown-upgrade.md
@@ -0,0 +1,153 @@
+---
+title: "HP EliteDesk 800 G3 SFF — Teardown, Upgrade & Benchmarks"
+aliases: [elitedesk-800-g3-teardown, hp-sff-upgrade-guide]
+tags: [homelab, hardware, hp, sff, upgrade, benchmark]
+sources: [raw/HP EliteDesk 800 G3 SFF - Teardown, re-assembly and upgrade.md]
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+## Overview
+
+The HP EliteDesk 800 G3 SFF is a small form factor desktop often available cheaply at auctions. It uses a **proprietary motherboard and PSU connector** — not standard ATX — which limits some upgrade paths but still allows CPU, RAM, SSD, and GPU swaps.
+
+Reference config (video unit): i7-7700 · GTX 1050 Ti (low-profile) · 16 GB DDR4 · 256 GB NVMe SSD
+
+---
+
+## Exterior Ports
+
+**Front**
+- 1× USB-C
+- 2× USB 3.0
+- 2× USB 2.0
+- Audio in/out
+- Power button
+- Slim optical drive bay
+- Optional SD-card reader slot
+
+**Back**
+- DisplayPort
+- Flexible port option (VGA/DP/HDMI via option card)
+- RJ45 (Gigabit)
+- 2× USB 2.0 + 2× USB 3.0
+- Power connector
+- GPU video outputs (from installed card)
+
+---
+
+## Motherboard Layout
+
+Non-standard form factor — not ATX/ITX. Key connectors:
+
+| Component | Detail |
+|-----------|--------|
+| PCIe slots | 1× x16 (GPU), 2× x1, 1× x4 (downshifted) |
+| RAM slots | 4× DDR4 DIMM — DIMM1/2 = Ch. B, DIMM3/4 = Ch. A |
+| Storage | 1× M.2 NVMe SSD, 3× SATA, 1× M.2 Wi-Fi |
+| Power | Proprietary non-standard PSU connector |
+| Option card | VGA / DisplayPort / HDMI output slot |
+| CMOS reset | Physical button on board |
+
+**Proprietary connectors = motherboard and PSU cannot be swapped for generic parts.**
+
+---
+
+## Disassembly Procedure
+
+1. **Open case** — slide latch on top cover, no tools needed
+2. **Open airflow panel** — provides better access to NVMe, SATA, and RAM
+3. **Remove CPU cooler cover** (plastic airflow shroud)
+4. Disconnect and slide out **slim DVD drive** (green latch release)
+5. Remove **front panel**
+6. Remove **GPU** (low-profile PCIe card, 4 GB VRAM)
+7. Disconnect **proprietary power connectors**
+8. Remove **NVMe SSD** (single retention screw)
+9. Remove **RAM sticks**
+10. Unscrew 4 screws → lift **CPU cooler**
+11. Lift lever → remove **CPU** (LGA 1151)
+12. Remove **motherboard** from chassis
+
+---
+
+## Upgrade Notes
+
+### RAM — Dual Channel
+- Use matching DIMMs in **same-colour slots** (one per channel)
+- For 16 GB: 2× 8 GB — one in Ch. A slot, one in Ch. B slot
+
+### CPU Cooler Replacement
+- Stock cooler can develop bearing noise
+- Replacement must be **PWM 4-pin** type
+- Heatsink mounts to chassis (not board) — install after board is seated in case
+- Clean old paste with isopropyl alcohol before applying new thermal paste
+
+### 3.5" HDD Addition
+- Install standoff screws on drive
+- Slide into drive cage
+- Connect SATA data + power cables
+
+### GPU (Low-Profile Required)
+- SFF case requires **low-profile PCIe card**
+- Tested: Gigabyte GTX 1050 Ti (4 GB VRAM) — fits the x16 slot
+
+---
+
+## Reassembly Order
+
+1. CPU into socket (match orientation notch)
+2. NVMe SSD → slot + screw
+3. RAM → correct channel slots
+4. Motherboard into case
+5. CPU cooler + thermal paste → fix to chassis
+6. Connect CPU fan to board
+7. Airflow cover (clips onto CPU fan)
+8. Power cables + speaker
+9. DVD drive + SATA cable
+10. 3.5" HDD → cage + cables
+11. GPU → PCIe slot
+12. SATA data cable for HDD
+13. Front cover → top cover
+
+---
+
+## Benchmark Results (i7-7700 + GTX 1050 Ti)
+
+| Test | Result |
+|------|--------|
+| Geekbench CPU | Expected for i7-7700 generation |
+| Geekbench Compute (GPU) | Expected for GTX 1050 Ti |
+| Microsoft Flight Simulator (Medium, 1080p) | ~30 FPS steady |
+| GTA V (Very High + AA, 1080p) | Consistent 60+ FPS |
+
+### Thermal Observations
+- CPU and GPU approach **~90°C** under sustained load (Flight Simulator)
+- GTA V similarly runs hot
+- SFF chassis limits airflow — **monitor temps if running sustained workloads**
+
+---
+
+## Key Takeaways
+
+- The EliteDesk 800 G3 SFF uses **proprietary PSU and motherboard connectors** — plan upgrades around this constraint
+- Case opens **tool-free** via a single top-cover latch; very serviceable for the form factor
+- CPU cooler mounts to the **chassis** not the board — must be installed after the board is seated
+- Dual-channel RAM requires same-colour DIMM pairing (Ch. A + Ch. B)
+- GTX 1050 Ti (low-profile) is the practical GPU ceiling for this chassis without a riser
+- Thermals are borderline under sustained 3D load — consider improved case airflow or undervolting for homelab/compute use
+- For homelab use (Proxmox, LXCs), thermal load is far lighter — see [[wiki/homelab/hp-elitedesk-800g3-proxmox|HP Elitedesk 800 G3 — Proxmox Setup Log]]
+
+---
+
+## Related Articles
+
+- [[wiki/homelab/hp-elitedesk-800g3-proxmox|HP Elitedesk 800 G3 — Proxmox Setup Log]]
+- [[wiki/homelab/homelab-from-scratch-budget-build|Homelab From Scratch — Budget-First Design]]
+- [[wiki/homelab/bigibz1-homelab-hardware|bigibz1 Homelab Hardware Reference]]
+- [[wiki/homelab/homelab-services-map|Homelab — Full Services Map & Network Reference]]
+
+---
+
+## Sources
+
+- [YouTube: HP EliteDesk 800 G3 SFF — Teardown, re-assembly and upgrade (jensd_be, 2021-03-08)](https://www.youtube.com/watch?v=n1ETa3mJ85I)