vault backup: 2026-04-30 14:42:43
parent 1eb1072c19
commit 0631673e44
33 changed files with 1557 additions and 3 deletions
2
.obsidian/plugins/hoarder-sync/data.json
vendored
|
|
@ -4,7 +4,7 @@
|
|||
"syncFolder": "Hoarder",
|
||||
"attachmentsFolder": "Hoarder/attachments",
|
||||
"syncIntervalMinutes": 60,
|
||||
"lastSyncTimestamp": 1777555641238,
|
||||
"lastSyncTimestamp": 1777556035137,
|
||||
"updateExistingFiles": false,
|
||||
"excludeArchived": true,
|
||||
"onlyFavorites": false,
|
||||
|
|
|
|||
|
|
@ -149,3 +149,48 @@ tags: [daily]
|
|||
- 14:27 (<1min) | `memory-compiler`
|
||||
- **Asked:** Compile a new article about LM Studio embeddings into the structured wiki knowledge base.
|
||||
- **Done:** Filed article as `wiki/claude-code/lmstudio-embeddings.md` and updated master index with wikilinks to related LM Studio topics and RAG pattern.
|
||||
- 14:28 (1min) | `memory-compiler`
|
||||
- **Asked:** Compile a new HP EliteDesk 800G3 teardown/upgrade article into the wiki knowledge base.
|
||||
- **Done:** Filed article as `wiki/homelab/hp-elitedesk-800g3-teardown-upgrade.md` with full disassembly procedures, motherboard specs, and upgrade benchmarks.
|
||||
- 14:29 | `video-accessibility`
|
||||
- **Asked:** Provide the code review skills checklist from the project instructions, then review project completion and the committed code changes (no files specified).
|
||||
- **Done:** —
|
||||
- 14:30 (<1min) | `memory-compiler`
|
||||
- **Asked:** Compile a new article on LM Studio messages API into the wiki knowledge base.
|
||||
- **Done:** Created structured wiki article with cURL examples and updated topic and master indices.
|
||||
- 14:31 (<1min) | `memory-compiler`
|
||||
- **Asked:** Compile a new article about LM Studio's OpenAI-compatible endpoints into the wiki knowledge base.
|
||||
- **Done:** Created the article, updated the claude-code index to 21 articles, and bumped the master index count.
|
||||
- 14:32 | `video-accessibility`
|
||||
- **Asked:** What skills should be checked for code review according to the instructions?
|
||||
- **Done:** Reviewed project completion and identified environment configuration changes needed for optical-dev deployment.
|
||||
- 14:33 (<1min) | `memory-compiler`
|
||||
- **Asked:** Compile a new article about LM Studio headless service into the knowledge base wiki.
|
||||
- **Done:** Filed article as `claude-code/lmstudio-headless-service.md` and updated master index to reflect 23 total claude-code articles.
|
||||
- 14:33 (<1min) | `memory-compiler`
|
||||
- **Asked:** Compile a new article about LM Studio network serving into the knowledge base and update the master index.
|
||||
- **Done:** Created new LM Studio article, updated claude-code topic index, and incremented master index article count from 23 to 24.
|
||||
- 14:34 | `video-accessibility`
|
||||
- **Asked:** Check the project instructions for code review skills requirements.
|
||||
- **Done:** Identified OOM issue in whisper-worker memory configuration and pushed hotfix to restore original memory limits while keeping Cloud Run URLs.
|
||||
- 14:35 (<1min) | `memory-compiler`
|
||||
- **Asked:** Compile a raw article about LM Studio systemd configuration into the structured wiki knowledge base.
|
||||
- **Done:** Filed the article as a systemd unit configuration guide covering service setup, unit file ordering, and PATH requirements.
|
||||
- 14:36 (<1min) | `memory-compiler`
|
||||
- **Asked:** File a new article about LM Studio structured output into the knowledge base.
|
||||
- **Done:** Created wiki article and updated both index files to register the new entry.
|
||||
- 14:37 (1min) | `memory-compiler`
|
||||
- **Asked:** Compile a new article on tool use into the knowledge base wiki structure.
|
||||
- **Done:** Processed raw article into `claude-code/lmstudio-tool-use.md` and updated both topic and master indexes.
|
||||
- 14:38 | `video-accessibility`
|
||||
- **Asked:** Check the instructions for code review skills to verify the completed project.
|
||||
- **Done:** Reviewed deployment fix that restored memory limits and confirmed all 7 containers started successfully with API health checks passing.
|
||||
- 14:39 (<1min) | `memory-compiler`
|
||||
- **Asked:** Compile a new LM Studio CLI article into the knowledge base wiki.
|
||||
- **Done:** Created structured wiki article with command reference and cross-links, updated master index from 29 to 30 claude-code articles.
|
||||
- 14:41 | `video-accessibility`
|
||||
- **Asked:** Check the project instructions for code review skills that need to be verified.
|
||||
- **Done:** Reviewed deployment status and identified CORS configuration and ffmpeg logging checks needed.
|
||||
- 14:41 | `video-accessibility`
|
||||
- **Asked:** Check project completion and review code quality assessment skills from instructions.
|
||||
- **Done:** Identified server authorization limitations and provided gsutil CORS configuration command for local execution.
|
||||
|
|
|
|||
|
|
@ -26,12 +26,12 @@ This 3-hop pattern works for hundreds of articles without vector search.
|
|||
| [[wiki/concepts/_index\|concepts/]] | Atomic knowledge extracted from Claude Code sessions | 75 |
|
||||
| [[wiki/connections/_index\|connections/]] | Cross-cutting insights linking 2+ concepts: FastAPI+Azure AD+Docker trinity, AI→cost-tracker, Apache+Vite basePath, GCP→REST polling, Box+hotfolder, Docker DNS+AdGuard | 9 |
|
||||
| [[wiki/qa/_index\|qa/]] | Filed answers to queries (saved with `--file-back`) | 0 |
|
||||
| [[wiki/homelab/_index\|homelab/]] | Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, budget builds, HP Elitedesk G3, Homarr API + Apps + Boards + Certificates + Integrations + Settings + Tasks + AdGuard + Clock + Docker Stats + Docker Integration + Download Client + Firewall + Proxmox Integration + Radarr + Readarr + Sonarr + Bookmarks + Calendar + Icons + App Widget + Weather + GitHub + Nextcloud + qBittorrent + RSS Feed + Speedtest Tracker + System Health Monitoring + System Resources + Services Map + Media Stack | 39 |
|
||||
| [[wiki/homelab/_index\|homelab/]] | Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, budget builds, HP Elitedesk G3, Homarr API + Apps + Boards + Certificates + Integrations + Settings + Tasks + AdGuard + Clock + Docker Stats + Docker Integration + Download Client + Firewall + Proxmox Integration + Radarr + Readarr + Sonarr + Bookmarks + Calendar + Icons + App Widget + Weather + GitHub + Nextcloud + qBittorrent + RSS Feed + Speedtest Tracker + System Health Monitoring + System Resources + Services Map + Media Stack | 40 |
|
||||
| [[wiki/web-agency/_index\|web-agency/]] | AI-assisted website building & selling: Claude Code, Nanobanana 2, Kling, LaunchPath MCP | 9 |
|
||||
| [[wiki/dotfiles/_index\|dotfiles/]] | Linux terminal ricing: Kitty, Fish, WezTerm CLI, modern Rust CLI tools, LazyVim, unified themes, Tabby | 21 |
|
||||
| [[wiki/agent-sdk/_index\|agent-sdk/]] | Claude Agent SDK (formerly Claude Code SDK) — build autonomous AI agents in Python and TypeScript | 30 |
|
||||
| [[wiki/llm-models/_index\|llm-models/]] | LLM model catalogs — OpenAI and Claude/Anthropic models, IDs, context, pricing | 2 |
|
||||
| [[wiki/claude-code/_index\|claude-code/]] | Claude Code product docs — install, capabilities, surfaces, MCP, hooks, scheduling, multi-agent, plugins, skills, channels, error recovery, LM Studio local | 17 |
|
||||
| [[wiki/claude-code/_index\|claude-code/]] | Claude Code product docs — install, capabilities, surfaces, MCP, hooks, scheduling, multi-agent, plugins, skills, channels, error recovery, LM Studio local | 30 |
|
||||
| [[wiki/reports/_index\|reports/]] | Weekly and monthly summaries — generate: `uv run python scripts/report-generator.py --weekly` | 1 |
|
||||
| [[wiki/infrastructure/_index\|infrastructure/]] | Server inventory: all 10 SSH hosts — optical, optical-dev, optical-prod, baic, librechat, modocmms, box-cli, aimpress, pve | 10 |
|
||||
|
||||
|
|
|
|||
|
|
@ -31,3 +31,16 @@ Claude Code is Anthropic's agentic coding assistant. Works across terminal, IDE,
|
|||
| [[wiki/claude-code/lmstudio-anthropic-compat\|lmstudio-anthropic-compat]] | Redirect Claude Code and the Anthropic SDK to a local LM Studio server via two env vars; `/v1/messages` drop-in, auth options, cURL + Python examples | raw/Anthropic Compatibility Endpoints.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-chat-completions\|lmstudio-chat-completions]] | LM Studio OpenAI-compatible `/v1/chat/completions`: Python example, all supported params (incl. top_k, repeat_penalty), `lms log stream` debugging | raw/Chat Completions.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-embeddings\|lmstudio-embeddings]] | LM Studio `/v1/embeddings`: OpenAI-compat drop-in, Python example, newline stripping, batch inputs, use with FAISS/Chroma for local RAG | raw/Embeddings.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-idle-ttl-auto-evict\|lmstudio-idle-ttl-auto-evict]] | Idle TTL (per-request `ttl` field, `lms load --ttl`) and Auto-Evict (1 JIT model at a time) for LM Studio memory management | raw/Idle TTL and Auto-Evict.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-rest-api\|lmstudio-rest-api]] | LM Studio native v1 REST API: all endpoints, endpoint feature comparison (native vs OAI vs Anthropic compat), model lifecycle management | raw/LM Studio API.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-messages-api\|lmstudio-messages-api]] | LM Studio `/v1/messages` drop-in: basic, streaming (SSE events), and tool-use cURL examples; auth options | raw/Messages.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-openai-compat-endpoints\|lmstudio-openai-compat-endpoints]] | LM Studio OpenAI-compat overview: 5 endpoints, base_url swap pattern, Python/TS/cURL examples, Codex support | raw/OpenAI Compatibility Endpoints.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-responses-api\|lmstudio-responses-api]] | LM Studio `/v1/responses`: streaming SSE, stateful follow-up via `previous_response_id`, reasoning effort, Remote MCP tools | raw/Responses.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-headless-service\|lmstudio-headless-service]] | Run LM Studio without GUI: llmster daemon (recommended) or desktop tray mode; JIT model loading and auto-evict | raw/Run LM Studio as a service (headless).md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-serve-on-network\|lmstudio-serve-on-network]] | Bind LM Studio server to LAN IP so other devices (thin clients, IoT, team members) can call the API over the local network | raw/Serve on Local Network.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-server-settings\|lmstudio-server-settings]] | All LM Studio API server toggles: port, auth, CORS, LAN access, per-request MCPs, mcp.json access, JIT loading + auto-evict | raw/Server Settings.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-llmster-systemd\|lmstudio-llmster-systemd]] | systemd unit file for llmster: install daemon, load model at boot, ExecStartPre ordering, oneshot+RemainAfterExit pattern, service management commands | raw/Setup llmster as a Startup Task on Linux.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-structured-output\|lmstudio-structured-output]] | Enforce JSON schema on LLM responses via response_format; GGUF uses llama.cpp grammar, MLX uses Outlines; models <7B often unsupported | raw/Structured Output.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-tool-use\|lmstudio-tool-use]] | LM Studio function calling: tool definition format, multi-turn flow, native vs default support, streaming accumulation, Python examples | raw/Tool Use.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-mcp-via-api\|lmstudio-mcp-via-api]] | MCP servers via LM Studio `/api/v1/chat`: ephemeral (inline) vs mcp.json (pre-configured), allowed_tools, custom auth headers | raw/Using MCP via API.md | 2026-04-30 |
|
||||
| [[wiki/claude-code/lmstudio-lms-cli\|lmstudio-lms-cli]] | `lms` CLI: model download/load/unload/list, server start/stop, log streaming, GPU offload flags, --identifier alias, daemon management | raw/lms — LM Studio's CLI.md | 2026-04-30 |
|
||||
|
|
|
|||
104
wiki/claude-code/lmstudio-headless-service.md
Normal file
|
|
@ -0,0 +1,104 @@
|
|||
---
|
||||
title: "LM Studio Headless / Service Mode"
|
||||
aliases: [lmstudio-daemon, llmster, lmstudio-background-service]
|
||||
tags: [lmstudio, local-llm, headless, daemon, jit-loading]
|
||||
sources: [raw/Run LM Studio as a service (headless).md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# LM Studio Headless / Service Mode
|
||||
|
||||
GUI-less operation of LM Studio: run as a background daemon, start on machine login, and load models on demand via JIT.
|
||||
|
||||
## Two Approaches
|
||||
|
||||
| Approach | Best For | GUI Required? |
|
||||
|----------|----------|---------------|
|
||||
| **llmster** (recommended) | Linux servers, cloud, GPU rigs, headless machines | No |
|
||||
| **Desktop app headless mode** | Machines with a GUI where app is already installed | Yes (hidden to tray) |
|
||||
|
||||
---
|
||||
|
||||
## Option 1: llmster (Recommended)
|
||||
|
||||
`llmster` is the core of the LM Studio desktop app, repackaged as a server-native daemon. No GUI dependency.
|
||||
|
||||
### Install
|
||||
|
||||
```bash
|
||||
# Linux / Mac
|
||||
curl -fsSL https://lmstudio.ai/install.sh | bash
|
||||
|
||||
# Windows (PowerShell)
|
||||
irm https://lmstudio.ai/install.ps1 | iex
|
||||
```
|
||||
|
||||
### Start the daemon
|
||||
|
||||
```bash
|
||||
lms daemon up
|
||||
```
|
||||
|
||||
- To auto-start on Linux boot, configure it as a **Linux Startup Task**; see [[wiki/claude-code/lmstudio-llmster-systemd|llmster systemd unit]] for a working unit file.
|
||||
- Full CLI reference: `lms daemon --help`
|
||||
|
||||
---
|
||||
|
||||
## Option 2: Desktop App in Headless Mode
|
||||
|
||||
Works on macOS, Windows, and Linux (with a GUI). Useful if the desktop app is already installed.
|
||||
|
||||
### Run server on login
|
||||
|
||||
1. Open app settings (`Cmd/Ctrl` + `,`)
|
||||
2. Enable **"Run LLM server on login"**
|
||||
3. Exiting the app minimizes to tray — server keeps running
|
||||
|
||||
### Start server programmatically
|
||||
|
||||
```bash
|
||||
lms server start
|
||||
```
|
||||
|
||||
Last server state is saved and restored automatically on launch.
|
||||
|
||||
---
|
||||
|
||||
## Just-In-Time (JIT) Model Loading
|
||||
|
||||
Applies to **both** options. Useful when using LM Studio as a backend for other tools (Open WebUI, Claude Code, custom apps).
|
||||
|
||||
| JIT State | `/v1/models` returns | Inference behavior |
|
||||
|-----------|---------------------|--------------------|
|
||||
| **ON** | All downloaded models | Auto-loads model into VRAM on first call |
|
||||
| **OFF** | Only models in VRAM | Must manually load model first |
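A client can read the listing to see what the server currently exposes. The following is a minimal Python sketch, assuming the server listens on `http://localhost:1234` and that `/v1/models` returns the OpenAI-style list shape (`{"data": [{"id": ...}]}`). The helper names are hypothetical and not part of LM Studio.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://localhost:1234"  # assumption: default local server address


def parse_model_ids(payload: dict) -> List[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = BASE_URL) -> List[str]:
    """GET /v1/models and return the ids (requires a running server)."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return parse_model_ids(json.load(resp))


def is_listed(model: str, payload: dict) -> bool:
    """True if `model` appears in the listing.

    With JIT ON a listed model is merely downloaded;
    with JIT OFF a listed model is already loaded in memory.
    """
    return model in parse_model_ids(payload)
```

Only `fetch_model_ids` touches the network; the parsing helpers work on any decoded response, which keeps them easy to check offline.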
|
||||
|
||||
### Auto-Unload
|
||||
|
||||
JIT-loaded models are **auto-evicted** after a period of inactivity — see [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] for TTL settings and per-request `ttl` field.
|
||||
|
||||
---
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- **llmster** is the preferred headless path — works on servers and CI without any GUI
|
||||
- Desktop headless mode is a quick option for developer machines already running the app
|
||||
- JIT loading eliminates manual `lms load` calls; models are loaded on first inference request
|
||||
- JIT-loaded models auto-unload after inactivity (configurable TTL)
|
||||
- Use `lms server start` to programmatically control the REST server state
|
||||
- The OpenAI-compatible REST API (`/v1/...`) is available in both modes — see [[wiki/claude-code/lmstudio-openai-compat-endpoints|OpenAI Compat Endpoints]] and [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]]
|
||||
|
||||
---
|
||||
|
||||
## Related
|
||||
|
||||
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — all endpoints and lifecycle management
|
||||
- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] — memory management for JIT-loaded models
|
||||
- [[wiki/claude-code/lmstudio-openai-compat-endpoints|OpenAI Compat Endpoints]] — drop-in base_url swap for any OpenAI client
|
||||
- [[wiki/claude-code/lmstudio-anthropic-compat|Anthropic Compat Endpoints]] — redirect Claude Code / Anthropic SDK to local LM Studio
|
||||
|
||||
## Sources
|
||||
|
||||
- `raw/Run LM Studio as a service (headless).md`
|
||||
- LM Studio docs: https://lmstudio.ai/docs/developer/core/headless
|
||||
90
wiki/claude-code/lmstudio-idle-ttl-auto-evict.md
Normal file
|
|
@ -0,0 +1,90 @@
|
|||
---
|
||||
title: "LM Studio — Idle TTL and Auto-Evict"
|
||||
aliases: [lmstudio-ttl, lmstudio-auto-evict, idle-ttl]
|
||||
tags: [lmstudio, memory-management, jit-loading, ttl, api]
|
||||
sources: [raw/Idle TTL and Auto-Evict.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# LM Studio — Idle TTL and Auto-Evict
|
||||
|
||||
Memory management features for LM Studio's JIT-loaded models. Prevents idle models from occupying VRAM and enables seamless model switching from external apps.
|
||||
|
||||
## Background
|
||||
|
||||
| Feature | Default | Purpose |
|
||||
|---------|---------|---------|
|
||||
| **JIT Loading** | enabled | Loads model on first API request — no manual preload needed |
|
||||
| **Idle TTL** | 60 min | Unloads a model after it has been idle for N seconds/minutes |
|
||||
| **Auto-Evict** | enabled | Unloads previous JIT model before loading a new one |
|
||||
|
||||
## Idle TTL
|
||||
|
||||
**Problem:** JIT-loaded models stay in VRAM even when idle (e.g. after you stop using Cline, Zed, or Continue.dev).
|
||||
|
||||
**Solution:** TTL starts a countdown when the model goes idle. The timer resets on every new request. When it expires, the model unloads automatically.
|
||||
|
||||
### Setting TTL
|
||||
|
||||
**App-wide default** — configure in Developer tab → Server Settings.
|
||||
|
||||
**Per-request (API)** — pass `ttl` in seconds in the request body:
|
||||
|
||||
```bash
|
||||
curl http://localhost:1234/api/v0/chat/completions \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "deepseek-r1-distill-qwen-7b",
|
||||
"ttl": 300,
|
||||
"messages": [...]
|
||||
}'
|
||||
```
|
||||
|
||||
Works on both the OpenAI-compat (`/v1/`) and LM Studio REST (`/api/v0/`) endpoints.
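The same per-request `ttl` can be set from Python. This is a minimal sketch using only the standard library; it builds the request without sending it, and the function name `build_chat_request` and the default base URL are assumptions for illustration.

```python
import json
import urllib.request


def build_chat_request(model, messages, ttl_seconds=None,
                       base_url="http://localhost:1234"):
    """Build (but do not send) a chat completion request with an optional ttl."""
    body = {"model": model, "messages": messages}
    if ttl_seconds is not None:
        body["ttl"] = int(ttl_seconds)  # idle TTL, in seconds
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "deepseek-r1-distill-qwen-7b",
    [{"role": "user", "content": "Hello"}],
    ttl_seconds=300,  # unload after 5 idle minutes
)
# To send it against a running server: urllib.request.urlopen(req)
```

Leaving `ttl_seconds` unset omits the field, so the app-wide default applies.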
|
||||
|
||||
**`lms` CLI** — set TTL at load time:
|
||||
|
||||
```bash
|
||||
lms load <model> --ttl 3600 # 1 hour
|
||||
```
|
||||
|
||||
Models loaded with `lms load` have **no TTL by default** (persist until manual unload).
|
||||
|
||||
**Server tab** — TTL field visible when loading a model through the GUI.
|
||||
|
||||
## Auto-Evict
|
||||
|
||||
Controls how many JIT-loaded models can coexist in memory.
|
||||
|
||||
| State | Behaviour |
|
||||
|-------|-----------|
|
||||
| **ON** (default) | At most 1 JIT model in memory at a time; old model evicted before new one loads |
|
||||
| **OFF** | Models accumulate in memory; only unloaded by TTL expiry or manual action |
|
||||
|
||||
- Non-JIT (manually loaded) models are **never** affected by Auto-Evict.
|
||||
- Toggle in: Developer tab → Server Settings.
|
||||
|
||||
## TTL + Auto-Evict Together
|
||||
|
||||
- **Auto-Evict** handles immediate switching — keeps 1 active model.
|
||||
- **TTL** handles the "forgot to switch" case — cleans up if you just stop using an app.
|
||||
- Both can be active simultaneously for full memory hygiene.
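To make the interaction concrete, here is a toy Python model of the rules described in this article. It is an illustration only, not LM Studio's implementation: the class names are hypothetical, time is passed in explicitly as seconds, and the 60 minute default TTL is taken from the Background table.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LoadedModel:
    name: str
    jit: bool                 # True if loaded on demand by an API request
    ttl: Optional[float]      # allowed idle seconds; None means no TTL
    last_used: float


@dataclass
class ToyModelPool:
    """Toy model of Idle TTL + Auto-Evict; not LM Studio's actual code."""
    auto_evict: bool = True
    default_jit_ttl: float = 3600.0  # 60 min default for JIT loads
    models: Dict[str, LoadedModel] = field(default_factory=dict)

    def manual_load(self, name, now, ttl=None):
        # Like `lms load`: no TTL unless one is passed explicitly.
        self.models[name] = LoadedModel(name, jit=False, ttl=ttl, last_used=now)

    def expire(self, now):
        # Drop every model whose idle time has reached its TTL.
        self.models = {
            k: m for k, m in self.models.items()
            if m.ttl is None or now - m.last_used < m.ttl
        }

    def request(self, name, now, ttl=None):
        self.expire(now)
        if name in self.models:
            self.models[name].last_used = now  # timer resets on each request
            return
        if self.auto_evict:
            # Evict the previous JIT model; manually loaded models stay.
            self.models = {k: m for k, m in self.models.items() if not m.jit}
        effective_ttl = self.default_jit_ttl if ttl is None else ttl
        self.models[name] = LoadedModel(name, jit=True, ttl=effective_ttl,
                                        last_used=now)
```

With Auto-Evict on, requesting model `b` after model `a` leaves only `b` plus any manually loaded models; with it off, `a` lingers until its TTL runs out.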
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- Set `"ttl": 300` in any API request to cap a model's idle lifetime to 5 minutes.
|
||||
- `lms load <model> --ttl 3600` is the CLI equivalent for persistent sessions.
|
||||
- Auto-Evict (default ON) ensures only 1 JIT model lives in VRAM at a time — great for low-VRAM machines.
|
||||
- `lms load` bypasses TTL defaults; always pass `--ttl` explicitly if you want auto-cleanup.
|
||||
- Auto-Evict never touches manually loaded models (non-JIT path), and those models get a TTL only if one is set explicitly at load time.
|
||||
|
||||
## Related
|
||||
|
||||
- [[wiki/claude-code/lmstudio-anthropic-compat|LM Studio Anthropic Compat]] — redirect Claude Code to local LM Studio
|
||||
- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — full parameter reference incl. `top_k`, `repeat_penalty`
|
||||
- [[wiki/claude-code/lmstudio-embeddings|LM Studio Embeddings]] — local RAG with FAISS/Chroma
|
||||
|
||||
## Sources
|
||||
|
||||
- [LM Studio Docs — Idle TTL and Auto-Evict](https://lmstudio.ai/docs/developer/core/ttl-and-auto-evict)
|
||||
96
wiki/claude-code/lmstudio-llmster-systemd.md
Normal file
|
|
@ -0,0 +1,96 @@
|
|||
---
|
||||
title: "LM Studio — llmster Startup Service (systemd)"
|
||||
aliases: [llmster-systemd, lmstudio-startup, lmstudio-daemon-linux]
|
||||
tags: [lmstudio, llmster, systemd, linux, headless, local-llm]
|
||||
sources: [raw/Setup llmster as a Startup Task on Linux.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# LM Studio — llmster Startup Service (systemd)
|
||||
|
||||
Configure `llmster` (LM Studio's headless daemon) to launch automatically at boot, load a model, and start the HTTP API server — all via a systemd unit file.
|
||||
|
||||
## Install llmster
|
||||
|
||||
```bash
|
||||
curl -fsSL https://lmstudio.ai/install.sh | bash
|
||||
lms --help # verify
|
||||
```
|
||||
|
||||
## Download a Model
|
||||
|
||||
```bash
|
||||
lms get openai/gpt-oss-20b
|
||||
# note the model path printed — used in service config
|
||||
```
|
||||
|
||||
## Manual Test (before systemd)
|
||||
|
||||
```bash
|
||||
lms load openai/gpt-oss-20b
|
||||
lms server start
|
||||
curl http://localhost:1234/v1/models # should return model list
|
||||
lms server stop
|
||||
```
|
||||
|
||||
## systemd Unit File
|
||||
|
||||
Create `/etc/systemd/system/lmstudio.service` (replace `YOUR_USERNAME`):
|
||||
|
||||
```ini
|
||||
[Unit]
|
||||
Description=LM Studio Server
|
||||
|
||||
[Service]
|
||||
Type=oneshot
|
||||
RemainAfterExit=yes
|
||||
User=YOUR_USERNAME
|
||||
Environment="HOME=/home/YOUR_USERNAME"
|
||||
ExecStartPre=/home/YOUR_USERNAME/.lmstudio/bin/lms daemon up
|
||||
ExecStartPre=/home/YOUR_USERNAME/.lmstudio/bin/lms load openai/gpt-oss-20b --yes
|
||||
ExecStart=/home/YOUR_USERNAME/.lmstudio/bin/lms server start
|
||||
ExecStop=/home/YOUR_USERNAME/.lmstudio/bin/lms daemon down
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
```
|
||||
|
||||
- `Type=oneshot` + `RemainAfterExit=yes` — service is considered "active" after `ExecStart` exits
|
||||
- `ExecStartPre` runs sequentially before `ExecStart`
|
||||
- Skip the `lms load` line to rely on [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|JIT loading + auto-evict]] instead
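If the unit file has to be produced for several users or machines, a small generator avoids hand-editing `YOUR_USERNAME`. The sketch below is a hypothetical helper, assuming home directories live under `/home/<username>` as in the unit file above; passing `model=None` drops the `lms load` step so JIT loading takes over.

```python
from typing import Optional


def render_lmstudio_unit(username: str, model: Optional[str] = None) -> str:
    """Render the unit file text; omit the `lms load` step when model is None."""
    # Absolute path, because systemd runs with a minimal PATH.
    lms = f"/home/{username}/.lmstudio/bin/lms"
    lines = [
        "[Unit]",
        "Description=LM Studio Server",
        "",
        "[Service]",
        "Type=oneshot",
        "RemainAfterExit=yes",
        f"User={username}",
        f'Environment="HOME=/home/{username}"',
        f"ExecStartPre={lms} daemon up",
    ]
    if model is not None:
        # Pin a model at boot; skipped when relying on JIT loading.
        lines.append(f"ExecStartPre={lms} load {model} --yes")
    lines += [
        f"ExecStart={lms} server start",
        f"ExecStop={lms} daemon down",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


print(render_lmstudio_unit("alice", "openai/gpt-oss-20b"))
```

The output can be written to `/etc/systemd/system/lmstudio.service` by whatever provisioning step is in use; the function itself only builds the text.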
|
||||
|
||||
## Enable and Start
|
||||
|
||||
```bash
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl enable lmstudio.service
|
||||
sudo systemctl start lmstudio.service
|
||||
```
|
||||
|
||||
## Verify
|
||||
|
||||
```bash
|
||||
systemctl status lmstudio
|
||||
curl http://localhost:1234/v1/models
|
||||
```
|
||||
|
||||
## Service Management
|
||||
|
||||
```bash
|
||||
sudo systemctl stop lmstudio # stop
|
||||
sudo systemctl restart lmstudio # restart
|
||||
sudo systemctl disable lmstudio # remove from boot
|
||||
```
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- Use `lms daemon up` in `ExecStartPre` — the daemon must be running before `lms load` or `lms server start`
|
||||
- Binary path is `~/.lmstudio/bin/lms` — use the absolute path in the unit file (systemd has a minimal `$PATH`)
|
||||
- `Type=oneshot` + `RemainAfterExit=yes` keeps the service "active" so `ExecStop` runs on shutdown
|
||||
- Omit the `lms load` step and use [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|JIT loading]] to avoid pinning a model at boot
|
||||
- API is served on `http://localhost:1234` — see [[wiki/claude-code/lmstudio-headless-service|headless service overview]] for non-systemd options and [[wiki/claude-code/lmstudio-serve-on-network|LAN serving]] to expose to other devices
|
||||
|
||||
## Sources
|
||||
|
||||
- [LM Studio Headless llmster Docs](https://lmstudio.ai/docs/developer/core/headless_llmster)
|
||||
108
wiki/claude-code/lmstudio-lms-cli.md
Normal file
|
|
@ -0,0 +1,108 @@
|
|||
---
|
||||
title: "lms — LM Studio CLI"
|
||||
aliases: [lms-cli, lmstudio-cli]
|
||||
tags: [lmstudio, cli, local-llm, inference, server]
|
||||
sources: [raw/lms — LM Studio's CLI.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# lms — LM Studio CLI
|
||||
|
||||
`lms` is LM Studio's built-in CLI utility for managing models, the inference server, and the runtime. Ships with LM Studio — no separate install needed. MIT licensed, open source on GitHub.
|
||||
|
||||
## Installation & Verification
|
||||
|
||||
```bash
|
||||
# Already installed with LM Studio — just verify:
|
||||
lms --help
|
||||
```
|
||||
|
||||
Version at the time of writing (2026-04-30): `v0.0.47`
|
||||
|
||||
## Command Reference
|
||||
|
||||
| Command | What it does |
|
||||
|---------|-------------|
|
||||
| `lms chat` | Start interactive chat with a model in the terminal |
|
||||
| `lms get` | Search and download models |
|
||||
| `lms ls` | List models available on disk |
|
||||
| `lms ps` | List models currently loaded in memory |
|
||||
| `lms load` | Load a model (with GPU/context options) |
|
||||
| `lms unload` | Unload a model |
|
||||
| `lms import` | Import a model file into LM Studio |
|
||||
| `lms server start/stop` | Control the local API server |
|
||||
| `lms log` | Stream incoming/outgoing messages for debugging |
|
||||
| `lms runtime` | Manage and update the inference runtime |
|
||||
| `lms daemon` | Manage the headless llmster daemon |
|
||||
| `lms link` | Manage LM Link |
|
||||
| `lms clone` | Clone an artifact from LM Studio Hub |
|
||||
| `lms push` | Upload artifact to LM Studio Hub |
|
||||
| `lms login` | Authenticate with LM Studio |
|
||||
|
||||
## Common Workflows
|
||||
|
||||
### Server control
|
||||
|
||||
```bash
|
||||
lms server start
|
||||
lms server stop
|
||||
```
|
||||
|
||||
### List & inspect models
|
||||
|
||||
```bash
|
||||
lms ls # models on disk (reflects My Models directory)
|
||||
lms ps # models currently loaded in memory
|
||||
```
|
||||
|
||||
### Load a model
|
||||
|
||||
```bash
|
||||
# With GPU offload and context size:
|
||||
lms load <model> [--gpu=max|auto|0.0-1.0] [--context-length=1-N]
|
||||
|
||||
# --gpu=1.0 → 100% GPU offload
|
||||
# With a stable identifier alias:
|
||||
lms load openai/gpt-oss-20b --identifier="my-model-name"
|
||||
```
|
||||
|
||||
Using `--identifier` keeps the model ID stable across loads — useful when client code hardcodes a model name.
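When `lms load` is driven from automation, building the argument list in code keeps the flags consistent. This is a minimal sketch with a hypothetical helper name; it uses only the flags documented in these notes (`--gpu`, `--context-length`, `--identifier`, and `--ttl` from the Idle TTL article) and does not run the command itself.

```python
import shlex
from typing import List, Optional


def lms_load_argv(model: str,
                  gpu: Optional[str] = None,
                  context_length: Optional[int] = None,
                  identifier: Optional[str] = None,
                  ttl: Optional[int] = None) -> List[str]:
    """Build the argv for `lms load`; raises ValueError on a bad --gpu value."""
    argv = ["lms", "load", model]
    if gpu is not None:
        if gpu not in ("max", "auto"):
            ratio = float(gpu)  # ValueError if not numeric
            if not 0.0 <= ratio <= 1.0:
                raise ValueError("gpu must be 'max', 'auto', or within 0.0-1.0")
        argv.append(f"--gpu={gpu}")
    if context_length is not None:
        argv.append(f"--context-length={context_length}")
    if identifier is not None:
        argv.append(f"--identifier={identifier}")
    if ttl is not None:
        argv += ["--ttl", str(ttl)]
    return argv


argv = lms_load_argv("openai/gpt-oss-20b", gpu="1.0",
                     identifier="my-model-name")
print(shlex.join(argv))
# To execute: subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` rather than a shell string avoids quoting problems when an identifier contains spaces.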
|
||||
|
||||
### Unload a model
|
||||
|
||||
```bash
|
||||
lms unload <model> # unload a specific model
|
||||
lms unload --all # unload everything
|
||||
```
|
||||
|
||||
### Debug message flow
|
||||
|
||||
```bash
|
||||
lms log stream # tail all incoming/outgoing API messages live
|
||||
```
|
||||
|
||||
Pairs with [[wiki/claude-code/lmstudio-chat-completions|lmstudio-chat-completions]] for debugging request/response cycles.
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- `lms` ships with LM Studio — zero extra install steps
|
||||
- `lms ps` vs `lms ls`: loaded-in-memory vs on-disk — two different commands
|
||||
- `--gpu=1.0` forces full GPU offload; `--gpu=auto` lets LM Studio decide
|
||||
- `--identifier` flag on `lms load` decouples client model names from actual model paths
|
||||
- `lms log stream` is the fastest way to debug what's hitting the server
|
||||
- `lms daemon` manages [[wiki/claude-code/lmstudio-headless-service|llmster]] for headless/service deployments
|
||||
- MIT licensed: safe to embed in scripts and automation
|
||||
|
||||
## Related Articles
|
||||
|
||||
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — all API endpoints
|
||||
- [[wiki/claude-code/lmstudio-headless-service|Headless Service (llmster)]] — daemon mode for servers
|
||||
- [[wiki/claude-code/lmstudio-server-settings|Server Settings]] — port, auth, CORS, JIT loading
|
||||
- [[wiki/claude-code/lmstudio-chat-completions|Chat Completions]] — OpenAI-compat `/v1/chat/completions`
|
||||
- [[wiki/claude-code/lmstudio-llmster-systemd|llmster systemd unit]] — run llmster at boot on Linux
|
||||
- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] — memory management
|
||||
|
||||
## Sources
|
||||
|
||||
- LM Studio docs: https://lmstudio.ai/docs/cli
|
||||
115
wiki/claude-code/lmstudio-mcp-via-api.md
Normal file
|
|
@ -0,0 +1,115 @@
|
|||
---
|
||||
title: "LM Studio — MCP via API"
|
||||
aliases: [lmstudio-mcp-api, mcp-lmstudio, lm-studio-mcp]
|
||||
tags: [lmstudio, mcp, api, tool-use, integration]
|
||||
sources: [raw/Using MCP via API.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# LM Studio — MCP via API
|
||||
|
||||
Requires LM Studio 0.4.0+. MCP servers provide tools that models can call during chat requests via `/api/v1/chat`.
|
||||
|
||||
## Two Server Modes
|
||||
|
||||
| Feature | Ephemeral | mcp.json |
|
||||
|---------|-----------|----------|
|
||||
| Specified via | `integrations` → `"type": "ephemeral_mcp"` | `integrations` → `"type": "plugin"` |
|
||||
| Config | Per-request only | Pre-configured in `mcp.json` |
|
||||
| Use case | One-off / remote tools | Frequent use, tools needing `command` (local processes) |
|
||||
| Server ID | `server_label` in integration | `id` (e.g. `mcp/playwright`) |
|
||||
| Custom headers | `headers` field | Configured in `mcp.json` |
|
||||
|
||||
## Ephemeral MCP Servers
|
||||
|
||||
Defined inline per-request — no pre-configuration needed.
|
||||
|
||||
```bash
|
||||
curl http://localhost:1234/api/v1/chat \
|
||||
-H "Authorization: Bearer $LM_API_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "ibm/granite-4-micro",
|
||||
"input": "What is the top trending model on hugging face?",
|
||||
"integrations": [
|
||||
{
|
||||
"type": "ephemeral_mcp",
|
||||
"server_label": "huggingface",
|
||||
"server_url": "https://huggingface.co/mcp",
|
||||
"allowed_tools": ["model_search"]
|
||||
}
|
||||
],
|
||||
"context_length": 8000
|
||||
}'
|
||||
```
|
||||
|
||||
Response output contains typed entries: `reasoning`, `message`, and `tool_call` objects. Each `tool_call` includes the tool name, arguments, output, and `provider_info` identifying the server.
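A client usually wants just the tool calls out of that mixed list. The sketch below is a hedged Python illustration: it assumes each entry in `output` carries a `type` key with the values named above and that `provider_info` is a nested object with its own `type`. The sample response is hand-written for the example, not captured from a server, and the helper names are hypothetical.

```python
from typing import Any, Dict, List


def tool_calls(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return only the tool_call entries from a chat response's output list."""
    return [entry for entry in response.get("output", [])
            if entry.get("type") == "tool_call"]


def calls_by_provider(response: Dict[str, Any]) -> Dict[str, int]:
    """Count tool calls per provider_info.type (ephemeral_mcp vs plugin)."""
    counts: Dict[str, int] = {}
    for call in tool_calls(response):
        kind = call.get("provider_info", {}).get("type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts


# Hand-written sample (assumed shape), for illustration only.
sample_response = {
    "output": [
        {"type": "reasoning"},
        {"type": "tool_call",
         "provider_info": {"type": "ephemeral_mcp",
                           "server_label": "huggingface"}},
        {"type": "message"},
    ]
}
print(calls_by_provider(sample_response))
```

Filtering on the entry type alone keeps the helper tolerant of extra fields the server may add.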
|
||||
|
||||
## mcp.json Pre-configured Servers
|
||||
|
||||
Recommended for servers that run local commands (e.g. `microsoft/playwright-mcp`) or are used frequently.
|
||||
|
||||
```bash
|
||||
curl http://localhost:1234/api/v1/chat \
|
||||
-H "Authorization: Bearer $LM_API_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "ibm/granite-4-micro",
|
||||
"input": "Open lmstudio.ai",
|
||||
"integrations": ["mcp/playwright"],
|
||||
"context_length": 8000,
|
||||
"temperature": 0
|
||||
}'
|
||||
```
|
||||
|
||||
- `integrations` can be a plain string array when referencing pre-configured servers
|
||||
- `provider_info.type` will be `"plugin"` (vs `"ephemeral_mcp"` for inline)
|
||||
|
||||
## Restricting Tool Access
|
||||
|
||||
Use `allowed_tools` on either integration type:
|
||||
|
||||
```json
|
||||
"allowed_tools": ["model_search"]
|
||||
```
|
||||
|
||||
- Limits which tools the model can call from that server
|
||||
- Speeds up prompt processing — fewer tool definitions in context
|
||||
- If omitted, all server tools are available
|
||||
|
||||
## Custom Headers (Ephemeral)
|
||||
|
||||
For authenticated remote MCP endpoints:
|
||||
|
||||
```json
{
  "type": "ephemeral_mcp",
  "server_label": "huggingface",
  "server_url": "https://huggingface.co/mcp",
  "allowed_tools": ["model_search"],
  "headers": {
    "Authorization": "Bearer <YOUR_HF_TOKEN>"
  }
}
```
|
||||
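
The integration object above can also be assembled programmatically. The following is a sketch only: the helper names `ephemeral_mcp` and `chat_body` are hypothetical, and the field names are the ones shown in the JSON examples in this article.

```python
import json


def ephemeral_mcp(server_label, server_url, allowed_tools=None, headers=None):
    """Build one inline `ephemeral_mcp` integration entry."""
    entry = {
        "type": "ephemeral_mcp",
        "server_label": server_label,
        "server_url": server_url,
    }
    if allowed_tools is not None:
        # Omitting the key leaves every tool on the server available.
        entry["allowed_tools"] = list(allowed_tools)
    if headers:
        entry["headers"] = dict(headers)
    return entry


def chat_body(model, prompt, integrations):
    """Assemble a request body for POST /api/v1/chat."""
    return {"model": model, "input": prompt, "integrations": integrations}


body = chat_body(
    "ibm/granite-4-micro",
    "What is the top trending model on hugging face?",
    [
        ephemeral_mcp(
            "huggingface",
            "https://huggingface.co/mcp",
            allowed_tools=["model_search"],
            headers={"Authorization": "Bearer <YOUR_HF_TOKEN>"},
        )
    ],
)
payload = json.dumps(body)  # string to send as the HTTP request body
```

Keeping `allowed_tools` and `headers` optional mirrors the article: both are extra keys on the same integration object rather than separate request fields.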
|
||||
## Key Takeaways
|
||||
|
||||
- LM Studio exposes MCP tool calling through its native `/api/v1/chat` endpoint (not the OpenAI-compat route)
|
||||
- Two modes: **ephemeral** (inline, per-request) vs **mcp.json** (pre-configured, recommended for local/frequent servers)
|
||||
- `allowed_tools` works on both modes — use it to reduce context size and restrict scope
|
||||
- Tool call results appear inline in the `output` array alongside `reasoning` and `message` entries
|
||||
- Auth headers for remote MCP servers go in the `headers` field on ephemeral integrations
|
||||
- The [[wiki/claude-code/lmstudio-responses-api|Responses API]] also supports Remote MCP via `tools` — different endpoint, same concept
|
||||
|
||||
## Related
|
||||
|
||||
- [[wiki/claude-code/lmstudio-responses-api|LM Studio Responses API]] — `/v1/responses` endpoint also supports Remote MCP tools
|
||||
- [[wiki/claude-code/lmstudio-tool-use|LM Studio Tool Use]] — function calling (non-MCP) patterns
|
||||
- [[wiki/claude-code/lmstudio-server-settings|LM Studio Server Settings]] — toggle per-request MCPs and mcp.json access in the UI
|
||||
- [[wiki/claude-code/mcp-integration|Claude Code MCP Integration]] — MCP concepts: transports, scopes, OAuth
|
||||
|
||||
## Sources
|
||||
|
||||
- `raw/Using MCP via API.md` — LM Studio docs, 2026-04-30
|
||||
120
wiki/claude-code/lmstudio-messages-api.md
Normal file
|
|
@ -0,0 +1,120 @@
|
|||
---
|
||||
title: "LM Studio — Anthropic Messages API"
|
||||
aliases: [lmstudio-messages, lm-studio-anthropic-messages]
|
||||
tags: [lmstudio, anthropic, api, messages, local-llm, streaming, tools]
|
||||
sources: [raw/Messages.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# LM Studio — Anthropic Messages API
|
||||
|
||||
The `/v1/messages` endpoint in LM Studio follows the Anthropic Messages API: same request shape, same response shape. Use it as a local drop-in for code that already calls Anthropic's cloud API.
|
||||
|
||||
## Endpoint
|
||||
|
||||
```
POST http://localhost:1234/v1/messages
```
|
||||
|
||||
Headers:
|
||||
- `Content-Type: application/json`
|
||||
- `x-api-key: $LM_API_TOKEN` — optional if **Require Authentication** is disabled in LM Studio
|
||||
|
||||
## Basic Request
|
||||
|
||||
```bash
curl http://localhost:1234/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $LM_API_TOKEN" \
  -d '{
    "model": "ibm/granite-4-micro",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Say hello from LM Studio."}
    ]
  }'
```
|
||||
|
||||
## Streaming
|
||||
|
||||
Add `"stream": true` to receive Server-Sent Events (SSE):
|
||||
|
||||
```bash
curl http://localhost:1234/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $LM_API_TOKEN" \
  -d '{
    "model": "ibm/granite-4-micro",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 256,
    "stream": true
  }'
```
|
||||
|
||||
SSE event sequence:
|
||||
1. `message_start`
|
||||
2. `content_block_start`
|
||||
3. `content_block_delta` (repeating)
|
||||
4. `content_block_stop`
|
||||
5. `message_delta`
|
||||
6. `message_stop`
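
The event sequence above can be consumed with a small parser. This is a sketch under stated assumptions: it expects the standard SSE framing (`event:` and `data:` lines, with a blank line ending each event) and the Anthropic `text_delta` payload shape; the sample stream is hand-written, not captured from LM Studio.

```python
import json


def parse_sse(raw):
    """Yield (event_name, data_dict) pairs from raw SSE text."""
    event, data_lines = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # A blank line terminates one event.
            yield event, json.loads("\n".join(data_lines))
            event, data_lines = None, []
    if data_lines:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data_lines))


def collect_text(raw):
    """Concatenate the text carried by `content_block_delta` events."""
    parts = []
    for event, data in parse_sse(raw):
        if event == "content_block_delta":
            delta = data.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
    return "".join(parts)


# Hand-written sample stream for illustration.
sample_stream = (
    'event: message_start\ndata: {"type": "message_start"}\n\n'
    'event: content_block_delta\n'
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}}\n\n'
    'event: content_block_delta\n'
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}}\n\n'
    'event: message_stop\ndata: {"type": "message_stop"}\n\n'
)
text = collect_text(sample_stream)
```

Only `content_block_delta` events carry text; the other events mark boundaries and can be used to detect the start and end of the message.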
|
||||
|
||||
## Tool Use
|
||||
|
||||
Pass a `tools` array with JSON Schema input definitions and a `tool_choice` policy:
|
||||
|
||||
```bash
curl http://localhost:1234/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $LM_API_TOKEN" \
  -d '{
    "model": "ibm/granite-4-micro",
    "max_tokens": 1024,
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ],
    "tool_choice": {"type": "any"},
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"}
    ]
  }'
```
|
||||
|
||||
`tool_choice` options (Anthropic-compatible), each passed as an object: `{"type": "auto"}`, `{"type": "any"}`, `{"type": "tool", "name": "…"}`.
|
||||
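
Once the model answers with a tool request, the caller runs the tool and sends the result back. The sketch below uses the Anthropic content-block shapes (`tool_use` in the response, `tool_result` in the follow-up user turn) and assumes LM Studio returns the same shapes, as this article states. `get_weather` and the sample response, including its `id`, are hypothetical.

```python
def tool_uses(response):
    """Return the `tool_use` content blocks of a Messages API response."""
    return [b for b in response.get("content", []) if b.get("type") == "tool_use"]


def tool_result_message(tool_use_id, result):
    """Build the user turn that returns a tool's result to the model."""
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": tool_use_id, "content": str(result)}
        ],
    }


def get_weather(location):
    # Hypothetical stand-in for a real weather lookup.
    return f"Sunny in {location}"


# Hypothetical response; the id value is invented for illustration.
sample_response = {
    "role": "assistant",
    "stop_reason": "tool_use",
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_example",
            "name": "get_weather",
            "input": {"location": "San Francisco, CA"},
        }
    ],
}

follow_ups = [
    tool_result_message(block["id"], get_weather(**block["input"]))
    for block in tool_uses(sample_response)
]
```

In a full loop, the assistant message containing the `tool_use` block is appended to `messages` first, followed by the `tool_result` turn, and the request is sent again.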
|
||||
## Authentication
|
||||
|
||||
| Scenario | Header needed |
|----------|---------------|
| Auth disabled in LM Studio | No `x-api-key` required |
| Auth enabled | `x-api-key: $LM_API_TOKEN` |
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- `POST /v1/messages` on `localhost:1234` is a drop-in for `api.anthropic.com/v1/messages`
|
||||
- Same request body — swap the base URL and optionally add `x-api-key`
|
||||
- Streaming uses standard Anthropic SSE event names — existing stream parsers work unchanged
|
||||
- Tool use with `input_schema` / `tool_choice` is supported
|
||||
- Auth header is optional when LM Studio's **Require Authentication** is off
|
||||
- See [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]] for redirecting the full Anthropic SDK via env vars
|
||||
|
||||
## Related
|
||||
|
||||
- [[wiki/claude-code/lmstudio-anthropic-compat|LM Studio Anthropic Compat Setup]] — redirect Claude Code / SDK to local server
|
||||
- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — OpenAI-compatible `/v1/chat/completions`
|
||||
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — native v1 endpoints and feature comparison table
|
||||
- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] — memory management for loaded models
|
||||
86
wiki/claude-code/lmstudio-openai-compat-endpoints.md
Normal file
|
|
@ -0,0 +1,86 @@
|
|||
---
|
||||
title: "LM Studio — OpenAI Compatibility Endpoints"
|
||||
aliases: [lmstudio-openai-compat, lmstudio-oai-endpoints]
|
||||
tags: [lmstudio, openai, local-llm, api, embeddings, chat-completions]
|
||||
sources: [raw/OpenAI Compatibility Endpoints.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# LM Studio — OpenAI Compatibility Endpoints
|
||||
|
||||
LM Studio exposes an OpenAI-compatible HTTP server. Any existing OpenAI client (Python, TypeScript, cURL, C#, etc.) works against it by changing only the **base URL**.
|
||||
|
||||
Default port: `1234`.
|
||||
|
||||
## Supported Endpoints
|
||||
|
||||
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/v1/models` | GET | List loaded/available models |
| `/v1/responses` | POST | Responses API (Codex-compatible) |
| `/v1/chat/completions` | POST | Chat with text and images |
| `/v1/embeddings` | POST | Generate text embeddings |
| `/v1/completions` | POST | Legacy completions |
|
||||
|
||||
## Switching Base URL
|
||||
|
||||
Only one line changes: the `base_url` (Python) or `baseURL` (TypeScript) client option.
|
||||
|
||||
### Python
|
||||
|
||||
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    # Placeholder: the SDK needs some key value (or OPENAI_API_KEY set),
    # even when LM Studio authentication is off.
    api_key="lm-studio",
)
# rest of your code unchanged
```
|
||||
|
||||
### TypeScript
|
||||
|
||||
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  // The Node SDK option is spelled `baseURL`.
  baseURL: "http://localhost:1234/v1",
  // Placeholder: the SDK needs some key value (or OPENAI_API_KEY set).
  apiKey: "lm-studio"
});
```
|
||||
|
||||
### cURL
|
||||
|
||||
```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-identifier-from-lmstudio>",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
```
|
||||
|
||||
## Codex Support
|
||||
|
||||
LM Studio supports OpenAI Codex via the `POST /v1/responses` endpoint — the same one Codex targets.
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- **Drop-in replacement** — swap `base_url` to `http://localhost:1234/v1`; no other code changes needed
|
||||
- **Five endpoints** — models, responses, chat/completions, embeddings, legacy completions
|
||||
- **No API key required** by default (LM Studio runs locally)
|
||||
- **Codex works** because LM Studio implements `/v1/responses`
|
||||
- **Model IDs differ** — use the model identifier shown in LM Studio, not OpenAI slugs like `gpt-4o`
|
||||
- For richer stats (token/s, TTFT, model lifecycle) use the [[wiki/claude-code/lmstudio-rest-api|native LM Studio REST API]] instead
|
||||
|
||||
## Related Articles
|
||||
|
||||
- [[wiki/claude-code/lmstudio-anthropic-compat|Anthropic Compat Endpoints]] — `/v1/messages` drop-in for Claude SDK
|
||||
- [[wiki/claude-code/lmstudio-chat-completions|Chat Completions]] — full param reference for `/v1/chat/completions`
|
||||
- [[wiki/claude-code/lmstudio-embeddings|Embeddings]] — `/v1/embeddings` for local RAG pipelines
|
||||
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — native v1 API with extended model metadata
|
||||
- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] — memory management for loaded models
|
||||
|
||||
## Sources
|
||||
|
||||
- [LM Studio OpenAI Compat Docs](https://lmstudio.ai/docs/developer/openai-compat) — raw/OpenAI Compatibility Endpoints.md
|
||||
124
wiki/claude-code/lmstudio-responses-api.md
Normal file
|
|
@ -0,0 +1,124 @@
|
|||
---
|
||||
title: "LM Studio Responses API"
|
||||
aliases: [lmstudio-responses, lm-studio-openai-responses]
|
||||
tags: [lm-studio, openai-compat, responses-api, streaming, mcp, reasoning]
|
||||
sources: [raw/Responses.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# LM Studio Responses API
|
||||
|
||||
LM Studio exposes `/v1/responses` — an OpenAI Responses API-compatible endpoint with support for streaming, reasoning effort, stateful multi-turn via `previous_response_id`, and Remote MCP tools.
|
||||
|
||||
Endpoint: `POST http://localhost:1234/v1/responses` (base URL `http://localhost:1234/v1`)
|
||||
|
||||
---
|
||||
|
||||
## Basic Request (non-streaming)
|
||||
|
||||
```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Provide a prime number less than 50",
    "reasoning": { "effort": "low" }
  }'
```
|
||||
|
||||
- `input` — plain string prompt (no messages array required)
|
||||
- `reasoning.effort` — `"low"` | `"medium"` | `"high"` (model-dependent)
|
||||
|
||||
---
|
||||
|
||||
## Stateful Follow-up
|
||||
|
||||
Carry conversation state across calls using `previous_response_id`:
|
||||
|
||||
```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Multiply it by 2",
    "previous_response_id": "resp_123"
  }'
```
|
||||
|
||||
- The `id` field from any prior response becomes the `previous_response_id` of the next
|
||||
- No need to replay the full message history client-side
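
A short Python sketch of the chaining described above. The helper names are hypothetical, and the sample first response is hand-written; only its `id` field is used.

```python
def first_request(model, prompt):
    """Body for the opening turn of a conversation."""
    return {"model": model, "input": prompt}


def follow_up_request(model, prompt, previous_response):
    """Body for a later turn, chained to the response that preceded it."""
    return {
        "model": model,
        "input": prompt,
        # The prior response's `id` is all the server needs to find the context.
        "previous_response_id": previous_response["id"],
    }


# Hypothetical first response; only the `id` field matters for chaining.
first_response = {"id": "resp_123", "status": "completed"}

opening_body = first_request("openai/gpt-oss-20b", "Provide a prime number less than 50")
next_body = follow_up_request("openai/gpt-oss-20b", "Multiply it by 2", first_response)
```

The follow-up body carries only the new prompt, so the request size stays constant no matter how long the conversation gets.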
|
||||
|
||||
---
|
||||
|
||||
## Streaming
|
||||
|
||||
```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Hello",
    "stream": true
  }'
```
|
||||
|
||||
SSE events emitted:
|
||||
| Event | Description |
|-------|-------------|
| `response.created` | Response object initialised |
| `response.output_text.delta` | Incremental text chunk |
| `response.completed` | Final event, full response included |
|
||||
|
||||
---
|
||||
|
||||
## Remote MCP Tools (opt-in)
|
||||
|
||||
Enable in LM Studio: **Developer → Settings → Remote MCP**.
|
||||
|
||||
```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ibm/granite-4-micro",
    "input": "What is the top trending model on hugging face?",
    "tools": [
      {
        "type": "mcp",
        "server_label": "huggingface",
        "server_url": "https://huggingface.co/mcp",
        "allowed_tools": ["model_search"]
      }
    ]
  }'
```
|
||||
|
||||
- `server_label` — arbitrary identifier for this MCP server
|
||||
- `server_url` — remote MCP server URL
|
||||
- `allowed_tools` — allowlist of tool names the model may call
|
||||
|
||||
---
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- `/v1/responses` is an OpenAI Responses API drop-in; swap base URL only
|
||||
- `previous_response_id` enables multi-turn without replaying history — simpler than maintaining a messages array
|
||||
- Streaming uses standard SSE; listen for `response.output_text.delta` for incremental chunks
|
||||
- Remote MCP tools are per-request and opt-in — must enable the feature in LM Studio settings first
|
||||
- `reasoning.effort` controls thinking depth; not all models support it
|
||||
|
||||
---
|
||||
|
||||
## Related
|
||||
|
||||
- [[wiki/claude-code/lmstudio-openai-compat-endpoints|LM Studio OpenAI Compat Endpoints]] — overview of all 5 OAI-compatible endpoints
|
||||
- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — `/v1/chat/completions` with full param reference
|
||||
- [[wiki/claude-code/lmstudio-messages-api|LM Studio Messages API]] — `/v1/messages` Anthropic-compat with streaming + tool-use
|
||||
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — native endpoint feature comparison table
|
||||
- [[wiki/claude-code/mcp-integration|MCP Integration]] — Claude Code MCP setup and server patterns
|
||||
|
||||
---
|
||||
|
||||
## Sources
|
||||
|
||||
- `raw/Responses.md` — LM Studio developer docs: `/v1/responses` endpoint
|
||||
75
wiki/claude-code/lmstudio-rest-api.md
Normal file
|
|
@ -0,0 +1,75 @@
|
|||
---
|
||||
title: "LM Studio REST API (v1)"
|
||||
aliases: [lmstudio-api, lm-studio-rest, lmstudio-v1]
|
||||
tags: [lmstudio, rest-api, local-inference, openai-compat, anthropic-compat, mcp]
|
||||
sources: [raw/LM Studio API.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# LM Studio REST API (v1)
|
||||
|
||||
LM Studio 0.4.0 introduced the native **v1 REST API** at `/api/v1/*`. It sits alongside OpenAI-compatible and Anthropic-compatible endpoints and offers the richest feature set for local inference.
|
||||
|
||||
## v1 vs v0
|
||||
|
||||
The old v0 API (`/api/v0/*`) is superseded. Migrate to `/api/v1/*` for:
|
||||
|
||||
- **Stateful chats** — server keeps conversation context across turns
|
||||
- **MCP via API** — use MCPs configured in LM Studio directly from requests
|
||||
- **Authentication** — API token support
|
||||
- **Model management** — download, load, unload via API
|
||||
|
||||
## Supported Endpoints
|
||||
|
||||
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/v1/chat` | POST | Inference (native) |
| `/api/v1/models` | GET | List loaded models |
| `/api/v1/models/load` | POST | Load a model into VRAM |
| `/api/v1/models/unload` | POST | Unload a model |
| `/api/v1/models/download` | POST | Download a model |
| `/api/v1/models/download/status` | GET | Poll download progress |
|
||||
|
||||
## Inference Endpoint Comparison
|
||||
|
||||
Four endpoints can run inference. Pick based on which features you need:
|
||||
|
||||
| Feature | `/api/v1/chat` | `/v1/responses` (OAI) | `/v1/chat/completions` (OAI) | `/v1/messages` (Anthropic) |
|---|:---:|:---:|:---:|:---:|
| Streaming | ✅ | ✅ | ✅ | ✅ |
| Stateful chat | ✅ | ✅ | ❌ | ❌ |
| Remote MCPs | ✅ | ✅ | ❌ | ❌ |
| LM Studio MCPs | ✅ | ✅ | ❌ | ❌ |
| Custom tools | ❌ | ✅ | ✅ | ✅ |
| Assistant messages in request | ❌ | ✅ | ✅ | ✅ |
| Model load streaming events | ✅ | ❌ | ❌ | ❌ |
| Prompt processing events | ✅ | ❌ | ❌ | ❌ |
| Specify context length | ✅ | ❌ | ❌ | ❌ |
|
||||
|
||||
**Decision guide:**
|
||||
- Need MCP tools + stateful chat → `/api/v1/chat` or `/v1/responses`
|
||||
- Need custom tool definitions → `/v1/responses`, `/v1/chat/completions`, or `/v1/messages`
|
||||
- Dropping in existing OpenAI SDK code → `/v1/chat/completions`
|
||||
- Dropping in existing Anthropic SDK code → `/v1/messages`
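
The comparison table can be encoded as a lookup so that endpoint selection is driven by required features. This is a sketch: the feature keys (`stateful`, `custom_tools`, and so on) are invented labels for the table rows, and the sets simply transcribe the table in this article.

```python
# One set of supported features per endpoint, transcribed from the table.
FEATURES = {
    "/api/v1/chat": {
        "streaming", "stateful", "remote_mcp", "lmstudio_mcp",
        "model_load_events", "prompt_processing_events", "context_length",
    },
    "/v1/responses": {
        "streaming", "stateful", "remote_mcp", "lmstudio_mcp",
        "custom_tools", "assistant_messages",
    },
    "/v1/chat/completions": {"streaming", "custom_tools", "assistant_messages"},
    "/v1/messages": {"streaming", "custom_tools", "assistant_messages"},
}


def endpoints_supporting(*needed):
    """Return the endpoints whose feature set covers everything in `needed`."""
    wanted = set(needed)
    return [endpoint for endpoint, feats in FEATURES.items() if wanted <= feats]


stateful_with_tools = endpoints_supporting("stateful", "custom_tools")
per_request_context = endpoints_supporting("context_length")
```

Asking for both `stateful` and `custom_tools` leaves only `/v1/responses`, which is the combination the table singles out.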
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- The **native `/api/v1/chat`** endpoint is the only one with model-load events, prompt-processing events, and per-request context length. It shares stateful chat and MCP support with `/v1/responses`, but does not accept custom tool definitions.
|
||||
- **`/v1/responses`** (OpenAI Responses API compat) is the best of both worlds — stateful + MCP + custom tools.
|
||||
- **`/v1/chat/completions`** is the broadest drop-in for existing OpenAI code but loses statefulness and MCP.
|
||||
- **`/v1/messages`** lets you redirect the Anthropic SDK to a local model with minimal code change (see [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]]).
|
||||
- Model management endpoints let you fully automate the model lifecycle — download → load → infer → unload — without touching the GUI.
|
||||
- API token auth is available for securing the local server (useful when exposed on a LAN).
|
||||
|
||||
## Related Articles
|
||||
|
||||
- [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]] — redirect Claude Code / Anthropic SDK to LM Studio via env vars
|
||||
- [[wiki/claude-code/lmstudio-chat-completions|lmstudio-chat-completions]] — OpenAI `/v1/chat/completions` usage, params, debugging
|
||||
- [[wiki/claude-code/lmstudio-embeddings|lmstudio-embeddings]] — `/v1/embeddings` for local RAG pipelines
|
||||
- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|lmstudio-idle-ttl-auto-evict]] — memory management: TTL and auto-evict
|
||||
- [[wiki/agent-sdk/overview|agent-sdk/overview]] — build multi-agent systems that call local models
|
||||
|
||||
## Sources
|
||||
|
||||
- `raw/LM Studio API.md` — clipped from lmstudio.ai/docs/developer/rest
|
||||
54
wiki/claude-code/lmstudio-serve-on-network.md
Normal file
|
|
@ -0,0 +1,54 @@
|
|||
---
|
||||
title: "LM Studio — Serve on Local Network"
|
||||
aliases: [lmstudio-network, lmstudio-lan-server]
|
||||
tags: [lmstudio, networking, api-server, local-llm, lan]
|
||||
sources: [raw/Serve on Local Network.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# LM Studio — Serve on Local Network
|
||||
|
||||
Enabling **Serve on Local Network** makes the LM Studio API server accessible to other devices on the same LAN — not just `localhost`.
|
||||
|
||||
## How It Works
|
||||
|
||||
- By default the server binds to `127.0.0.1` (localhost only)
|
||||
- With the option enabled it binds to your machine's **local network IP** (e.g. `192.168.x.x`)
|
||||
- The API access URL shown in LM Studio updates to reflect the new binding
|
||||
- All existing API endpoints stay the same — only the host changes
|
||||
|
||||
## Use Cases
|
||||
|
||||
| Scenario | Why useful |
|----------|-----------|
| Thin-client devices (laptop, tablet, phone) | Offload inference to a powerful desktop on the same network |
| Shared team access | Multiple people hit one LM Studio instance |
| IoT / edge devices | Raspberry Pi or similar calls the API over LAN |
| Local service mesh | Other self-hosted services (Home Assistant, scripts) consume the LLM |
|
||||
|
||||
## Setup Steps
|
||||
|
||||
1. Open LM Studio → **Local Server** tab
|
||||
2. Toggle **Serve on Local Network** → ON
|
||||
3. Note the updated **API access URL** displayed (e.g. `http://192.168.1.x:1234`)
|
||||
4. On client devices, point `base_url` to that address instead of `http://localhost:1234`
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- One toggle in LM Studio; home routers usually do not filter LAN-to-LAN traffic, though the host's own firewall may still need to allow inbound connections on the server port
|
||||
- The API surface is identical to localhost; only the bind address differs
|
||||
- Useful when pairing a powerful homelab machine with weaker clients — see [[wiki/homelab/_index|homelab]] for server options
|
||||
- Combine with [[wiki/claude-code/lmstudio-headless-service|lmstudio-headless-service]] to run the server without the GUI on a headless machine
|
||||
- For redirecting Claude Code itself to the local server, see [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]]
|
||||
|
||||
## Related
|
||||
|
||||
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — full endpoint reference
|
||||
- [[wiki/claude-code/lmstudio-headless-service|LM Studio Headless Service]] — run without GUI (daemon mode)
|
||||
- [[wiki/claude-code/lmstudio-anthropic-compat|Anthropic Compat Endpoints]] — point Claude Code at local server
|
||||
- [[wiki/homelab/_index|Homelab]] — self-hosted hardware for running LM Studio
|
||||
|
||||
## Sources
|
||||
|
||||
- `raw/Serve on Local Network.md` — clipped from lmstudio.ai/docs/developer/core/server/serve-on-network
|
||||
62
wiki/claude-code/lmstudio-server-settings.md
Normal file
|
|
@ -0,0 +1,62 @@
|
|||
---
|
||||
title: "LM Studio Server Settings"
|
||||
aliases: [lmstudio-server-config, lm-studio-api-server-settings]
|
||||
tags: [lmstudio, api-server, configuration, mcp, jit, cors, auth]
|
||||
sources: [raw/Server Settings.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# LM Studio Server Settings
|
||||
|
||||
Configuration options for the LM Studio API server — accessible from the LM Studio UI or `lms` CLI. Controls port, auth, network access, MCP permissions, CORS, and JIT model memory management.
|
||||
|
||||
## Network & Access
|
||||
|
||||
| Setting | Type | Description |
|---------|------|-------------|
| **Server Port** | Integer | Port the API server listens on (default `1234`) |
| **Serve on Local Network** | Switch | Binds server to LAN IP so other devices can reach it; see [[wiki/claude-code/lmstudio-serve-on-network\|Serve on Network]] |
| **Enable CORS** | Switch | Allow cross-origin requests (needed for browser-based clients hitting a local server) |
|
||||
|
||||
## Authentication
|
||||
|
||||
| Setting | Type | Description |
|---------|------|-------------|
| **Require Authentication** | Switch | Clients must pass a valid token in the `Authorization` header; see [[wiki/claude-code/lmstudio-anthropic-compat\|LM Studio Auth docs]] |
|
||||
|
||||
> Authentication is a prerequisite for enabling MCP server access from `mcp.json`.
|
||||
|
||||
## MCP (Model Context Protocol)
|
||||
|
||||
| Setting | Type | Description |
|---------|------|-------------|
| **Allow per-request MCPs** | Switch | Clients may specify ephemeral remote MCP servers in individual requests (not in `mcp.json`). Only remote MCPs supported. |
| **Allow calling servers from mcp.json** | Switch | Clients may use MCP servers defined in your LM Studio `mcp.json`. **Requires Auth enabled.** Security risk if those servers have filesystem/data access. |
|
||||
|
||||
Related: [[wiki/claude-code/mcp-integration\|MCP Integration]]
|
||||
|
||||
## JIT (Just-in-Time) Model Loading
|
||||
|
||||
Saves RAM by loading models on demand rather than pre-loading them.
|
||||
|
||||
| Setting | Type | Description |
|---------|------|-------------|
| **Just in Time Model Loading** | Switch | Load a model at request time if not already loaded |
| **Auto Unload Unused JIT Models** | Switch | Automatically evict JIT models when idle |
| **Only Keep Last JIT Loaded Model** | Switch | Evict all but the most recently used JIT model (minimizes RAM usage) |
|
||||
|
||||
> For deeper JIT / TTL / eviction behavior, see [[wiki/claude-code/lmstudio-idle-ttl-auto-evict\|Idle TTL and Auto-Evict]].
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- **Port** is the only integer setting; all others are on/off switches.
|
||||
- **Auth is a gate** — `mcp.json` server access won't work without it enabled.
|
||||
- **Per-request MCPs** are ephemeral and remote-only; they don't persist after the request.
|
||||
- **CORS** must be on for any browser app (web UI, local HTML tool) to call the API.
|
||||
- **JIT trio** (`JIT Load` → `Auto Unload` → `Only Keep Last`) progressively tightens memory: enable all three on low-RAM machines.
|
||||
- LAN access via [[wiki/claude-code/lmstudio-serve-on-network\|Serve on Network]] is a separate setting from CORS — you may need both.
|
||||
|
||||
## Sources
|
||||
|
||||
- `raw/Server Settings.md` — scraped from [lmstudio.ai/docs/developer/core/server/settings](https://lmstudio.ai/docs/developer/core/server/settings)
|
||||
150
wiki/claude-code/lmstudio-structured-output.md
Normal file
|
|
@ -0,0 +1,150 @@
|
|||
---
|
||||
title: "LM Studio Structured Output"
|
||||
aliases: [lmstudio-json-schema, structured-output-lmstudio]
|
||||
tags: [lmstudio, structured-output, json-schema, openai-compat, local-llm]
|
||||
sources: [raw/Structured Output.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# LM Studio Structured Output
|
||||
|
||||
Enforce a specific JSON shape on LLM responses by passing a JSON schema to `/v1/chat/completions`. Compatible with OpenAI's Structured Output API format.
|
||||
|
||||
## How It Works
|
||||
|
||||
- Add a `response_format` field to the chat completions request
|
||||
- Provide a `json_schema` with a `name`, optional `strict`, and a `schema` object
|
||||
- The model is constrained to return valid JSON matching that schema
|
||||
- Response arrives as a string in `choices[0].message.content` — parse it with `json.loads()`
|
||||
|
||||
## Server Setup
|
||||
|
||||
```bash
lms server start
# or enable from Developer tab in LM Studio UI
```
|
||||
|
||||
Install the CLI first if needed:
|
||||
```bash
npx lmstudio install-cli
```
|
||||
|
||||
## response_format Shape
|
||||
|
||||
```json
"response_format": {
  "type": "json_schema",
  "json_schema": {
    "name": "my_schema",
    "strict": true,
    "schema": {
      "type": "object",
      "properties": {
        "field": { "type": "string" }
      },
      "required": ["field"]
    }
  }
}
```
|
||||
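
The same shape can be built and consumed from Python without an SDK. This is a sketch with hypothetical helper names; `strict` is passed as a JSON boolean, following OpenAI's Structured Outputs format, and the sample response is hand-written.

```python
import json


def json_schema_format(name, schema, strict=True):
    """Wrap a JSON Schema in the `response_format` envelope."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }


def parse_structured(response):
    """Decode the JSON string found in choices[0].message.content."""
    return json.loads(response["choices"][0]["message"]["content"])


joke_format = json_schema_format(
    "joke_response",
    {
        "type": "object",
        "properties": {"joke": {"type": "string"}},
        "required": ["joke"],
    },
)

# Hand-written sample of a chat completions response body.
sample_response = {
    "choices": [
        {"message": {"role": "assistant", "content": '{"joke": "Why did the GPU nap? It was swapped out."}'}}
    ]
}
joke = parse_structured(sample_response)
```

`parse_structured` raises `json.JSONDecodeError` if the content is not valid JSON, which is a useful signal that the model or engine did not honour the schema.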
|
||||
## cURL Example
|
||||
|
||||
```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "{{model}}",
    "messages": [
      {"role": "system", "content": "You are a helpful jokester."},
      {"role": "user", "content": "Tell me a joke."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "joke_response",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": { "joke": {"type": "string"} },
          "required": ["joke"]
        }
      }
    },
    "temperature": 0.7,
    "max_tokens": 50,
    "stream": false
  }'
```
|
||||
|
||||
## Python Example
|
||||
|
||||
```python
from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

character_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "characters",
        "schema": {
            "type": "object",
            "properties": {
                "characters": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "occupation": {"type": "string"},
                            "personality": {"type": "string"},
                            "background": {"type": "string"}
                        },
                        "required": ["name", "occupation", "personality", "background"]
                    },
                    "minItems": 1
                }
            },
            "required": ["characters"]
        }
    }
}

response = client.chat.completions.create(
    model="your-model",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Create 1-3 fictional characters"}
    ],
    response_format=character_schema,
)

results = json.loads(response.choices[0].message.content)
print(json.dumps(results, indent=2))
```
|
||||
|
||||
## Structured Output Engines
|
||||
|
||||
| Model Format | Engine |
|---|---|
| GGUF | `llama.cpp` grammar-based sampling |
| MLX | [Outlines](https://github.com/dottxt-ai/outlines) via [lmstudio-ai/mlx-engine](https://github.com/lmstudio-ai/mlx-engine) |
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- Use `response_format.type = "json_schema"` — same shape as OpenAI's Structured Outputs API
|
||||
- Works with any OpenAI-compatible client SDK (Python, TS, etc.) just by pointing `base_url` at localhost
|
||||
- Response is always a **string** in `choices[0].message.content` — always call `json.loads()` on it
|
||||
- Not all models support this: **models below 7B parameters often cannot do structured output** — check the model card
|
||||
- GGUF uses grammar sampling; MLX uses Outlines — both constrain tokens at generation time, not post-hoc
|
||||
- All standard `/v1/chat/completions` params (temperature, max_tokens, stream, etc.) still apply
|
||||
|
||||
## Related
|
||||
|
||||
- [[wiki/claude-code/lmstudio-chat-completions|lmstudio-chat-completions]] — full parameter reference for the completions endpoint
|
||||
- [[wiki/claude-code/lmstudio-openai-compat-endpoints|lmstudio-openai-compat-endpoints]] — overview of all OpenAI-compat endpoints
|
||||
- [[wiki/claude-code/lmstudio-responses-api|lmstudio-responses-api]] — stateful responses with streaming and Remote MCP tools
|
||||
- [[wiki/claude-code/lmstudio-rest-api|lmstudio-rest-api]] — native LM Studio API and endpoint feature comparison
|
||||
158
wiki/claude-code/lmstudio-tool-use.md
Normal file
|
|
@ -0,0 +1,158 @@
|
|||
---
|
||||
title: "LM Studio Tool Use (Function Calling)"
|
||||
aliases: [lmstudio-function-calling, lmstudio-tools]
|
||||
tags: [lmstudio, tool-use, function-calling, openai-compat, python, local-llm]
|
||||
sources: [raw/Tool Use.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
# LM Studio Tool Use (Function Calling)
|
||||
|
||||
Tool use lets LLMs *request* calls to external functions/APIs via LM Studio's OpenAI-compatible `/v1/chat/completions` and `/v1/responses` endpoints. Your code executes the actual functions and feeds results back.
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- LLMs **cannot execute code** — they output structured text requesting a tool call; your code runs it
|
||||
- Uses the same format as OpenAI's Function Calling API — any OpenAI SDK works
|
||||
- Tool definitions are injected into the system prompt via the model's chat template
|
||||
- Two support tiers: **Native** (model trained for tool use) and **Default** (fallback prompt injection)
|
||||
- After tool execution, re-prompt the model *without* tools to get a plain-text final answer
|
||||
- Streaming tool calls arrive in chunks — accumulate `delta.tool_calls` before executing
|
||||
|
||||
## High-Level Flow
|
||||
|
||||
```
Setup LLM + tool list
  → Get user input
  → LLM prompted with messages
  → Needs tools?
      Yes → Tool Response → Execute tools → Add results to messages → re-prompt
      No  → Normal response → loop back
```
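
The flow above can be sketched as a small loop. This is an illustrative outline, not LM Studio's own code: `run_tool_loop`, `fake_model`, and the `registry` dict are hypothetical names, and the model is replaced by a stub so the sketch runs without a server.

```python
import json


def run_tool_loop(messages, tools, call_model, registry, max_rounds=5):
    """Prompt the model, execute any requested tools, and re-prompt.

    `call_model(messages, tools)` must return an assistant message dict and
    `registry` maps tool names to Python callables (both are assumptions).
    """
    for _ in range(max_rounds):
        message = call_model(messages, tools)
        tool_calls = message.get("tool_calls") or []
        if not tool_calls:
            # "No" branch: a normal text response ends the turn.
            return message.get("content")
        # "Yes" branch: record the request, run each tool, add the results.
        messages.append({"role": "assistant", "tool_calls": tool_calls})
        for call in tool_calls:
            func = registry[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(func(**args)),
            })
    raise RuntimeError("model kept requesting tools")


def fake_model(messages, tools):
    """Stand-in for the LLM: asks for one tool call, then answers in text."""
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": "Result: " + messages[-1]["content"]}
    return {"role": "assistant", "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})},
    }]}


history = [{"role": "user", "content": "What is 2 + 3?"}]
answer = run_tool_loop(history, tools=[], call_model=fake_model,
                       registry={"add": lambda a, b: a + b})
print(answer)
```

In real use, `call_model` would wrap a `/v1/chat/completions` request. The `max_rounds` cap is a design choice to stop a model that never settles on a text answer.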
|
||||
|
||||
## Tool Definition Format
|
||||
|
||||
```json
{
  "type": "function",
  "function": {
    "name": "get_delivery_date",
    "description": "Get the delivery date for a customer's order",
    "parameters": {
      "type": "object",
      "properties": {
        "order_id": { "type": "string" }
      },
      "required": ["order_id"]
    }
  }
}
```
|
||||
|
||||
Pass as the `tools` array in the request body — identical to OpenAI's spec.
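
As a rough illustration of where the definition goes, the sketch below builds a request body with the `tools` array alongside `messages`. The helper name `build_request` and the sample user text are assumptions for the example. Only the tool definition and model name come from this article.

```python
import json

# Tool definition copied from the section above.
get_delivery_date_tool = {
    "type": "function",
    "function": {
        "name": "get_delivery_date",
        "description": "Get the delivery date for a customer's order",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}


def build_request(model, user_text, tools):
    """Assemble a chat-completions request body (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
    }


body = build_request(
    "lmstudio-community/qwen2.5-7b-instruct",
    "When will order 1017 arrive?",  # sample prompt, made up for the example
    [get_delivery_date_tool],
)
print(json.dumps(body, indent=2))
```

The same dict can be posted as JSON with any HTTP client, or its `messages` and `tools` values passed as keyword arguments to an OpenAI SDK client.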
|
||||
|
||||
## Response Parsing
|
||||
|
||||
- Tool call detected: `choices[0].message.tool_calls` is a non-empty array and `choices[0].finish_reason` is `"tool_calls"`
|
||||
- No tool call: response lands in `choices[0].message.content` as normal text
|
||||
- If the model outputs a malformed tool call, LM Studio falls back to `content` — use `lms log stream` to debug
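
A minimal sketch of that branching, working on plain dicts shaped like a chat-completions response. The function name `parse_choice` and both sample payloads are made up for illustration.

```python
import json


def parse_choice(response):
    """Return ("tool_calls", [...]) or ("text", content) for the first choice."""
    choice = response["choices"][0]
    message = choice["message"]
    calls = message.get("tool_calls") or []
    if choice.get("finish_reason") == "tool_calls" and calls:
        parsed = [
            (c["id"], c["function"]["name"], json.loads(c["function"]["arguments"]))
            for c in calls
        ]
        return "tool_calls", parsed
    # Plain answers and malformed tool calls both end up in `content`.
    return "text", message.get("content")


# Sample payloads (invented for the example).
tool_response = {"choices": [{
    "finish_reason": "tool_calls",
    "message": {"role": "assistant", "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_delivery_date",
                     "arguments": "{\"order_id\": \"1017\"}"},
    }]},
}]}
text_response = {"choices": [{
    "finish_reason": "stop",
    "message": {"role": "assistant", "content": "Hello!"},
}]}

print(parse_choice(tool_response))
print(parse_choice(text_response))
```

Note that `function.arguments` is a JSON string, so it needs `json.loads` before being passed to your function.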
|
||||
|
||||
## Multi-Turn Pattern (Python)
|
||||
|
||||
```python
from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
model = "lmstudio-community/qwen2.5-7b-instruct"

# Assumed to be defined elsewhere: `messages` (chat history), `tools` (tool
# definitions as shown above), and `my_function` (your tool implementation).

# 1. First call, with tools
response = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
)

# 2. Execute the requested tool
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
result = my_function(**args)

# 3. Append both the assistant's tool-call message and the tool result
messages += [
    {"role": "assistant", "tool_calls": [tool_call]},
    {"role": "tool", "content": json.dumps(result), "tool_call_id": tool_call.id},
]

# 4. Second call, WITHOUT tools, for the final plain-text answer
final = client.chat.completions.create(model=model, messages=messages)
print(final.choices[0].message.content)
```
|
||||
|
||||
## Native vs Default Support
|
||||
|
||||
| Level | What it means | Quality |
|-------|---------------|---------|
| **Native** | Model has a tool-use chat template + LM Studio parses its format | Best |
| **Default** | LM Studio injects a custom system prompt + converts `tool` role to `user` | Variable |
|
||||
|
||||
### Models with Native Support (as of 2024-11)
|
||||
|
||||
- **Qwen**: Qwen2.5-7B-Instruct (GGUF / MLX)
- **Llama**: Llama-3.1-8B-Instruct and Llama-3.2-3B-Instruct (GGUF / MLX)
- **Mistral**: Ministral-8B-Instruct-2410 (GGUF / MLX)
|
||||
|
||||
Native models show a hammer badge in the LM Studio UI.
|
||||
|
||||
## Streaming Tool Calls
|
||||
|
||||
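```python
# Accumulate chunks: name and arguments arrive in pieces
# (`stream` is the iterator returned by a create(..., stream=True) call)
tool_calls = {}  # index -> {"id": ..., "name": ..., "arguments": ...}
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        for tc in delta.tool_calls:
            # Append tc.id, tc.function.name, tc.function.arguments fragments
            call = tool_calls.setdefault(tc.index, {"id": "", "name": "", "arguments": ""})
            call["id"] += tc.id or ""
            call["name"] += tc.function.name or ""
            call["arguments"] += tc.function.arguments or ""
```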
|
||||
|
||||
Execute only after the stream ends and `tool_calls` is fully assembled.
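
To make the assembly step concrete without a running server, here is a sketch that feeds simulated deltas through an accumulator. `accumulate_tool_calls` and `fragment` are hypothetical helpers, and the fake deltas only imitate the attribute layout (`index`, `id`, `function.name`, `function.arguments`) used by OpenAI-style streaming chunks.

```python
import json
from types import SimpleNamespace as NS


def accumulate_tool_calls(deltas):
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls = {}
    for delta in deltas:
        for tc in getattr(delta, "tool_calls", None) or []:
            slot = calls.setdefault(tc.index, {"id": "", "name": "", "arguments": ""})
            if tc.id:
                slot["id"] = tc.id  # the id normally arrives once
            if tc.function and tc.function.name:
                slot["name"] += tc.function.name
            if tc.function and tc.function.arguments:
                slot["arguments"] += tc.function.arguments
    return [calls[i] for i in sorted(calls)]


def fragment(index, id=None, name=None, arguments=None):
    """Build a fake delta carrying a single tool-call fragment."""
    function = NS(name=name, arguments=arguments)
    return NS(tool_calls=[NS(index=index, id=id, function=function)])


# Simulated stream: the arguments JSON is split across two chunks.
deltas = [
    fragment(0, id="call_1", name="get_delivery_date", arguments=""),
    fragment(0, arguments='{"order'),
    fragment(0, arguments='_id": "1017"}'),
    NS(tool_calls=None),  # a delta with no tool-call data
]

calls = accumulate_tool_calls(deltas)
parsed_args = json.loads(calls[0]["arguments"])
print(calls[0]["name"], parsed_args)
```

Parsing the arguments only after the loop finishes matters because any single fragment is usually not valid JSON on its own.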
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
# Start server
lms server start

# Load a model
lms load

# Debug raw prompts (see how tools are injected)
lms log stream
```
|
||||
|
||||
```bash
# curl single-turn example
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "lmstudio-community/qwen2.5-7b-instruct",
       "messages": [{"role": "user", "content": "Search dell products under $50"}],
       "tools": [...]}'
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
- **No `tool_calls` in response** — model output was malformed; run `lms log stream` to inspect the raw prompt and output
|
||||
- **Smaller models** — may not follow the tool call format reliably; prefer ≥7B models with native support
|
||||
- **Default mode weirdness** — check the injected system prompt via `lms log stream`; the format uses `[TOOL_REQUEST]...[END_TOOL_REQUEST]` tags
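
When debugging default mode, it can help to check whether a raw tool request leaked into `content`. The sketch below scans text for the tags mentioned above. It assumes the payload between the tags is a JSON object with `name` and `arguments` keys, which should be confirmed against `lms log stream` output. `find_tool_requests` is a hypothetical helper.

```python
import json
import re

# Matches the [TOOL_REQUEST]...[END_TOOL_REQUEST] wrapper used in default mode.
TOOL_REQUEST_RE = re.compile(
    r"\[TOOL_REQUEST\](.*?)\[END_TOOL_REQUEST\]", re.DOTALL
)


def find_tool_requests(content):
    """Return tool requests found in text; unparseable ones are kept raw."""
    found = []
    for raw in TOOL_REQUEST_RE.findall(content or ""):
        try:
            found.append(json.loads(raw))
        except json.JSONDecodeError:
            found.append({"unparsed": raw.strip()})
    return found


# Sample text (invented) with one well-formed and one broken request.
sample = (
    "Let me check.\n"
    "[TOOL_REQUEST]\n"
    '{"name": "get_delivery_date", "arguments": {"order_id": "1017"}}\n'
    "[END_TOOL_REQUEST]\n"
    "[TOOL_REQUEST]{name: oops}[END_TOOL_REQUEST]"
)
requests_found = find_tool_requests(sample)
print(requests_found)
```

An entry under `unparsed` suggests the model produced a malformed request, which matches the fallback-to-`content` behaviour described under Response Parsing.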
|
||||
|
||||
## Related
|
||||
|
||||
- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — full `/v1/chat/completions` param reference
|
||||
- [[wiki/claude-code/lmstudio-openai-compat-endpoints|LM Studio OpenAI Compat Endpoints]] — all 5 compatible endpoints
|
||||
- [[wiki/claude-code/lmstudio-responses-api|LM Studio Responses API]] — `/v1/responses` with Remote MCP tools
|
||||
- [[wiki/claude-code/lmstudio-structured-output|LM Studio Structured Output]] — enforce JSON schema on responses
|
||||
- [[wiki/claude-code/lmstudio-messages-api|LM Studio Messages API]] — Anthropic-compat tool use examples
|
||||
|
||||
## Sources
|
||||
|
||||
- `raw/Tool Use.md` — LM Studio official docs (lmstudio.ai/docs/developer/openai-compat/tools), published 2024-11-19
|
||||
|
|
@ -43,3 +43,4 @@ Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, bud
|
|||
| [[wiki/homelab/glance-dashboard\|Glance — Self-hosted Dashboard]] | Glance setup replacing Homarr: Docker config, 5-page layout, Prometheus RAPL metrics, key patterns ($include caveat, internal IPs only) | session 2026-04-29 | 2026-04-29 |
|
||||
| [[wiki/homelab/homelab-media-stack\|Homelab Media Stack — Jellyfin + *arr + qBittorrent Setup]] | CT111 media LXC: unified /data mount pattern, Intel QuickSync GPU passthrough, step-by-step qBittorrent categories + Sonarr/Radarr/Prowlarr wiring | session 2026-04-26 | 2026-04-26 |
|
||||
| [[wiki/homelab/hp-elitedesk-800g3-proxmox\|HP Elitedesk 800 G3 — Proxmox Setup Log]] | Real homelab server setup log: i5-7500, 24 GB RAM, 256 GB NVMe + 6 TB HDD, LXC containers, GPU passthrough (AMD/Intel) | session 2026-04-18 | 2026-04-21 |
|
||||
| [[wiki/homelab/hp-elitedesk-800g3-teardown-upgrade\|HP EliteDesk 800 G3 SFF — Teardown, Upgrade & Benchmarks]] | Full disassembly/reassembly guide: proprietary connectors caveat, dual-channel RAM, CPU cooler swap, GTX 1050 Ti, thermal benchmarks (GTA V, Flight Sim) | raw/HP EliteDesk 800 G3 SFF - Teardown, re-assembly and upgrade.md | 2026-04-30 |
|
||||
|
|
|
|||
153
wiki/homelab/hp-elitedesk-800g3-teardown-upgrade.md
Normal file
|
|
@ -0,0 +1,153 @@
|
|||
---
|
||||
title: "HP EliteDesk 800 G3 SFF — Teardown, Upgrade & Benchmarks"
|
||||
aliases: [elitedesk-800-g3-teardown, hp-sff-upgrade-guide]
|
||||
tags: [homelab, hardware, hp, sff, upgrade, benchmark]
|
||||
sources: [raw/HP EliteDesk 800 G3 SFF - Teardown, re-assembly and upgrade.md]
|
||||
created: 2026-04-30
|
||||
updated: 2026-04-30
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The HP EliteDesk 800 G3 SFF is a small form factor desktop often available cheaply at auctions. It uses a **proprietary motherboard and PSU connector** — not standard ATX — which limits some upgrade paths but still allows CPU, RAM, SSD, and GPU swaps.
|
||||
|
||||
Reference config (video unit): i7-7700 · GTX 1050 Ti (low-profile) · 16 GB DDR4 · 256 GB NVMe SSD
|
||||
|
||||
---
|
||||
|
||||
## Exterior Ports
|
||||
|
||||
**Front**
|
||||
- 1× USB-C
|
||||
- 2× USB 3.0
|
||||
- 2× USB 2.0
|
||||
- Audio in/out
|
||||
- Power button
|
||||
- Slim optical drive bay
|
||||
- Optional SD-card reader slot
|
||||
|
||||
**Back**
|
||||
- DisplayPort
|
||||
- Flexible port option (VGA/DP/HDMI via option card)
|
||||
- RJ45 (Gigabit)
|
||||
- 2× USB 2.0 + 2× USB 3.0
|
||||
- Power connector
|
||||
- GPU video outputs (from installed card)
|
||||
|
||||
---
|
||||
|
||||
## Motherboard Layout
|
||||
|
||||
Non-standard form factor — not ATX/ITX. Key connectors:
|
||||
|
||||
| Component | Detail |
|-----------|--------|
| PCIe slots | 1× x16 (GPU), 2× x1, 1× x4 (downshifted) |
| RAM slots | 4× DDR4 DIMM: DIMM1/2 = Ch. B, DIMM3/4 = Ch. A |
| Storage | 1× M.2 NVMe SSD, 3× SATA, 1× M.2 Wi-Fi |
| Power | Proprietary non-standard PSU connector |
| Option card | VGA / DisplayPort / HDMI output slot |
| CMOS reset | Physical button on board |
|
||||
|
||||
**Proprietary connectors = motherboard and PSU cannot be swapped for generic parts.**
|
||||
|
||||
---
|
||||
|
||||
## Disassembly Procedure
|
||||
|
||||
1. **Open case** — slide latch on top cover, no tools needed
|
||||
2. **Open airflow panel** — provides better access to NVMe, SATA, and RAM
|
||||
3. **Remove CPU cooler cover** (plastic airflow shroud)
|
||||
4. Disconnect and slide out **slim DVD drive** (green latch release)
|
||||
5. Remove **front panel**
|
||||
6. Remove **GPU** (low-profile PCIe card, 4 GB VRAM)
|
||||
7. Disconnect **proprietary power connectors**
|
||||
8. Remove **NVMe SSD** (single retention screw)
|
||||
9. Remove **RAM sticks**
|
||||
10. Unscrew 4 screws → lift **CPU cooler**
|
||||
11. Lift lever → remove **CPU** (LGA 1151)
|
||||
12. Remove **motherboard** from chassis
|
||||
|
||||
---
|
||||
|
||||
## Upgrade Notes
|
||||
|
||||
### RAM — Dual Channel
|
||||
- Use matching DIMMs in **same-colour slots** (one per channel)
|
||||
- For 16 GB: 2× 8 GB — one in Ch. A slot, one in Ch. B slot
|
||||
|
||||
### CPU Cooler Replacement
|
||||
- Stock cooler can develop bearing noise
|
||||
- Replacement must be **PWM 4-pin** type
|
||||
- Heatsink mounts to chassis (not board) — install after board is seated in case
|
||||
- Clean old paste with isopropyl alcohol before applying new thermal paste
|
||||
|
||||
### 3.5" HDD Addition
|
||||
- Install standoff screws on drive
|
||||
- Slide into drive cage
|
||||
- Connect SATA data + power cables
|
||||
|
||||
### GPU (Low-Profile Required)
|
||||
- SFF case requires **low-profile PCIe card**
|
||||
- Tested: Gigabyte GTX 1050 Ti (4 GB VRAM) — fits the x16 slot
|
||||
|
||||
---
|
||||
|
||||
## Reassembly Order
|
||||
|
||||
1. CPU into socket (match orientation notch)
|
||||
2. NVMe SSD → slot + screw
|
||||
3. RAM → correct channel slots
|
||||
4. Motherboard into case
|
||||
5. CPU cooler + thermal paste → fix to chassis
|
||||
6. Connect CPU fan to board
|
||||
7. Airflow cover (clips onto CPU fan)
|
||||
8. Power cables + speaker
|
||||
9. DVD drive + SATA cable
|
||||
10. 3.5" HDD → cage + cables
|
||||
11. GPU → PCIe slot
|
||||
12. SATA data cable for HDD
|
||||
13. Front cover → top cover
|
||||
|
||||
---
|
||||
|
||||
## Benchmark Results (i7-7700 + GTX 1050 Ti)
|
||||
|
||||
| Test | Result |
|------|--------|
| Geekbench CPU | Expected for i7-7700 generation |
| Geekbench Compute (GPU) | Expected for GTX 1050 Ti |
| Microsoft Flight Simulator (Medium, 1080p) | ~30 FPS steady |
| GTA V (Very High + AA, 1080p) | Consistent 60+ FPS |
|
||||
|
||||
### Thermal Observations
|
||||
- CPU and GPU approach **~90°C** under sustained load (Flight Simulator)
|
||||
- GTA V similarly runs hot
|
||||
- SFF chassis limits airflow — **monitor temps if running sustained workloads**
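
One possible way to keep an eye on temperatures, sketched under the assumption that the machine runs Linux (for example Proxmox) and exposes sensors under `/sys/class/thermal`, where `temp` files hold millidegrees Celsius. The 85 °C warning threshold and both function names are illustrative choices, not values from the video.

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Read every thermal zone under `root`; returns {zone label: degrees C}."""
    readings = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # skip zones that cannot be read
        readings[f"{zone.name}:{kind}"] = millideg / 1000.0
    return readings


def over_limit(readings, limit_c=85.0):
    """Return only the zones at or above the (assumed) warning threshold."""
    return {name: temp for name, temp in readings.items() if temp >= limit_c}


if __name__ == "__main__":
    current = read_thermal_zones()
    print(current)
    print("hot zones:", over_limit(current))
```

On systems without that sysfs tree the first function simply returns an empty dict. GPU temperatures are usually not listed there and need the vendor's own tooling.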
|
||||
|
||||
---
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
- The EliteDesk 800 G3 SFF uses **proprietary PSU and motherboard connectors** — plan upgrades around this constraint
|
||||
- Case opens **tool-free** via a single top-cover latch; very serviceable for the form factor
|
||||
- CPU cooler mounts to the **chassis** not the board — must be installed after the board is seated
|
||||
- Dual-channel RAM requires same-colour DIMM pairing (Ch. A + Ch. B)
|
||||
- GTX 1050 Ti (low-profile) is the practical GPU ceiling for this chassis without a riser
|
||||
- Thermals are borderline under sustained 3D load — consider improved case airflow or undervolting for homelab/compute use
|
||||
- For homelab use (Proxmox, LXCs), thermal load is far lighter — see [[wiki/homelab/hp-elitedesk-800g3-proxmox|HP Elitedesk 800 G3 — Proxmox Setup Log]]
|
||||
|
||||
---
|
||||
|
||||
## Related Articles
|
||||
|
||||
- [[wiki/homelab/hp-elitedesk-800g3-proxmox|HP Elitedesk 800 G3 — Proxmox Setup Log]]
|
||||
- [[wiki/homelab/homelab-from-scratch-budget-build|Homelab From Scratch — Budget-First Design]]
|
||||
- [[wiki/homelab/bigibz1-homelab-hardware|bigibz1 Homelab Hardware Reference]]
|
||||
- [[wiki/homelab/homelab-services-map|Homelab — Full Services Map & Network Reference]]
|
||||
|
||||
---
|
||||
|
||||
## Sources
|
||||
|
||||
- [YouTube: HP EliteDesk 800 G3 SFF — Teardown, re-assembly and upgrade (jensd_be, 2021-03-08)](https://www.youtube.com/watch?v=n1ETa3mJ85I)
|
||||