vault backup: 2026-04-30 14:42:43

Vadym Samoilenko 2026-04-30 14:42:43 +01:00
parent 1eb1072c19
commit 0631673e44
33 changed files with 1557 additions and 3 deletions

@@ -4,7 +4,7 @@
"syncFolder": "Hoarder",
"attachmentsFolder": "Hoarder/attachments",
"syncIntervalMinutes": 60,
"lastSyncTimestamp": 1777555641238,
"lastSyncTimestamp": 1777556035137,
"updateExistingFiles": false,
"excludeArchived": true,
"onlyFavorites": false,

@@ -149,3 +149,48 @@ tags: [daily]
- 14:27 (<1min) | `memory-compiler`
- **Asked:** Compile a new article about LM Studio embeddings into the structured wiki knowledge base.
- **Done:** Filed article as `wiki/claude-code/lmstudio-embeddings.md` and updated master index with wikilinks to related LM Studio topics and RAG pattern.
- 14:28 (1min) | `memory-compiler`
- **Asked:** Compile a new HP EliteDesk 800G3 teardown/upgrade article into the wiki knowledge base.
- **Done:** Filed article as `wiki/homelab/hp-elitedesk-800g3-teardown-upgrade.md` with full disassembly procedures, motherboard specs, and upgrade benchmarks.
- 14:29 | `video-accessibility`
- **Asked:** Code review skills checklist from the project instructions.
- **Done:** Reviewed project completion and committed code changes; no files specified.
- 14:30 (<1min) | `memory-compiler`
- **Asked:** Compile a new article on LM Studio messages API into the wiki knowledge base.
- **Done:** Created structured wiki article with cURL examples and updated topic and master indices.
- 14:31 (<1min) | `memory-compiler`
- **Asked:** Compile a new article about LM Studio's OpenAI-compatible endpoints into the wiki knowledge base.
- **Done:** Created the article, updated the claude-code index to 21 articles, and bumped the master index count.
- 14:32 | `video-accessibility`
- **Asked:** What skills should be checked for code review according to the instructions?
- **Done:** Reviewed project completion and identified environment configuration changes needed for optical-dev deployment.
- 14:33 (<1min) | `memory-compiler`
- **Asked:** Compile a new article about LM Studio headless service into the knowledge base wiki.
- **Done:** Filed article as `claude-code/lmstudio-headless-service.md` and updated master index to reflect 23 total claude-code articles.
- 14:33 (<1min) | `memory-compiler`
- **Asked:** Compile a new article about LM Studio network serving into the knowledge base and update the master index.
- **Done:** Created new LM Studio article, updated claude-code topic index, and incremented master index article count from 23 to 24.
- 14:34 | `video-accessibility`
- **Asked:** Check the project instructions for code review skills requirements.
- **Done:** Identified OOM issue in whisper-worker memory configuration and pushed hotfix to restore original memory limits while keeping Cloud Run URLs.
- 14:35 (<1min) | `memory-compiler`
- **Asked:** Compile a raw article about LM Studio systemd configuration into the structured wiki knowledge base.
- **Done:** Filed the article as a systemd unit configuration guide covering service setup, unit file ordering, and PATH requirements.
- 14:36 (<1min) | `memory-compiler`
- **Asked:** File a new article about LM Studio structured output into the knowledge base.
- **Done:** Created wiki article and updated both index files to register the new entry.
- 14:37 (1min) | `memory-compiler`
- **Asked:** Compile a new article on tool use into the knowledge base wiki structure.
- **Done:** Processed raw article into `claude-code/lmstudio-tool-use.md` and updated both topic and master indexes.
- 14:38 | `video-accessibility`
- **Asked:** Check the instructions for code review skills to verify the completed project.
- **Done:** Reviewed deployment fix that restored memory limits and confirmed all 7 containers started successfully with API health checks passing.
- 14:39 (<1min) | `memory-compiler`
- **Asked:** Compile a new LM Studio CLI article into the knowledge base wiki.
- **Done:** Created structured wiki article with command reference and cross-links, updated master index from 29 to 30 claude-code articles.
- 14:41 | `video-accessibility`
- **Asked:** Check the project instructions for code review skills that need to be verified.
- **Done:** Reviewed deployment status and identified CORS configuration and ffmpeg logging checks needed.
- 14:41 | `video-accessibility`
- **Asked:** Check project completion and review code quality assessment skills from instructions.
- **Done:** Identified server authorization limitations and provided gsutil CORS configuration command for local execution.

@@ -26,12 +26,12 @@ This 3-hop pattern works for hundreds of articles without vector search.
| [[wiki/concepts/_index\|concepts/]] | Atomic knowledge extracted from Claude Code sessions | 75 |
| [[wiki/connections/_index\|connections/]] | Cross-cutting insights linking 2+ concepts: FastAPI+Azure AD+Docker trinity, AI→cost-tracker, Apache+Vite basePath, GCP→REST polling, Box+hotfolder, Docker DNS+AdGuard | 9 |
| [[wiki/qa/_index\|qa/]] | Filed answers to queries (saved with `--file-back`) | 0 |
| [[wiki/homelab/_index\|homelab/]] | Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, budget builds, HP Elitedesk G3, Homarr API + Apps + Boards + Certificates + Integrations + Settings + Tasks + AdGuard + Clock + Docker Stats + Docker Integration + Download Client + Firewall + Proxmox Integration + Radarr + Readarr + Sonarr + Bookmarks + Calendar + Icons + App Widget + Weather + GitHub + Nextcloud + qBittorrent + RSS Feed + Speedtest Tracker + System Health Monitoring + System Resources + Services Map + Media Stack | 39 |
| [[wiki/homelab/_index\|homelab/]] | Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, budget builds, HP Elitedesk G3, Homarr API + Apps + Boards + Certificates + Integrations + Settings + Tasks + AdGuard + Clock + Docker Stats + Docker Integration + Download Client + Firewall + Proxmox Integration + Radarr + Readarr + Sonarr + Bookmarks + Calendar + Icons + App Widget + Weather + GitHub + Nextcloud + qBittorrent + RSS Feed + Speedtest Tracker + System Health Monitoring + System Resources + Services Map + Media Stack | 40 |
| [[wiki/web-agency/_index\|web-agency/]] | AI-assisted website building & selling: Claude Code, Nanobanana 2, Kling, LaunchPath MCP | 9 |
| [[wiki/dotfiles/_index\|dotfiles/]] | Linux terminal ricing: Kitty, Fish, WezTerm CLI, modern Rust CLI tools, LazyVim, unified themes, Tabby | 21 |
| [[wiki/agent-sdk/_index\|agent-sdk/]] | Claude Agent SDK (formerly Claude Code SDK) — build autonomous AI agents in Python and TypeScript | 30 |
| [[wiki/llm-models/_index\|llm-models/]] | LLM model catalogs — OpenAI and Claude/Anthropic models, IDs, context, pricing | 2 |
| [[wiki/claude-code/_index\|claude-code/]] | Claude Code product docs — install, capabilities, surfaces, MCP, hooks, scheduling, multi-agent, plugins, skills, channels, error recovery, LM Studio local | 17 |
| [[wiki/claude-code/_index\|claude-code/]] | Claude Code product docs — install, capabilities, surfaces, MCP, hooks, scheduling, multi-agent, plugins, skills, channels, error recovery, LM Studio local | 30 |
| [[wiki/reports/_index\|reports/]] | Weekly and monthly summaries — generate: `uv run python scripts/report-generator.py --weekly` | 1 |
| [[wiki/infrastructure/_index\|infrastructure/]] | Server inventory: all 10 SSH hosts — optical, optical-dev, optical-prod, baic, librechat, modocmms, box-cli, aimpress, pve | 10 |

@@ -31,3 +31,16 @@ Claude Code is Anthropic's agentic coding assistant. Works across terminal, IDE,
| [[wiki/claude-code/lmstudio-anthropic-compat\|lmstudio-anthropic-compat]] | Redirect Claude Code and the Anthropic SDK to a local LM Studio server via two env vars; `/v1/messages` drop-in, auth options, cURL + Python examples | raw/Anthropic Compatibility Endpoints.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-chat-completions\|lmstudio-chat-completions]] | LM Studio OpenAI-compatible `/v1/chat/completions`: Python example, all supported params (incl. top_k, repeat_penalty), `lms log stream` debugging | raw/Chat Completions.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-embeddings\|lmstudio-embeddings]] | LM Studio `/v1/embeddings`: OpenAI-compat drop-in, Python example, newline stripping, batch inputs, use with FAISS/Chroma for local RAG | raw/Embeddings.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-idle-ttl-auto-evict\|lmstudio-idle-ttl-auto-evict]] | Idle TTL (per-request `ttl` field, `lms load --ttl`) and Auto-Evict (1 JIT model at a time) for LM Studio memory management | raw/Idle TTL and Auto-Evict.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-rest-api\|lmstudio-rest-api]] | LM Studio native v1 REST API: all endpoints, endpoint feature comparison (native vs OAI vs Anthropic compat), model lifecycle management | raw/LM Studio API.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-messages-api\|lmstudio-messages-api]] | LM Studio `/v1/messages` drop-in: basic, streaming (SSE events), and tool-use cURL examples; auth options | raw/Messages.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-openai-compat-endpoints\|lmstudio-openai-compat-endpoints]] | LM Studio OpenAI-compat overview: 5 endpoints, base_url swap pattern, Python/TS/cURL examples, Codex support | raw/OpenAI Compatibility Endpoints.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-responses-api\|lmstudio-responses-api]] | LM Studio `/v1/responses`: streaming SSE, stateful follow-up via `previous_response_id`, reasoning effort, Remote MCP tools | raw/Responses.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-headless-service\|lmstudio-headless-service]] | Run LM Studio without GUI: llmster daemon (recommended) or desktop tray mode; JIT model loading and auto-evict | raw/Run LM Studio as a service (headless).md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-serve-on-network\|lmstudio-serve-on-network]] | Bind LM Studio server to LAN IP so other devices (thin clients, IoT, team members) can call the API over the local network | raw/Serve on Local Network.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-server-settings\|lmstudio-server-settings]] | All LM Studio API server toggles: port, auth, CORS, LAN access, per-request MCPs, mcp.json access, JIT loading + auto-evict | raw/Server Settings.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-llmster-systemd\|lmstudio-llmster-systemd]] | systemd unit file for llmster: install daemon, load model at boot, ExecStartPre ordering, oneshot+RemainAfterExit pattern, service management commands | raw/Setup llmster as a Startup Task on Linux.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-structured-output\|lmstudio-structured-output]] | Enforce JSON schema on LLM responses via response_format; GGUF uses llama.cpp grammar, MLX uses Outlines; models <7B often unsupported | raw/Structured Output.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-tool-use\|lmstudio-tool-use]] | LM Studio function calling: tool definition format, multi-turn flow, native vs default support, streaming accumulation, Python examples | raw/Tool Use.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-mcp-via-api\|lmstudio-mcp-via-api]] | MCP servers via LM Studio `/api/v1/chat`: ephemeral (inline) vs mcp.json (pre-configured), allowed_tools, custom auth headers | raw/Using MCP via API.md | 2026-04-30 |
| [[wiki/claude-code/lmstudio-lms-cli\|lmstudio-lms-cli]] | `lms` CLI: model download/load/unload/list, server start/stop, log streaming, GPU offload flags, --identifier alias, daemon management | raw/lms — LM Studio's CLI.md | 2026-04-30 |

@@ -0,0 +1,104 @@
---
title: "LM Studio Headless / Service Mode"
aliases: [lmstudio-daemon, llmster, lmstudio-background-service]
tags: [lmstudio, local-llm, headless, daemon, jit-loading]
sources: [raw/Run LM Studio as a service (headless).md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio Headless / Service Mode
GUI-less operation of LM Studio: run as a background daemon, start on machine login, and load models on demand via JIT.
## Two Approaches
| Approach | Best For | GUI Required? |
|----------|----------|---------------|
| **llmster** (recommended) | Linux servers, cloud, GPU rigs, headless machines | No |
| **Desktop app headless mode** | Machines with a GUI where app is already installed | Yes (hidden to tray) |
---
## Option 1: llmster (Recommended)
`llmster` is the core of the LM Studio desktop app, repackaged as a server-native daemon. No GUI dependency.
### Install
```bash
# Linux / Mac
curl -fsSL https://lmstudio.ai/install.sh | bash
# Windows (PowerShell)
irm https://lmstudio.ai/install.ps1 | iex
```
### Start the daemon
```bash
lms daemon up
```
- To auto-start on Linux boot, configure it as a **Linux Startup Task** (see LM Studio docs).
- Full CLI reference: `lms daemon --help`
---
## Option 2: Desktop App in Headless Mode
Works on Mac, Windows, Linux (with GUI). Useful if the desktop app is already installed.
### Run server on login
1. Open app settings (`Cmd/Ctrl` + `,`)
2. Enable **"Run LLM server on login"**
3. Exiting the app minimizes to tray — server keeps running
### Start server programmatically
```bash
lms server start
```
Last server state is saved and restored automatically on launch.
---
## Just-In-Time (JIT) Model Loading
Applies to **both** options. Useful when using LM Studio as a backend for other tools (Open WebUI, Claude Code, custom apps).
| JIT State | `/v1/models` returns | Inference behavior |
|-----------|---------------------|--------------------|
| **ON** | All downloaded models | Auto-loads model into VRAM on first call |
| **OFF** | Only models in VRAM | Must manually load model first |
### Auto-Unload
JIT-loaded models are **auto-evicted** after a period of inactivity — see [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] for TTL settings and per-request `ttl` field.
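To make the JIT table concrete, a client can check which model IDs `/v1/models` currently reports before sending inference requests. This is a minimal sketch, assuming the endpoint returns the OpenAI-style list shape (`{"data": [{"id": ...}]}`); `model_ids` and `fetch_models` are hypothetical helper names, not part of LM Studio.

```python
import json
from typing import Dict, List
from urllib.request import urlopen


def model_ids(payload: Dict) -> List[str]:
    """Return the model IDs from an OpenAI-style /v1/models payload."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_models(base_url: str = "http://localhost:1234/v1") -> List[str]:
    """Query a running server; requires LM Studio listening on base_url."""
    with urlopen(f"{base_url}/models", timeout=5) as resp:
        return model_ids(json.load(resp))


# Assumed sample payload, for illustration only (no server needed).
sample = {
    "object": "list",
    "data": [{"id": "openai/gpt-oss-20b"}, {"id": "ibm/granite-4-micro"}],
}
print("ibm/granite-4-micro" in model_ids(sample))
```

With JIT on, the list should contain every downloaded model; with JIT off, only those already in VRAM, so a missing ID means the model must be loaded manually first.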
---
## Key Takeaways
- **llmster** is the preferred headless path — works on servers and CI without any GUI
- Desktop headless mode is a quick option for developer machines already running the app
- JIT loading eliminates manual `lms load` calls; models are loaded on first inference request
- JIT-loaded models auto-unload after inactivity (configurable TTL)
- Use `lms server start` to programmatically control the REST server state
- The OpenAI-compatible REST API (`/v1/...`) is available in both modes — see [[wiki/claude-code/lmstudio-openai-compat-endpoints|OpenAI Compat Endpoints]] and [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]]
---
## Related
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — all endpoints and lifecycle management
- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] — memory management for JIT-loaded models
- [[wiki/claude-code/lmstudio-openai-compat-endpoints|OpenAI Compat Endpoints]] — drop-in base_url swap for any OpenAI client
- [[wiki/claude-code/lmstudio-anthropic-compat|Anthropic Compat Endpoints]] — redirect Claude Code / Anthropic SDK to local LM Studio
## Sources
- `raw/Run LM Studio as a service (headless).md`
- LM Studio docs: https://lmstudio.ai/docs/developer/core/headless

@@ -0,0 +1,90 @@
---
title: "LM Studio — Idle TTL and Auto-Evict"
aliases: [lmstudio-ttl, lmstudio-auto-evict, idle-ttl]
tags: [lmstudio, memory-management, jit-loading, ttl, api]
sources: [raw/Idle TTL and Auto-Evict.md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio — Idle TTL and Auto-Evict
Memory management features for LM Studio's JIT-loaded models. Prevents idle models from occupying VRAM and enables seamless model switching from external apps.
## Background
| Feature | Default | Purpose |
|---------|---------|---------|
| **JIT Loading** | enabled | Loads model on first API request — no manual preload needed |
| **Idle TTL** | 60 min | Unloads a model after it has been idle for N seconds/minutes |
| **Auto-Evict** | enabled | Unloads previous JIT model before loading a new one |
## Idle TTL
**Problem:** JIT-loaded models stay in VRAM even when idle (e.g. after you stop using Cline, Zed, or Continue.dev).
**Solution:** TTL starts a countdown when the model goes idle. The timer resets on every new request. When it expires, the model unloads automatically.
### Setting TTL
**App-wide default** — configure in Developer tab → Server Settings.
**Per-request (API)** — pass `ttl` in seconds in the request body:
```bash
curl http://localhost:1234/api/v0/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1-distill-qwen-7b",
"ttl": 300,
"messages": [...]
}'
```
Works on both the OpenAI-compat (`/v1/`) and LM Studio REST (`/api/v0/`) endpoints.
**`lms` CLI** — set TTL at load time:
```bash
lms load <model> --ttl 3600 # 1 hour
```
Models loaded with `lms load` have **no TTL by default** (persist until manual unload).
**Server tab** — TTL field visible when loading a model through the GUI.
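The per-request option can also be set from Python. The sketch below only builds the request (it does not send it) and uses the standard library; `chat_request` is a hypothetical helper, and the positive-value check is this sketch's own guard rather than documented server behaviour.

```python
import json
from urllib.request import Request


def chat_request(model: str, prompt: str, ttl_seconds: int,
                 base_url: str = "http://localhost:1234/v1") -> Request:
    """Build (but do not send) a chat completion request with a per-request TTL."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    body = {
        "model": model,
        "ttl": ttl_seconds,  # idle TTL in seconds, as in the cURL example
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = chat_request("deepseek-r1-distill-qwen-7b", "Hello", ttl_seconds=300)
print(req.full_url)
print(json.loads(req.data)["ttl"])
```

Passing the result to `urllib.request.urlopen` would send it to a running server.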
## Auto-Evict
Controls how many JIT-loaded models can coexist in memory.
| State | Behaviour |
|-------|-----------|
| **ON** (default) | At most 1 JIT model in memory at a time; old model evicted before new one loads |
| **OFF** | Models accumulate in memory; only unloaded by TTL expiry or manual action |
- Non-JIT (manually loaded) models are **never** affected by Auto-Evict.
- Toggle in: Developer tab → Server Settings.
## TTL + Auto-Evict Together
- **Auto-Evict** handles immediate switching — keeps 1 active model.
- **TTL** handles the "forgot to switch" case — cleans up if you just stop using an app.
- Both can be active simultaneously for full memory hygiene.
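The interaction of the two features can be illustrated with a toy simulation. This models only the behaviour described in this article (JIT requests reset the idle timer, Auto-Evict keeps one JIT model, manual loads have no TTL unless given); `ToyModelMemory` is hypothetical and is not LM Studio's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyModelMemory:
    """Toy model of the behaviour described above; not LM Studio's code."""
    auto_evict: bool = True
    # model name -> (last_used, ttl_seconds or None, is_jit)
    loaded: dict = field(default_factory=dict)

    def request(self, model: str, now: float, ttl: float = 3600.0) -> None:
        """A JIT API request: load on demand and reset the idle timer."""
        self.expire(now)
        if model not in self.loaded and self.auto_evict:
            # Evict other JIT models; manually loaded ones are untouched.
            self.loaded = {m: s for m, s in self.loaded.items() if not s[2]}
        if model in self.loaded:
            _, old_ttl, is_jit = self.loaded[model]
            self.loaded[model] = (now, old_ttl, is_jit)
        else:
            self.loaded[model] = (now, ttl, True)

    def manual_load(self, model: str, now: float,
                    ttl: Optional[float] = None) -> None:
        """A manual load (e.g. `lms load`): no TTL unless one is given."""
        self.loaded[model] = (now, ttl, False)

    def expire(self, now: float) -> None:
        """Drop every model whose idle time has reached its TTL."""
        self.loaded = {
            m: (last, ttl, jit)
            for m, (last, ttl, jit) in self.loaded.items()
            if ttl is None or now - last < ttl
        }


mem = ToyModelMemory()
mem.manual_load("pinned", now=0)
mem.request("a", now=0, ttl=300)
mem.request("b", now=10, ttl=300)  # Auto-Evict drops "a", keeps "pinned"
after_switch = sorted(mem.loaded)
mem.expire(now=400)                # "b" has been idle 390 s, past its 300 s TTL
after_idle = sorted(mem.loaded)
print(after_switch, after_idle)
```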
## Key Takeaways
- Set `"ttl": 300` in any API request to cap a model's idle lifetime to 5 minutes.
- `lms load <model> --ttl 3600` is the CLI equivalent for persistent sessions.
- Auto-Evict (default ON) ensures only 1 JIT model lives in VRAM at a time — great for low-VRAM machines.
- `lms load` bypasses TTL defaults; always pass `--ttl` explicitly if you want auto-cleanup.
- These features are irrelevant for models loaded via the GUI Models tab (non-JIT path).
## Related
- [[wiki/claude-code/lmstudio-anthropic-compat|LM Studio Anthropic Compat]] — redirect Claude Code to local LM Studio
- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — full parameter reference incl. `top_k`, `repeat_penalty`
- [[wiki/claude-code/lmstudio-embeddings|LM Studio Embeddings]] — local RAG with FAISS/Chroma
## Sources
- [LM Studio Docs — Idle TTL and Auto-Evict](https://lmstudio.ai/docs/developer/core/ttl-and-auto-evict)

@@ -0,0 +1,96 @@
---
title: "LM Studio — llmster Startup Service (systemd)"
aliases: [llmster-systemd, lmstudio-startup, lmstudio-daemon-linux]
tags: [lmstudio, llmster, systemd, linux, headless, local-llm]
sources: [raw/Setup llmster as a Startup Task on Linux.md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio — llmster Startup Service (systemd)
Configure `llmster` (LM Studio's headless daemon) to launch automatically at boot, load a model, and start the HTTP API server — all via a systemd unit file.
## Install llmster
```bash
curl -fsSL https://lmstudio.ai/install.sh | bash
lms --help # verify
```
## Download a Model
```bash
lms get openai/gpt-oss-20b
# note the model path printed — used in service config
```
## Manual Test (before systemd)
```bash
lms load openai/gpt-oss-20b
lms server start
curl http://localhost:1234/v1/models # should return model list
lms server stop
```
## systemd Unit File
Create `/etc/systemd/system/lmstudio.service` (replace `YOUR_USERNAME`):
```ini
[Unit]
Description=LM Studio Server
[Service]
Type=oneshot
RemainAfterExit=yes
User=YOUR_USERNAME
Environment="HOME=/home/YOUR_USERNAME"
ExecStartPre=/home/YOUR_USERNAME/.lmstudio/bin/lms daemon up
ExecStartPre=/home/YOUR_USERNAME/.lmstudio/bin/lms load openai/gpt-oss-20b --yes
ExecStart=/home/YOUR_USERNAME/.lmstudio/bin/lms server start
ExecStop=/home/YOUR_USERNAME/.lmstudio/bin/lms daemon down
[Install]
WantedBy=multi-user.target
```
- `Type=oneshot` + `RemainAfterExit=yes` — service is considered "active" after `ExecStart` exits
- `ExecStartPre` runs sequentially before `ExecStart`
- Skip the `lms load` line to rely on [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|JIT loading + auto-evict]] instead
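Replacing `YOUR_USERNAME` by hand is error-prone, so the unit file can be rendered from a template. A minimal sketch, assuming the home directory is `/home/<user>` as in the unit above; `render_unit` is a hypothetical helper, and omitting the model drops the `lms load` line to rely on JIT loading.

```python
from typing import Optional

UNIT_TEMPLATE = """\
[Unit]
Description=LM Studio Server

[Service]
Type=oneshot
RemainAfterExit=yes
User={user}
Environment="HOME=/home/{user}"
ExecStartPre={lms} daemon up
{load_line}ExecStart={lms} server start
ExecStop={lms} daemon down

[Install]
WantedBy=multi-user.target
"""


def render_unit(user: str, model: Optional[str] = None) -> str:
    """Render the unit text; systemd needs the absolute path to `lms`."""
    lms = f"/home/{user}/.lmstudio/bin/lms"
    # The optional preload runs after `daemon up` and before `server start`.
    load_line = f"ExecStartPre={lms} load {model} --yes\n" if model else ""
    return UNIT_TEMPLATE.format(user=user, lms=lms, load_line=load_line)


unit_text = render_unit("alice", "openai/gpt-oss-20b")
print(unit_text)
```

The output can be written to `/etc/systemd/system/lmstudio.service` (root required) before running the enable and start steps.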
## Enable and Start
```bash
sudo systemctl daemon-reload
sudo systemctl enable lmstudio.service
sudo systemctl start lmstudio.service
```
## Verify
```bash
systemctl status lmstudio
curl http://localhost:1234/v1/models
```
## Service Management
```bash
sudo systemctl stop lmstudio # stop
sudo systemctl restart lmstudio # restart
sudo systemctl disable lmstudio # remove from boot
```
## Key Takeaways
- Use `lms daemon up` in `ExecStartPre` — the daemon must be running before `lms load` or `lms server start`
- Binary path is `~/.lmstudio/bin/lms` — use the absolute path in the unit file (systemd has a minimal `$PATH`)
- `Type=oneshot` + `RemainAfterExit=yes` keeps the service "active" so `ExecStop` runs on shutdown
- Omit the `lms load` step and use [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|JIT loading]] to avoid pinning a model at boot
- API is served on `http://localhost:1234` — see [[wiki/claude-code/lmstudio-headless-service|headless service overview]] for non-systemd options and [[wiki/claude-code/lmstudio-serve-on-network|LAN serving]] to expose to other devices
## Sources
- [LM Studio Headless llmster Docs](https://lmstudio.ai/docs/developer/core/headless_llmster)

@@ -0,0 +1,108 @@
---
title: "lms — LM Studio CLI"
aliases: [lms-cli, lmstudio-cli]
tags: [lmstudio, cli, local-llm, inference, server]
sources: [raw/lms — LM Studio's CLI.md]
created: 2026-04-30
updated: 2026-04-30
---
# lms — LM Studio CLI
`lms` is LM Studio's built-in CLI utility for managing models, the inference server, and the runtime. Ships with LM Studio — no separate install needed. MIT licensed, open source on GitHub.
## Installation & Verification
```bash
# Already installed with LM Studio — just verify:
lms --help
```
Current version: `v0.0.47`
## Command Reference
| Command | What it does |
|---------|-------------|
| `lms chat` | Start interactive chat with a model in the terminal |
| `lms get` | Search and download models |
| `lms ls` | List models available on disk |
| `lms ps` | List models currently loaded in memory |
| `lms load` | Load a model (with GPU/context options) |
| `lms unload` | Unload a model |
| `lms import` | Import a model file into LM Studio |
| `lms server start/stop` | Control the local API server |
| `lms log` | Stream incoming/outgoing messages for debugging |
| `lms runtime` | Manage and update the inference runtime |
| `lms daemon` | Manage the headless llmster daemon |
| `lms link` | Manage LM Link |
| `lms clone` | Clone an artifact from LM Studio Hub |
| `lms push` | Upload artifact to LM Studio Hub |
| `lms login` | Authenticate with LM Studio |
## Common Workflows
### Server control
```bash
lms server start
lms server stop
```
### List & inspect models
```bash
lms ls # models on disk (reflects My Models directory)
lms ps # models currently loaded in memory
```
### Load a model
```bash
# With GPU offload and context size:
lms load [--gpu=max|auto|0.0-1.0] [--context-length=1-N]
# --gpu=1.0 → 100% GPU offload
# With a stable identifier alias:
lms load openai/gpt-oss-20b --identifier="my-model-name"
```
Using `--identifier` keeps the model ID stable across loads — useful when client code hardcodes a model name.
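When `lms load` is driven from automation, building the argument list in code avoids shell-quoting mistakes. A minimal sketch using only the flags shown above; `lms_load_args` is a hypothetical helper, and combining a model key with `--gpu` and `--context-length` in one call is an assumption.

```python
from typing import List, Optional, Union


def lms_load_args(model: str,
                  gpu: Union[str, float, None] = None,
                  context_length: Optional[int] = None,
                  identifier: Optional[str] = None) -> List[str]:
    """Build an argv list for `lms load` from the flags shown above."""
    args = ["lms", "load", model]
    if gpu is not None:
        if isinstance(gpu, str):
            if gpu not in ("max", "auto"):
                raise ValueError("gpu must be 'max', 'auto', or a ratio in 0.0-1.0")
        elif not 0.0 <= gpu <= 1.0:
            raise ValueError("gpu ratio must be within 0.0-1.0")
        args.append(f"--gpu={gpu}")
    if context_length is not None:
        if context_length < 1:
            raise ValueError("context_length must be at least 1")
        args.append(f"--context-length={context_length}")
    if identifier is not None:
        # No shell quotes needed: each list item is passed as one argument.
        args.append(f"--identifier={identifier}")
    return args


argv = lms_load_args("openai/gpt-oss-20b", gpu=1.0, identifier="my-model-name")
print(" ".join(argv))
```

The list can be handed to `subprocess.run(argv, check=True)` on a machine where `lms` is on `PATH`.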
### Unload a model
```bash
lms unload <model> # unload a specific model
lms unload --all # unload everything
```
### Debug message flow
```bash
lms log stream # tail all incoming/outgoing API messages live
```
Pairs with [[wiki/claude-code/lmstudio-chat-completions|lmstudio-chat-completions]] for debugging request/response cycles.
## Key Takeaways
- `lms` ships with LM Studio — zero extra install steps
- `lms ps` vs `lms ls`: loaded-in-memory vs on-disk — two different commands
- `--gpu=1.0` forces full GPU offload; `--gpu=auto` lets LM Studio decide
- `--identifier` flag on `lms load` decouples client model names from actual model paths
- `lms log stream` is the fastest way to debug what's hitting the server
- `lms daemon` manages [[wiki/claude-code/lmstudio-headless-service|llmster]] for headless/service deployments
- MIT licensed: safe to embed in scripts and automation
## Related Articles
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — all API endpoints
- [[wiki/claude-code/lmstudio-headless-service|Headless Service (llmster)]] — daemon mode for servers
- [[wiki/claude-code/lmstudio-server-settings|Server Settings]] — port, auth, CORS, JIT loading
- [[wiki/claude-code/lmstudio-chat-completions|Chat Completions]] — OpenAI-compat `/v1/chat/completions`
- [[wiki/claude-code/lmstudio-llmster-systemd|llmster systemd unit]] — run llmster at boot on Linux
- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] — memory management
## Sources
- LM Studio docs: https://lmstudio.ai/docs/cli

@@ -0,0 +1,115 @@
---
title: "LM Studio — MCP via API"
aliases: [lmstudio-mcp-api, mcp-lmstudio, lm-studio-mcp]
tags: [lmstudio, mcp, api, tool-use, integration]
sources: [raw/Using MCP via API.md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio — MCP via API
Requires LM Studio 0.4.0+. MCP servers provide tools that models can call during chat requests via `/api/v1/chat`.
## Two Server Modes
| Feature | Ephemeral | mcp.json |
|---------|-----------|----------|
| Specified via | `integrations` entry with `"type": "ephemeral_mcp"` | `integrations` entry with `"type": "plugin"` |
| Config | Per-request only | Pre-configured in `mcp.json` |
| Use case | One-off / remote tools | Frequent use, tools needing `command` (local processes) |
| Server ID | `server_label` in integration | `id` (e.g. `mcp/playwright`) |
| Custom headers | `headers` field | Configured in `mcp.json` |
## Ephemeral MCP Servers
Defined inline per-request — no pre-configuration needed.
```bash
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "What is the top trending model on hugging face?",
"integrations": [
{
"type": "ephemeral_mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"]
}
],
"context_length": 8000
}'
```
Response output contains typed entries: `reasoning`, `message`, and `tool_call` objects. Each `tool_call` includes the tool name, arguments, output, and `provider_info` identifying the server.
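Client code usually needs to pull the `tool_call` entries out of that mixed list. A minimal sketch follows; the `output` array and the `type` values come from the description above, while the per-entry field names (`tool`, `arguments`, `output`, `provider_info`) and all sample values are assumptions for illustration.

```python
from typing import Dict, List


def tool_calls(response: Dict) -> List[Dict]:
    """Return only the `tool_call` entries from a chat response's output."""
    return [e for e in response.get("output", []) if e.get("type") == "tool_call"]


# Hypothetical response; field names inside each entry are assumptions.
sample_response = {
    "output": [
        {"type": "reasoning", "content": "Need to search models."},
        {
            "type": "tool_call",
            "tool": "model_search",
            "arguments": {"limit": 1},
            "output": "...",
            "provider_info": {"type": "ephemeral_mcp", "server_label": "huggingface"},
        },
        {"type": "message", "content": "..."},
    ]
}

for call in tool_calls(sample_response):
    print(call["tool"], call["provider_info"]["type"])
```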
## mcp.json Pre-configured Servers
Recommended for servers that run local commands (e.g. `microsoft/playwright-mcp`) or are used frequently.
```bash
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "Open lmstudio.ai",
"integrations": ["mcp/playwright"],
"context_length": 8000,
"temperature": 0
}'
```
- `integrations` can be a plain string array when referencing pre-configured servers
- `provider_info.type` will be `"plugin"` (vs `"ephemeral_mcp"` for inline)
## Restricting Tool Access
Use `allowed_tools` on either integration type:
```json
"allowed_tools": ["model_search"]
```
- Limits which tools the model can call from that server
- Speeds up prompt processing — fewer tool definitions in context
- If omitted, all server tools are available
## Custom Headers (Ephemeral)
For authenticated remote MCP endpoints:
```json
{
"type": "ephemeral_mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"],
"headers": {
"Authorization": "Bearer <YOUR_HF_TOKEN>"
}
}
```
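The same integration entry can be assembled in Python before posting to `/api/v1/chat`. A minimal sketch using only the fields shown in this article; `ephemeral_mcp` and `chat_body` are hypothetical helper names.

```python
import json
from typing import Dict, List, Optional


def ephemeral_mcp(server_label: str, server_url: str,
                  allowed_tools: Optional[List[str]] = None,
                  headers: Optional[Dict[str, str]] = None) -> Dict:
    """Build one inline (ephemeral) MCP integration entry."""
    entry: Dict = {
        "type": "ephemeral_mcp",
        "server_label": server_label,
        "server_url": server_url,
    }
    if allowed_tools is not None:
        entry["allowed_tools"] = allowed_tools  # omit to expose all server tools
    if headers:
        entry["headers"] = headers  # e.g. auth for a remote MCP endpoint
    return entry


def chat_body(model: str, prompt: str, integrations: List,
              context_length: Optional[int] = None) -> Dict:
    """Build the JSON body for POST /api/v1/chat."""
    body: Dict = {"model": model, "input": prompt, "integrations": integrations}
    if context_length is not None:
        body["context_length"] = context_length
    return body


body = chat_body(
    "ibm/granite-4-micro",
    "What is the top trending model on hugging face?",
    [ephemeral_mcp("huggingface", "https://huggingface.co/mcp", ["model_search"])],
    context_length=8000,
)
print(json.dumps(body, indent=2))
```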
## Key Takeaways
- LM Studio exposes MCP tool calling through its native `/api/v1/chat` endpoint (not the OpenAI-compat route)
- Two modes: **ephemeral** (inline, per-request) vs **mcp.json** (pre-configured, recommended for local/frequent servers)
- `allowed_tools` works on both modes — use it to reduce context size and restrict scope
- Tool call results appear inline in the `output` array alongside `reasoning` and `message` entries
- Auth headers for remote MCP servers go in the `headers` field on ephemeral integrations
- The [[wiki/claude-code/lmstudio-responses-api|Responses API]] also supports Remote MCP via `tools` — different endpoint, same concept
## Related
- [[wiki/claude-code/lmstudio-responses-api|LM Studio Responses API]] — `/v1/responses` endpoint also supports Remote MCP tools
- [[wiki/claude-code/lmstudio-tool-use|LM Studio Tool Use]] — function calling (non-MCP) patterns
- [[wiki/claude-code/lmstudio-server-settings|LM Studio Server Settings]] — toggle per-request MCPs and mcp.json access in the UI
- [[wiki/claude-code/mcp-integration|Claude Code MCP Integration]] — MCP concepts: transports, scopes, OAuth
## Sources
- `raw/Using MCP via API.md` — LM Studio docs, 2026-04-30

@@ -0,0 +1,120 @@
---
title: "LM Studio — Anthropic Messages API"
aliases: [lmstudio-messages, lm-studio-anthropic-messages]
tags: [lmstudio, anthropic, api, messages, local-llm, streaming, tools]
sources: [raw/Messages.md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio — Anthropic Messages API
The `/v1/messages` endpoint in LM Studio mirrors the Anthropic Messages API exactly — same request shape, same response shape. Use it as a local drop-in for any code already calling Anthropic's cloud API.
## Endpoint
```
POST http://localhost:1234/v1/messages
```
Headers:
- `Content-Type: application/json`
- `x-api-key: $LM_API_TOKEN` — optional if **Require Authentication** is disabled in LM Studio
## Basic Request
```bash
curl http://localhost:1234/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $LM_API_TOKEN" \
-d '{
"model": "ibm/granite-4-micro",
"max_tokens": 256,
"messages": [
{"role": "user", "content": "Say hello from LM Studio."}
]
}'
```
## Streaming
Add `"stream": true` to receive Server-Sent Events (SSE):
```bash
curl http://localhost:1234/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $LM_API_TOKEN" \
-d '{
"model": "ibm/granite-4-micro",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 256,
"stream": true
}'
```
SSE event sequence:
1. `message_start`
2. `content_block_start`
3. `content_block_delta` (repeating)
4. `content_block_stop`
5. `message_delta`
6. `message_stop`
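The event sequence above can be consumed with a small parser. This sketch assumes the stream follows the Anthropic SSE wire format (`event:` and `data:` lines separated by blank lines, text arriving in `text_delta` deltas); `iter_sse` and `collect_text` are hypothetical helpers and the sample stream is made up.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def iter_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Dict]]:
    """Yield (event_name, data) pairs from raw SSE lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:
            # A blank line terminates one event.
            yield event or "message", json.loads("\n".join(data))
            event, data = None, []
    if data:  # stream ended without a trailing blank line
        yield event or "message", json.loads("\n".join(data))


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate text deltas until `message_stop` arrives."""
    parts = []
    for event, payload in iter_sse(lines):
        if event == "content_block_delta":
            delta = payload.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
        elif event == "message_stop":
            break
    return "".join(parts)


# Made-up sample stream for illustration.
stream = [
    "event: message_start", 'data: {"type": "message_start"}', "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}', "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}', "",
    "event: message_stop", 'data: {"type": "message_stop"}', "",
]
print(collect_text(stream))
```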
## Tool Use
Pass a `tools` array with JSON Schema input definitions and a `tool_choice` policy:
```bash
curl http://localhost:1234/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $LM_API_TOKEN" \
-d '{
"model": "ibm/granite-4-micro",
"max_tokens": 1024,
"tools": [
{
"name": "get_weather",
"description": "Get the current weather in a given location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
}
},
"required": ["location"]
}
}
],
"tool_choice": {"type": "any"},
"messages": [
{"role": "user", "content": "What is the weather like in San Francisco?"}
]
}'
```
`tool_choice` options (Anthropic-compat): `{"type": "auto"}`, `{"type": "any"}`, `{"type": "tool", "name": "…"}`.
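After the model answers with a tool call, the client runs the tool and sends the result back in a follow-up turn. The sketch below assumes LM Studio returns Anthropic-style `tool_use` content blocks and accepts `tool_result` blocks in return, as the Anthropic Messages API does; `run_tools`, `follow_up_messages`, and the sample response are hypothetical.

```python
from typing import Callable, Dict, List


def run_tools(response: Dict, handlers: Dict[str, Callable]) -> List[Dict]:
    """Execute each `tool_use` block and wrap its output as a `tool_result`."""
    results = []
    for block in response.get("content", []):
        if block.get("type") == "tool_use":
            output = handlers[block["name"]](**block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # ties the result to the call
                "content": str(output),
            })
    return results


def follow_up_messages(messages: List[Dict], response: Dict,
                       results: List[Dict]) -> List[Dict]:
    """Append the assistant turn and the tool results for the next request."""
    return messages + [
        {"role": "assistant", "content": response["content"]},
        {"role": "user", "content": results},
    ]


# Made-up response for illustration.
sample = {
    "role": "assistant",
    "stop_reason": "tool_use",
    "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"location": "San Francisco, CA"}},
    ],
}
handlers = {"get_weather": lambda location: f"Sunny in {location}"}
history = [{"role": "user", "content": "What is the weather like in San Francisco?"}]
results = run_tools(sample, handlers)
next_messages = follow_up_messages(history, sample, results)
print(results[0]["content"])
```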
## Authentication
| Scenario | Header needed |
|----------|---------------|
| Auth disabled in LM Studio | No `x-api-key` required |
| Auth enabled | `x-api-key: $LM_API_TOKEN` |
## Key Takeaways
- `POST /v1/messages` on `localhost:1234` is a drop-in for `api.anthropic.com/v1/messages`
- Same request body — swap the base URL and optionally add `x-api-key`
- Streaming uses standard Anthropic SSE event names — existing stream parsers work unchanged
- Tool use with `input_schema` / `tool_choice` is supported
- Auth header is optional when LM Studio's **Require Authentication** is off
- See [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]] for redirecting the full Anthropic SDK via env vars
## Related
- [[wiki/claude-code/lmstudio-anthropic-compat|LM Studio Anthropic Compat Setup]] — redirect Claude Code / SDK to local server
- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — OpenAI-compatible `/v1/chat/completions`
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — native v1 endpoints and feature comparison table
- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] — memory management for loaded models

@ -0,0 +1,86 @@
---
title: "LM Studio — OpenAI Compatibility Endpoints"
aliases: [lmstudio-openai-compat, lmstudio-oai-endpoints]
tags: [lmstudio, openai, local-llm, api, embeddings, chat-completions]
sources: [raw/OpenAI Compatibility Endpoints.md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio — OpenAI Compatibility Endpoints
LM Studio exposes an OpenAI-compatible HTTP server. Any existing OpenAI client (Python, TypeScript, cURL, C#, etc.) works against it by changing only the **base URL**.
Default port: `1234`.
## Supported Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/v1/models` | GET | List loaded/available models |
| `/v1/responses` | POST | Responses API (Codex-compatible) |
| `/v1/chat/completions` | POST | Chat with text and images |
| `/v1/embeddings` | POST | Generate text embeddings |
| `/v1/completions` | POST | Legacy completions |
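For scripts that should not depend on an SDK, the endpoints in the table can be reached with the standard library alone. A minimal sketch, assuming the default port and the usual OpenAI request bodies; `build_request` and `call` are hypothetical helper names, and `call` needs a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # default port; adjust if changed


def build_request(path, payload=None):
    """Build a GET (no payload) or JSON POST request for an endpoint path."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST" if payload is not None else "GET",
    )


def call(path, payload=None):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)


models_req = build_request("/models")
embed_req = build_request(
    "/embeddings",
    {"model": "<model-identifier-from-lmstudio>", "input": "hello"},
)
print(models_req.get_method(), models_req.full_url)
print(embed_req.get_method(), embed_req.full_url)
```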
## Switching Base URL
Only one line changes: the `base_url` (Python) or `baseURL` (TypeScript) option.
### Python
```python
from openai import OpenAI
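client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # placeholder: the SDK requires a value even when the server does not check it
)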
# rest of your code unchanged
```
### TypeScript
```typescript
import OpenAI from 'openai';
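const client = new OpenAI({
  baseURL: "http://localhost:1234/v1",
  apiKey: "lm-studio" // placeholder: the SDK requires a value even when the server does not check it
});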
```
### cURL
```bash
curl http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "<model-identifier-from-lmstudio>",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
```
## Codex Support
LM Studio supports OpenAI Codex via the `POST /v1/responses` endpoint — the same one Codex targets.
## Key Takeaways
- **Drop-in replacement** — swap `base_url` to `http://localhost:1234/v1`; no other code changes needed
- **Five endpoints** — models, responses, chat/completions, embeddings, legacy completions
- **No API key required** by default (LM Studio runs locally)
- **Codex works** because LM Studio implements `/v1/responses`
- **Model IDs differ** — use the model identifier shown in LM Studio, not OpenAI slugs like `gpt-4o`
- For richer stats (token/s, TTFT, model lifecycle) use the [[wiki/claude-code/lmstudio-rest-api|native LM Studio REST API]] instead
## Related Articles
- [[wiki/claude-code/lmstudio-anthropic-compat|Anthropic Compat Endpoints]] — `/v1/messages` drop-in for Claude SDK
- [[wiki/claude-code/lmstudio-chat-completions|Chat Completions]] — full param reference for `/v1/chat/completions`
- [[wiki/claude-code/lmstudio-embeddings|Embeddings]] — `/v1/embeddings` for local RAG pipelines
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — native v1 API with extended model metadata
- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|Idle TTL & Auto-Evict]] — memory management for loaded models
## Sources
- [LM Studio OpenAI Compat Docs](https://lmstudio.ai/docs/developer/openai-compat) — raw/OpenAI Compatibility Endpoints.md

@ -0,0 +1,124 @@
---
title: "LM Studio Responses API"
aliases: [lmstudio-responses, lm-studio-openai-responses]
tags: [lm-studio, openai-compat, responses-api, streaming, mcp, reasoning]
sources: [raw/Responses.md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio Responses API
LM Studio exposes `/v1/responses` — an OpenAI Responses API-compatible endpoint with support for streaming, reasoning effort, stateful multi-turn via `previous_response_id`, and Remote MCP tools.
Endpoint URL: `http://localhost:1234/v1/responses`
---
## Basic Request (non-streaming)
```bash
curl http://localhost:1234/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-oss-20b",
"input": "Provide a prime number less than 50",
"reasoning": { "effort": "low" }
}'
```
- `input`: plain string prompt (no messages array required)
- `reasoning.effort`: `"low"` | `"medium"` | `"high"` (model-dependent)
---
## Stateful Follow-up
Carry conversation state across calls using `previous_response_id`:
```bash
curl http://localhost:1234/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-oss-20b",
"input": "Multiply it by 2",
"previous_response_id": "resp_123"
}'
```
- The `id` field from any prior response becomes the `previous_response_id` of the next
- No need to replay the full message history client-side
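The chaining rule above amounts to copying one field forward. A minimal sketch, assuming the parsed response object exposes `id` and `model` as in OpenAI's Responses API; `follow_up_payload` is a hypothetical helper and `first` is a hand-written stand-in for a real response.

```python
def follow_up_payload(previous_response: dict, text: str) -> dict:
    """Build the next /v1/responses body from a prior response object."""
    return {
        "model": previous_response["model"],
        "input": text,
        # The prior response's id links the two turns server-side.
        "previous_response_id": previous_response["id"],
    }


# Hand-written stand-in for a parsed first response (fields assumed).
first = {"id": "resp_123", "model": "openai/gpt-oss-20b"}
print(follow_up_payload(first, "Multiply it by 2"))
```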
---
## Streaming
```bash
curl http://localhost:1234/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-oss-20b",
"input": "Hello",
"stream": true
}'
```
SSE events emitted:
| Event | Description |
|-------|-------------|
| `response.created` | Response object initialised |
| `response.output_text.delta` | Incremental text chunk |
| `response.completed` | Final event, full response included |
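A client typically only needs the delta events from this table. The sketch below assumes each `data:` line holds a JSON object whose `type` matches the event name and whose `delta` field is a plain string, as in OpenAI's Responses streaming format. `stream_text` and the sample lines are illustrative, not captured output.

```python
import json


def stream_text(sse_lines):
    """Yield text chunks from a Responses-style SSE stream."""
    for line in sse_lines:
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        if event.get("type") == "response.output_text.delta":
            yield event.get("delta", "")
        elif event.get("type") == "response.completed":
            return  # final event: stop reading


# Hand-written sample lines (an assumption, not captured server output).
sample = [
    'event: response.created',
    'data: {"type": "response.created"}',
    'data: {"type": "response.output_text.delta", "delta": "Hi"}',
    'data: {"type": "response.output_text.delta", "delta": " there"}',
    'data: {"type": "response.completed"}',
]
print("".join(stream_text(sample)))
```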
---
## Remote MCP Tools (opt-in)
Enable in LM Studio: **Developer → Settings → Remote MCP**.
```bash
curl http://localhost:1234/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "What is the top trending model on hugging face?",
"tools": [
{
"type": "mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"]
}
]
}'
```
- `server_label` — arbitrary identifier for this MCP server
- `server_url` — remote MCP server URL
- `allowed_tools` — allowlist of tool names the model may call
---
## Key Takeaways
- `/v1/responses` is an OpenAI Responses API drop-in; swap base URL only
- `previous_response_id` enables multi-turn without replaying history — simpler than maintaining a messages array
- Streaming uses standard SSE; listen for `response.output_text.delta` for incremental chunks
- Remote MCP tools are per-request and opt-in — must enable the feature in LM Studio settings first
- `reasoning.effort` controls thinking depth; not all models support it
---
## Related
- [[wiki/claude-code/lmstudio-openai-compat-endpoints|LM Studio OpenAI Compat Endpoints]] — overview of all 5 OAI-compatible endpoints
- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — `/v1/chat/completions` with full param reference
- [[wiki/claude-code/lmstudio-messages-api|LM Studio Messages API]] — `/v1/messages` Anthropic-compat with streaming + tool-use
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — native endpoint feature comparison table
- [[wiki/claude-code/mcp-integration|MCP Integration]] — Claude Code MCP setup and server patterns
---
## Sources
- `raw/Responses.md` — LM Studio developer docs: `/v1/responses` endpoint

@ -0,0 +1,75 @@
---
title: "LM Studio REST API (v1)"
aliases: [lmstudio-api, lm-studio-rest, lmstudio-v1]
tags: [lmstudio, rest-api, local-inference, openai-compat, anthropic-compat, mcp]
sources: [raw/LM Studio API.md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio REST API (v1)
LM Studio 0.4.0 introduced the native **v1 REST API** at `/api/v1/*`. It sits alongside OpenAI-compatible and Anthropic-compatible endpoints and offers the richest feature set for local inference.
## v1 vs v0
The old v0 API (`/api/v0/*`) is superseded. Migrate to `/api/v1/*` for:
- **Stateful chats** — server keeps conversation context across turns
- **MCP via API** — use MCPs configured in LM Studio directly from requests
- **Authentication** — API token support
- **Model management** — download, load, unload via API
## Supported Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/v1/chat` | POST | Inference (native) |
| `/api/v1/models` | GET | List loaded models |
| `/api/v1/models/load` | POST | Load a model into VRAM |
| `/api/v1/models/unload` | POST | Unload a model |
| `/api/v1/models/download` | POST | Download a model |
| `/api/v1/models/download/status` | GET | Poll download progress |
## Inference Endpoint Comparison
Four endpoints can run inference. Pick based on which features you need:
| Feature | `/api/v1/chat` | `/v1/responses` (OAI) | `/v1/chat/completions` (OAI) | `/v1/messages` (Anthropic) |
|---|:---:|:---:|:---:|:---:|
| Streaming | ✅ | ✅ | ✅ | ✅ |
| Stateful chat | ✅ | ✅ | ❌ | ❌ |
| Remote MCPs | ✅ | ✅ | ❌ | ❌ |
| LM Studio MCPs | ✅ | ✅ | ❌ | ❌ |
| Custom tools | ❌ | ✅ | ✅ | ✅ |
| Assistant messages in request | ❌ | ✅ | ✅ | ✅ |
| Model load streaming events | ✅ | ❌ | ❌ | ❌ |
| Prompt processing events | ✅ | ❌ | ❌ | ❌ |
| Specify context length | ✅ | ❌ | ❌ | ❌ |
**Decision guide:**
- Need MCP tools + stateful chat → `/api/v1/chat` or `/v1/responses`
- Need custom tool definitions → `/v1/responses`, `/v1/chat/completions`, or `/v1/messages`
- Dropping in existing OpenAI SDK code → `/v1/chat/completions`
- Dropping in existing Anthropic SDK code → `/v1/messages`
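The comparison table can also be encoded as data so a script picks an endpoint from the features it needs. This is a sketch that simply transcribes the table above; the feature keys (`stateful`, `custom_tools`, and so on) and the function name `endpoints_for` are made-up labels.

```python
# Feature sets transcribed from the comparison table (keys are made-up labels).
FEATURES = {
    "/api/v1/chat": {"streaming", "stateful", "remote_mcp", "lmstudio_mcp",
                     "model_load_events", "prompt_events", "context_length"},
    "/v1/responses": {"streaming", "stateful", "remote_mcp", "lmstudio_mcp",
                      "custom_tools", "assistant_messages"},
    "/v1/chat/completions": {"streaming", "custom_tools", "assistant_messages"},
    "/v1/messages": {"streaming", "custom_tools", "assistant_messages"},
}


def endpoints_for(*needed):
    """Return endpoints whose feature set covers everything requested."""
    return [ep for ep, feats in FEATURES.items() if set(needed) <= feats]


print(endpoints_for("stateful", "custom_tools"))
print(endpoints_for("context_length"))
```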
## Key Takeaways
- The **native `/api/v1/chat`** endpoint is the only one with model-load events, prompt-processing events, and per-request context length. It shares stateful chat and MCP support with `/v1/responses`, but does not accept custom tools.
- **`/v1/responses`** (OpenAI Responses API compat) is the best of both worlds — stateful + MCP + custom tools.
- **`/v1/chat/completions`** is the broadest drop-in for existing OpenAI code but loses statefulness and MCP.
- **`/v1/messages`** lets you redirect the Anthropic SDK to a local model with minimal code change (see [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]]).
- Model management endpoints let you fully automate the model lifecycle — download → load → infer → unload — without touching the GUI.
- API token auth is available for securing the local server (useful when exposed on a LAN).
## Related Articles
- [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]] — redirect Claude Code / Anthropic SDK to LM Studio via env vars
- [[wiki/claude-code/lmstudio-chat-completions|lmstudio-chat-completions]] — OpenAI `/v1/chat/completions` usage, params, debugging
- [[wiki/claude-code/lmstudio-embeddings|lmstudio-embeddings]] — `/v1/embeddings` for local RAG pipelines
- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|lmstudio-idle-ttl-auto-evict]] — memory management: TTL and auto-evict
- [[wiki/agent-sdk/overview|agent-sdk/overview]] — build multi-agent systems that call local models
## Sources
- `raw/LM Studio API.md` — clipped from lmstudio.ai/docs/developer/rest

@ -0,0 +1,54 @@
---
title: "LM Studio — Serve on Local Network"
aliases: [lmstudio-network, lmstudio-lan-server]
tags: [lmstudio, networking, api-server, local-llm, lan]
sources: [raw/Serve on Local Network.md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio — Serve on Local Network
Enabling **Serve on Local Network** makes the LM Studio API server accessible to other devices on the same LAN — not just `localhost`.
## How It Works
- By default the server binds to `127.0.0.1` (localhost only)
- With the option enabled it binds to your machine's **local network IP** (e.g. `192.168.x.x`)
- The API access URL shown in LM Studio updates to reflect the new binding
- All existing API endpoints stay the same — only the host changes
## Use Cases
| Scenario | Why useful |
|----------|-----------|
| Thin-client devices (laptop, tablet, phone) | Offload inference to a powerful desktop on the same network |
| Shared team access | Multiple people hit one LM Studio instance |
| IoT / edge devices | Raspberry Pi or similar calls the API over LAN |
| Local service mesh | Other self-hosted services (Home Assistant, scripts) consume the LLM |
## Setup Steps
1. Open LM Studio → **Local Server** tab
2. Toggle **Serve on Local Network** → ON
3. Note the updated **API access URL** displayed (e.g. `http://192.168.1.x:1234`)
4. On client devices, point `base_url` to that address instead of `http://localhost:1234`
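Before repointing a client, it can help to confirm that the server is reachable from that device. A minimal sketch using only the standard library; `is_reachable` and `base_url` are hypothetical helpers, and `192.168.1.50` is a made-up address standing in for whatever LM Studio displays.

```python
import socket


def is_reachable(host, port=1234, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def base_url(host, port=1234):
    """Build the OpenAI-compatible base URL for a LAN host."""
    return f"http://{host}:{port}/v1"


# Made-up LAN address; substitute the API access URL shown in LM Studio.
print(base_url("192.168.1.50"))
```

A `False` result from `is_reachable` usually points at the toggle being off, a wrong address, or a firewall between the two machines.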
## Key Takeaways
- One toggle: home routers rarely block LAN-to-LAN traffic, though the host machine's own firewall may still need to allow the port
- The API surface is identical to localhost; only the bind address differs
- Useful when pairing a powerful homelab machine with weaker clients — see [[wiki/homelab/_index|homelab]] for server options
- Combine with [[wiki/claude-code/lmstudio-headless-service|lmstudio-headless-service]] to run the server without the GUI on a headless machine
- For redirecting Claude Code itself to the local server, see [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]]
## Related
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — full endpoint reference
- [[wiki/claude-code/lmstudio-headless-service|LM Studio Headless Service]] — run without GUI (daemon mode)
- [[wiki/claude-code/lmstudio-anthropic-compat|Anthropic Compat Endpoints]] — point Claude Code at local server
- [[wiki/homelab/_index|Homelab]] — self-hosted hardware for running LM Studio
## Sources
- `raw/Serve on Local Network.md` — clipped from lmstudio.ai/docs/developer/core/server/serve-on-network

@ -0,0 +1,62 @@
---
title: "LM Studio Server Settings"
aliases: [lmstudio-server-config, lm-studio-api-server-settings]
tags: [lmstudio, api-server, configuration, mcp, jit, cors, auth]
sources: [raw/Server Settings.md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio Server Settings
Configuration options for the LM Studio API server — accessible from the LM Studio UI or `lms` CLI. Controls port, auth, network access, MCP permissions, CORS, and JIT model memory management.
## Network & Access
| Setting | Type | Description |
|---------|------|-------------|
| **Server Port** | Integer | Port the API server listens on (default `1234`) |
| **Serve on Local Network** | Switch | Binds server to LAN IP so other devices can reach it — see [[wiki/claude-code/lmstudio-serve-on-network\|Serve on Network]] |
| **Enable CORS** | Switch | Allow cross-origin requests (needed for browser-based clients hitting a local server) |
## Authentication
| Setting | Type | Description |
|---------|------|-------------|
| **Require Authentication** | Switch | Clients must pass a valid token in `Authorization` header — see [[wiki/claude-code/lmstudio-anthropic-compat\|LM Studio Auth docs]] |
> Authentication is a prerequisite for enabling MCP server access from `mcp.json`.
## MCP (Model Context Protocol)
| Setting | Type | Description |
|---------|------|-------------|
| **Allow per-request MCPs** | Switch | Clients may specify ephemeral remote MCP servers in individual requests (not in `mcp.json`). Only remote MCPs supported. |
| **Allow calling servers from mcp.json** | Switch | Clients may use MCP servers defined in your LM Studio `mcp.json`. **Requires Auth enabled.** Security risk if those servers have filesystem/data access. |
Related: [[wiki/claude-code/mcp-integration\|MCP Integration]]
## JIT (Just-in-Time) Model Loading
Saves RAM by loading models on demand rather than pre-loading them.
| Setting | Type | Description |
|---------|------|-------------|
| **Just in Time Model Loading** | Switch | Load a model at request time if not already loaded |
| **Auto Unload Unused JIT Models** | Switch | Automatically evict JIT models when idle |
| **Only Keep Last JIT Loaded Model** | Switch | Evict all but the most recently used JIT model — minimizes RAM usage |
> For deeper JIT / TTL / eviction behavior, see [[wiki/claude-code/lmstudio-idle-ttl-auto-evict\|Idle TTL and Auto-Evict]].
## Key Takeaways
- **Port** is the only integer setting; all others are on/off switches.
- **Auth is a gate**: `mcp.json` server access won't work unless it is enabled.
- **Per-request MCPs** are ephemeral and remote-only; they don't persist after the request.
- **CORS** must be on for any browser app (web UI, local HTML tool) to call the API.
- **JIT trio** (`JIT Load` → `Auto Unload` → `Only Keep Last`) progressively tightens memory: enable all three on low-RAM machines.
- LAN access via [[wiki/claude-code/lmstudio-serve-on-network\|Serve on Network]] is a separate setting from CORS — you may need both.
## Sources
- `raw/Server Settings.md` — scraped from [lmstudio.ai/docs/developer/core/server/settings](https://lmstudio.ai/docs/developer/core/server/settings)

@ -0,0 +1,150 @@
---
title: "LM Studio Structured Output"
aliases: [lmstudio-json-schema, structured-output-lmstudio]
tags: [lmstudio, structured-output, json-schema, openai-compat, local-llm]
sources: [raw/Structured Output.md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio Structured Output
Enforce a specific JSON shape on LLM responses by passing a JSON schema to `/v1/chat/completions`. Compatible with OpenAI's Structured Output API format.
## How It Works
- Add a `response_format` field to the chat completions request
- Provide a `json_schema` with a `name`, optional `strict`, and a `schema` object
- The model is constrained to return valid JSON matching that schema
- Response arrives as a string in `choices[0].message.content` — parse it with `json.loads()`
## Server Setup
```bash
lms server start
# or enable from Developer tab in LM Studio UI
```
Install the CLI first if needed:
```bash
npx lmstudio install-cli
```
## response_format Shape
```json
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "my_schema",
"strict": true,
"schema": {
"type": "object",
"properties": {
"field": { "type": "string" }
},
"required": ["field"]
}
}
}
```
## cURL Example
```bash
curl http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "{{model}}",
"messages": [
{"role": "system", "content": "You are a helpful jokester."},
{"role": "user", "content": "Tell me a joke."}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "joke_response",
"strict": true,
"schema": {
"type": "object",
"properties": { "joke": {"type": "string"} },
"required": ["joke"]
}
}
},
"temperature": 0.7,
"max_tokens": 50,
"stream": false
}'
```
## Python Example
```python
from openai import OpenAI
import json
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
character_schema = {
"type": "json_schema",
"json_schema": {
"name": "characters",
"schema": {
"type": "object",
"properties": {
"characters": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"occupation": {"type": "string"},
"personality": {"type": "string"},
"background": {"type": "string"}
},
"required": ["name", "occupation", "personality", "background"]
},
"minItems": 1
}
},
"required": ["characters"]
}
}
}
response = client.chat.completions.create(
model="your-model",
messages=[
{"role": "system", "content": "You are a helpful AI assistant."},
{"role": "user", "content": "Create 1-3 fictional characters"}
],
response_format=character_schema,
)
results = json.loads(response.choices[0].message.content)
print(json.dumps(results, indent=2))
```
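Because the content arrives as a string, parsing can still fail, for example when generation stops early because `max_tokens` is too low for the schema. The following sketch shows one defensive way to handle that; `parse_structured` is a hypothetical helper, and it only checks top-level required keys rather than validating the full schema.

```python
import json


def parse_structured(content, required):
    """Parse a structured-output string; return None if unusable."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        # Possible cause: output was cut off before the JSON was complete.
        return None
    if not isinstance(data, dict) or any(key not in data for key in required):
        return None
    return data


print(parse_structured('{"joke": "Why did the chicken cross the road?"}', ["joke"]))
print(parse_structured('{"joke": "Why did the chi', ["joke"]))
```

Callers can retry with a higher `max_tokens` when the helper returns `None`.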
## Structured Output Engines
| Model Format | Engine |
|---|---|
| GGUF | `llama.cpp` grammar-based sampling |
| MLX | [Outlines](https://github.com/dottxt-ai/outlines) via [lmstudio-ai/mlx-engine](https://github.com/lmstudio-ai/mlx-engine) |
## Key Takeaways
- Use `response_format.type = "json_schema"` — same shape as OpenAI's Structured Outputs API
- Works with any OpenAI-compatible client SDK (Python, TS, etc.) just by pointing `base_url` at localhost
- Response is always a **string** in `choices[0].message.content` — always call `json.loads()` on it
- Not all models support this: **models below 7B parameters often cannot do structured output** — check the model card
- GGUF uses grammar sampling; MLX uses Outlines — both constrain tokens at generation time, not post-hoc
- All standard `/v1/chat/completions` params (temperature, max_tokens, stream, etc.) still apply
## Related
- [[wiki/claude-code/lmstudio-chat-completions|lmstudio-chat-completions]] — full parameter reference for the completions endpoint
- [[wiki/claude-code/lmstudio-openai-compat-endpoints|lmstudio-openai-compat-endpoints]] — overview of all OpenAI-compat endpoints
- [[wiki/claude-code/lmstudio-responses-api|lmstudio-responses-api]] — stateful responses with streaming and Remote MCP tools
- [[wiki/claude-code/lmstudio-rest-api|lmstudio-rest-api]] — native LM Studio API and endpoint feature comparison

@ -0,0 +1,158 @@
---
title: "LM Studio Tool Use (Function Calling)"
aliases: [lmstudio-function-calling, lmstudio-tools]
tags: [lmstudio, tool-use, function-calling, openai-compat, python, local-llm]
sources: [raw/Tool Use.md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio Tool Use (Function Calling)
Tool use lets LLMs *request* calls to external functions/APIs via LM Studio's OpenAI-compatible `/v1/chat/completions` and `/v1/responses` endpoints. Your code executes the actual functions and feeds results back.
## Key Takeaways
- LLMs **cannot execute code** — they output structured text requesting a tool call; your code runs it
- Uses the same format as OpenAI's Function Calling API — any OpenAI SDK works
- Tool definitions are injected into the system prompt via the model's chat template
- Two support tiers: **Native** (model trained for tool use) and **Default** (fallback prompt injection)
- After tool execution, re-prompt the model *without* tools to get a plain-text final answer
- Streaming tool calls arrive in chunks — accumulate `delta.tool_calls` before executing
## High-Level Flow
```
Setup LLM + tool list
→ Get user input
→ LLM prompted with messages
→ Needs tools?
Yes → Tool Response → Execute tools → Add results to messages → re-prompt
No → Normal response → loop back
```
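The flow above can be sketched as a loop with the model call injected as a function, which keeps the control flow visible without a running server. Everything here is illustrative: `run_turn`, `fake_chat`, and `registry` are hypothetical names, and `fake_chat` stands in for a real `/v1/chat/completions` call that returns the assistant message as a dict. Unlike the two-call pattern, this loop keeps tools available on every round so the model can chain calls.

```python
import json


def run_turn(chat, messages, tools, registry, max_rounds=5):
    """Prompt the model, run requested tools, re-prompt until plain text."""
    for _ in range(max_rounds):
        message = chat(messages, tools)
        calls = message.get("tool_calls") or []
        if not calls:
            return message["content"]  # normal response: done
        messages.append(message)
        for call in calls:
            fn = registry[call["function"]["name"]]
            result = fn(**json.loads(call["function"]["arguments"]))
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("model kept requesting tools")


# Fake model for illustration: asks for one tool call, then answers in text.
def fake_chat(messages, tools):
    if messages[-1]["role"] == "tool":
        date = json.loads(messages[-1]["content"])
        return {"role": "assistant", "content": "Arrives " + date}
    return {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {"name": "get_delivery_date",
                     "arguments": '{"order_id": "1017"}'},
    }]}


registry = {"get_delivery_date": lambda order_id: "2024-11-20"}
history = [{"role": "user", "content": "Where is order 1017?"}]
answer = run_turn(fake_chat, history, [], registry)
print(answer)
```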
## Tool Definition Format
```json
{
"type": "function",
"function": {
"name": "get_delivery_date",
"description": "Get the delivery date for a customer's order",
"parameters": {
"type": "object",
"properties": {
"order_id": { "type": "string" }
},
"required": ["order_id"]
}
}
}
```
Pass as the `tools` array in the request body — identical to OpenAI's spec.
## Response Parsing
- Tool call detected: `choices[0].message.tool_calls` array is populated; `finish_reason = "tool_calls"`
- No tool call: response lands in `choices[0].message.content` as normal text
- If the model outputs a malformed tool call, LM Studio falls back to `content` — use `lms log stream` to debug
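The branching rules above can be expressed as one small function over the parsed JSON response. A minimal sketch, assuming the OpenAI chat-completions response shape; `extract_tool_calls` is a hypothetical helper and both sample responses are hand-written.

```python
import json


def extract_tool_calls(response: dict):
    """Return [(id, name, args)] for a tool-call reply, or [] for plain text."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    return [
        (c["id"], c["function"]["name"], json.loads(c["function"]["arguments"]))
        for c in calls
    ]


# Hand-written samples in the OpenAI response shape (assumed, not captured).
with_tool = {"choices": [{"finish_reason": "tool_calls", "message": {
    "role": "assistant", "content": None,
    "tool_calls": [{"id": "call_1", "type": "function", "function": {
        "name": "get_delivery_date", "arguments": '{"order_id": "1017"}'}}],
}}]}
plain = {"choices": [{"finish_reason": "stop", "message": {
    "role": "assistant", "content": "Hello!"}}]}

print(extract_tool_calls(with_tool))
print(extract_tool_calls(plain))
```

An empty list covers both the plain-text case and the malformed-call fallback, so the caller reads `content` in either situation.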
## Multi-Turn Pattern (Python)
```python
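from openai import OpenAI
import json
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
model = "lmstudio-community/qwen2.5-7b-instruct"
# Placeholder tool implementation; replace with a real lookup
def get_delivery_date(order_id):
    return {"order_id": order_id, "delivery_date": "2024-11-20"}
tools = [{
    "type": "function",
    "function": {
        "name": "get_delivery_date",
        "description": "Get the delivery date for a customer's order",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]
messages = [{"role": "user", "content": "When will order 1017 arrive?"}]
# 1. First call: with tools
response = client.chat.completions.create(model=model, messages=messages, tools=tools)
# 2. Execute the requested tool (assumes the model did request one)
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
result = get_delivery_date(**args)
# 3. Append both the assistant's tool-call message and the tool result
messages += [
    {"role": "assistant", "tool_calls": [tool_call.model_dump()]},
    {"role": "tool", "content": json.dumps(result), "tool_call_id": tool_call.id},
]
# 4. Second call: WITHOUT tools for final plain-text answer
final = client.chat.completions.create(model=model, messages=messages)
print(final.choices[0].message.content)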
```
## Native vs Default Support
| Level | What it means | Quality |
|-------|---------------|---------|
| **Native** | Model has a tool-use chat template + LM Studio parses its format | Best |
| **Default** | LM Studio injects a custom system prompt + converts `tool` role to `user` | Variable |
### Models with Native Support (as of 2024-11)
- **Qwen** — Qwen2.5-7B-Instruct (GGUF / MLX)
- **Llama**: Llama-3.1-8B-Instruct and Llama-3.2-3B-Instruct (GGUF / MLX)
- **Mistral** — Ministral-8B-Instruct-2410 (GGUF / MLX)
Native models show a hammer badge in the LM Studio UI.
## Streaming Tool Calls
```python
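# Accumulate chunks: id, name, and arguments arrive in pieces, keyed by index
# `stream` is the result of client.chat.completions.create(..., stream=True)
tool_calls = {}
for chunk in stream:
    delta = chunk.choices[0].delta
    for tc in delta.tool_calls or []:
        call = tool_calls.setdefault(tc.index, {"id": "", "name": "", "arguments": ""})
        call["id"] += tc.id or ""
        if tc.function:
            call["name"] += tc.function.name or ""
            call["arguments"] += tc.function.arguments or ""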
```
Execute only after the stream ends and `tool_calls` is fully assembled.
## Quick Start
```bash
# Start server
lms server start
# Load a model
lms load
# Debug raw prompts (see how tools are injected)
lms log stream
```
```bash
# curl single-turn example
curl http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "lmstudio-community/qwen2.5-7b-instruct",
"messages": [{"role": "user", "content": "Search dell products under $50"}],
"tools": [...]}'
```
## Troubleshooting
- **No `tool_calls` in response** — model output was malformed; run `lms log stream` to inspect the raw prompt and output
- **Smaller models** — may not follow the tool call format reliably; prefer ≥7B models with native support
- **Default mode weirdness** — check the injected system prompt via `lms log stream`; the format uses `[TOOL_REQUEST]...[END_TOOL_REQUEST]` tags
## Related
- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — full `/v1/chat/completions` param reference
- [[wiki/claude-code/lmstudio-openai-compat-endpoints|LM Studio OpenAI Compat Endpoints]] — all 5 compatible endpoints
- [[wiki/claude-code/lmstudio-responses-api|LM Studio Responses API]] — `/v1/responses` with Remote MCP tools
- [[wiki/claude-code/lmstudio-structured-output|LM Studio Structured Output]] — enforce JSON schema on responses
- [[wiki/claude-code/lmstudio-messages-api|LM Studio Messages API]] — Anthropic-compat tool use examples
## Sources
- `raw/Tool Use.md` — LM Studio official docs (lmstudio.ai/docs/developer/openai-compat/tools), published 2024-11-19

@ -43,3 +43,4 @@ Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, bud
| [[wiki/homelab/glance-dashboard\|Glance — Self-hosted Dashboard]] | Glance setup replacing Homarr: Docker config, 5-page layout, Prometheus RAPL metrics, key patterns ($include caveat, internal IPs only) | session 2026-04-29 | 2026-04-29 |
| [[wiki/homelab/homelab-media-stack\|Homelab Media Stack — Jellyfin + *arr + qBittorrent Setup]] | CT111 media LXC: unified /data mount pattern, Intel QuickSync GPU passthrough, step-by-step qBittorrent categories + Sonarr/Radarr/Prowlarr wiring | session 2026-04-26 | 2026-04-26 |
| [[wiki/homelab/hp-elitedesk-800g3-proxmox\|HP Elitedesk 800 G3 — Proxmox Setup Log]] | Real homelab server setup log: i5-7500, 24 GB RAM, 256 GB NVMe + 6 TB HDD, LXC containers, GPU passthrough (AMD/Intel) | session 2026-04-18 | 2026-04-21 |
| [[wiki/homelab/hp-elitedesk-800g3-teardown-upgrade\|HP EliteDesk 800 G3 SFF — Teardown, Upgrade & Benchmarks]] | Full disassembly/reassembly guide: proprietary connectors caveat, dual-channel RAM, CPU cooler swap, GTX 1050 Ti, thermal benchmarks (GTA V, Flight Sim) | raw/HP EliteDesk 800 G3 SFF - Teardown, re-assembly and upgrade.md | 2026-04-30 |

@ -0,0 +1,153 @@
---
title: "HP EliteDesk 800 G3 SFF — Teardown, Upgrade & Benchmarks"
aliases: [elitedesk-800-g3-teardown, hp-sff-upgrade-guide]
tags: [homelab, hardware, hp, sff, upgrade, benchmark]
sources: [raw/HP EliteDesk 800 G3 SFF - Teardown, re-assembly and upgrade.md]
created: 2026-04-30
updated: 2026-04-30
---
## Overview
The HP EliteDesk 800 G3 SFF is a small form factor desktop often available cheaply at auctions. It uses a **proprietary motherboard and PSU connector** — not standard ATX — which limits some upgrade paths but still allows CPU, RAM, SSD, and GPU swaps.
Reference config (video unit): i7-7700 · GTX 1050 Ti (low-profile) · 16 GB DDR4 · 256 GB NVMe SSD
---
## Exterior Ports
**Front**
- 1× USB-C
- 2× USB 3.0
- 2× USB 2.0
- Audio in/out
- Power button
- Slim optical drive bay
- Optional SD-card reader slot
**Back**
- DisplayPort
- Flexible port option (VGA/DP/HDMI via option card)
- RJ45 (Gigabit)
- 2× USB 2.0 + 2× USB 3.0
- Power connector
- GPU video outputs (from installed card)
---
## Motherboard Layout
Non-standard form factor — not ATX/ITX. Key connectors:
| Component | Detail |
|-----------|--------|
| PCIe slots | 1× x16 (GPU), 2× x1, 1× x4 (downshifted) |
| RAM slots | 4× DDR4 DIMM — DIMM1/2 = Ch. B, DIMM3/4 = Ch. A |
| Storage | 1× M.2 NVMe SSD, 3× SATA, 1× M.2 Wi-Fi |
| Power | Proprietary non-standard PSU connector |
| Option card | VGA / DisplayPort / HDMI output slot |
| CMOS reset | Physical button on board |
**Proprietary connectors = motherboard and PSU cannot be swapped for generic parts.**
---
## Disassembly Procedure
1. **Open case** — slide latch on top cover, no tools needed
2. **Open airflow panel** — provides better access to NVMe, SATA, and RAM
3. **Remove CPU cooler cover** (plastic airflow shroud)
4. Disconnect and slide out **slim DVD drive** (green latch release)
5. Remove **front panel**
6. Remove **GPU** (low-profile PCIe card, 4 GB VRAM)
7. Disconnect **proprietary power connectors**
8. Remove **NVMe SSD** (single retention screw)
9. Remove **RAM sticks**
10. Unscrew 4 screws → lift **CPU cooler**
11. Lift lever → remove **CPU** (LGA 1151)
12. Remove **motherboard** from chassis
---
## Upgrade Notes
### RAM — Dual Channel
- Use matching DIMMs in **same-colour slots** (one per channel)
- For 16 GB: 2× 8 GB — one in Ch. A slot, one in Ch. B slot
### CPU Cooler Replacement
- Stock cooler can develop bearing noise
- Replacement must be **PWM 4-pin** type
- Heatsink mounts to chassis (not board) — install after board is seated in case
- Clean old paste with isopropyl alcohol before applying new thermal paste
### 3.5" HDD Addition
- Install standoff screws on drive
- Slide into drive cage
- Connect SATA data + power cables
### GPU (Low-Profile Required)
- SFF case requires **low-profile PCIe card**
- Tested: Gigabyte GTX 1050 Ti (4 GB VRAM) — fits the x16 slot
---
## Reassembly Order
1. CPU into socket (match orientation notch)
2. NVMe SSD → slot + screw
3. RAM → correct channel slots
4. Motherboard into case
5. CPU cooler + thermal paste → fix to chassis
6. Connect CPU fan to board
7. Airflow cover (clips onto CPU fan)
8. Power cables + speaker
9. DVD drive + SATA cable
10. 3.5" HDD → cage + cables
11. GPU → PCIe slot
12. SATA data cable for HDD
13. Front cover → top cover
---
## Benchmark Results (i7-7700 + GTX 1050 Ti)
| Test | Result |
|------|--------|
| Geekbench CPU | Expected for i7-7700 generation |
| Geekbench Compute (GPU) | Expected for GTX 1050 Ti |
| Microsoft Flight Simulator (Medium, 1080p) | ~30 FPS steady |
| GTA V (Very High + AA, 1080p) | Consistent 60+ FPS |
### Thermal Observations
- CPU and GPU approach **~90°C** under sustained load (Flight Simulator)
- GTA V similarly runs hot
- SFF chassis limits airflow — **monitor temps if running sustained workloads**
---
## Key Takeaways
- The EliteDesk 800 G3 SFF uses **proprietary PSU and motherboard connectors** — plan upgrades around this constraint
- Case opens **tool-free** via a single top-cover latch; very serviceable for the form factor
- CPU cooler mounts to the **chassis** not the board — must be installed after the board is seated
- Dual-channel RAM requires same-colour DIMM pairing (Ch. A + Ch. B)
- GTX 1050 Ti (low-profile) is the practical GPU ceiling for this chassis without a riser
- Thermals are borderline under sustained 3D load — consider improved case airflow or undervolting for homelab/compute use
- For homelab use (Proxmox, LXCs), thermal load is far lighter — see [[wiki/homelab/hp-elitedesk-800g3-proxmox|HP Elitedesk 800 G3 — Proxmox Setup Log]]
---
## Related Articles
- [[wiki/homelab/hp-elitedesk-800g3-proxmox|HP Elitedesk 800 G3 — Proxmox Setup Log]]
- [[wiki/homelab/homelab-from-scratch-budget-build|Homelab From Scratch — Budget-First Design]]
- [[wiki/homelab/bigibz1-homelab-hardware|bigibz1 Homelab Hardware Reference]]
- [[wiki/homelab/homelab-services-map|Homelab — Full Services Map & Network Reference]]
---
## Sources
- [YouTube: HP EliteDesk 800 G3 SFF — Teardown, re-assembly and upgrade (jensd_be, 2021-03-08)](https://www.youtube.com/watch?v=n1ETa3mJ85I)