---
title: "LM Studio REST API (v1)"
aliases: [lmstudio-api, lm-studio-rest, lmstudio-v1]
tags: [lmstudio, rest-api, local-inference, openai-compat, anthropic-compat, mcp]
sources: [raw/LM Studio API.md]
created: 2026-04-30
updated: 2026-04-30
---

# LM Studio REST API (v1)

LM Studio 0.4.0 introduced the native **v1 REST API** at `/api/v1/*`. It sits alongside the OpenAI-compatible and Anthropic-compatible endpoints and is the only interface that exposes model-load events, prompt-processing events, and per-request context length.

## v1 vs v0

The old v0 API (`/api/v0/*`) is superseded. Migrate to `/api/v1/*` for:

- **Stateful chats**: the server keeps conversation context across turns
- **MCP via API**: use MCPs configured in LM Studio directly from requests
- **Authentication**: API token support
- **Model management**: download, load, and unload via API
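
As a rough illustration of the stateful-chat and authentication bullets, the sketch below builds (but does not send) a request to `/api/v1/chat` using only the Python standard library. The `Authorization: Bearer` header scheme, the body fields `model`, `input`, and `previous_response_id`, the model name, and the address `http://localhost:1234` are assumptions for illustration; check them against the LM Studio REST reference before relying on them.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    base_url: str,
    model: str,
    user_input: str,
    api_token: Optional[str] = None,
    previous_response_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST request for the native chat endpoint without sending it."""
    # Field names below are assumed, not confirmed by this note.
    payload = {"model": model, "input": user_input}
    if previous_response_id is not None:
        # Assumed mechanism for continuing a server-side conversation.
        payload["previous_response_id"] = previous_response_id

    headers = {"Content-Type": "application/json"}
    if api_token:
        # Assumed bearer-token scheme for the API token.
        headers["Authorization"] = f"Bearer {api_token}"

    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical follow-up turn in an existing conversation.
req = build_chat_request(
    "http://localhost:1234",
    "example-model",
    "And in one sentence?",
    api_token="example-token",
    previous_response_id="resp_123",
)
```

Sending `req` with `urllib.request.urlopen(req)` would perform the call. A first turn simply omits `previous_response_id`.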

## Supported Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/v1/chat` | POST | Inference (native) |
| `/api/v1/models` | GET | List loaded models |
| `/api/v1/models/load` | POST | Load a model into VRAM |
| `/api/v1/models/unload` | POST | Unload a model |
| `/api/v1/models/download` | POST | Download a model |
| `/api/v1/models/download/status` | GET | Poll download progress |
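
The table above can be captured as a small routing map so that callers never hard-code a method or path. This is a minimal sketch: the short names (`list_models`, `load_model`, and so on), the `make_request` helper, the base URL, and the example payload are hypothetical, and only the methods and paths come from the table.

```python
import json
import urllib.request
from typing import Optional

# Method and path for each endpoint, copied from the table.
ENDPOINTS = {
    "chat": ("POST", "/api/v1/chat"),
    "list_models": ("GET", "/api/v1/models"),
    "load_model": ("POST", "/api/v1/models/load"),
    "unload_model": ("POST", "/api/v1/models/unload"),
    "download_model": ("POST", "/api/v1/models/download"),
    "download_status": ("GET", "/api/v1/models/download/status"),
}


def make_request(
    name: str,
    payload: Optional[dict] = None,
    base_url: str = "http://localhost:1234",  # assumed local server address
) -> urllib.request.Request:
    """Build (but do not send) a request for one of the named endpoints."""
    method, path = ENDPOINTS[name]
    if method == "GET" and payload is not None:
        raise ValueError(f"{name} is a GET endpoint and takes no JSON body")

    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


list_req = make_request("list_models")
# The "model" field and its value are placeholders, not a documented schema.
load_req = make_request("load_model", {"model": "example-model"})
```

Keeping the method next to the path means a mistake such as posting a body to a GET endpoint fails before any network traffic happens.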

## Inference Endpoint Comparison

Four endpoints can run inference. Pick based on which features you need:

| Feature | `/api/v1/chat` | `/v1/responses` (OAI) | `/v1/chat/completions` (OAI) | `/v1/messages` (Anthropic) |
|---|:---:|:---:|:---:|:---:|
| Streaming | ✅ | ✅ | ✅ | ✅ |
| Stateful chat | ✅ | ✅ | ❌ | ❌ |
| Remote MCPs | ✅ | ✅ | ❌ | ❌ |
| LM Studio MCPs | ✅ | ✅ | ❌ | ❌ |
| Custom tools | ❌ | ✅ | ✅ | ✅ |
| Assistant messages in request | ❌ | ✅ | ✅ | ✅ |
| Model load streaming events | ✅ | ❌ | ❌ | ❌ |
| Prompt processing events | ✅ | ❌ | ❌ | ❌ |
| Specify context length | ✅ | ❌ | ❌ | ❌ |

**Decision guide:**
- Need MCP tools + stateful chat → `/api/v1/chat` or `/v1/responses`
- Need custom tool definitions → `/v1/responses`, `/v1/chat/completions`, or `/v1/messages`
- Dropping in existing OpenAI SDK code → `/v1/chat/completions`
- Dropping in existing Anthropic SDK code → `/v1/messages`
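
The comparison table and decision guide can also be expressed as a lookup. The sketch below encodes the table as Python sets and returns every endpoint that supports all requested features; the names `FEATURES` and `endpoints_supporting` are illustrative, and the data is copied from the table rather than queried from LM Studio.

```python
# Features marked with a check mark in the comparison table, per endpoint.
FEATURES = {
    "/api/v1/chat": {
        "Streaming", "Stateful chat", "Remote MCPs", "LM Studio MCPs",
        "Model load streaming events", "Prompt processing events",
        "Specify context length",
    },
    "/v1/responses": {
        "Streaming", "Stateful chat", "Remote MCPs", "LM Studio MCPs",
        "Custom tools", "Assistant messages in request",
    },
    "/v1/chat/completions": {
        "Streaming", "Custom tools", "Assistant messages in request",
    },
    "/v1/messages": {
        "Streaming", "Custom tools", "Assistant messages in request",
    },
}


def endpoints_supporting(*required: str) -> list[str]:
    """Return the endpoints (in table order) that offer every required feature."""
    needed = set(required)
    known = set().union(*FEATURES.values())
    unknown = needed - known
    if unknown:
        # Catch typos instead of silently returning an empty list.
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return [endpoint for endpoint, offered in FEATURES.items() if needed <= offered]
```

For example, `endpoints_supporting("Custom tools", "Stateful chat")` returns `["/v1/responses"]`, while asking for both `"Custom tools"` and `"Specify context length"` returns an empty list because no single endpoint offers both.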

## Key Takeaways

- The **native `/api/v1/chat`** endpoint is the only one with model-load events, prompt-processing events, and per-request context length. It shares stateful chat and MCP support with `/v1/responses`, and it accepts neither custom tools nor assistant messages in the request.
- **`/v1/responses`** (OpenAI Responses API compat) is the only endpoint that combines stateful chat, MCP, and custom tools.
- **`/v1/chat/completions`** is the broadest drop-in for existing OpenAI code but has no stateful chat or MCP support.
- **`/v1/messages`** lets you redirect the Anthropic SDK to a local model with minimal code change (see [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]]).
- Model management endpoints let you automate the full model lifecycle (download → load → infer → unload) without touching the GUI.
- API token auth is available for securing the local server (useful when exposed on a LAN).
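
The download → load → infer → unload lifecycle might be scripted roughly as follows. The paths come from the Supported Endpoints table, but everything else is an assumption: the request fields (`model`, `input`), the `status == "completed"` check, and the `send` transport callable are hypothetical placeholders to be replaced with the shapes documented in the LM Studio REST reference.

```python
import time
from typing import Callable, Optional

# Hypothetical transport: (method, path, payload) -> decoded JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def run_lifecycle(
    send: Send,
    model: str,
    prompt: str,
    max_polls: int = 100,
    poll_interval: float = 1.0,
) -> dict:
    """Download, load, query, and unload one model; return the chat response."""
    send("POST", "/api/v1/models/download", {"model": model})

    # Poll until the (assumed) status field reports completion.
    for _ in range(max_polls):
        status = send("GET", "/api/v1/models/download/status", None)
        if status.get("status") == "completed":
            break
        time.sleep(poll_interval)
    else:
        raise TimeoutError(f"download of {model!r} did not finish")

    send("POST", "/api/v1/models/load", {"model": model})
    try:
        return send("POST", "/api/v1/chat", {"model": model, "input": prompt})
    finally:
        # Unload even if inference fails, so memory is released.
        send("POST", "/api/v1/models/unload", {"model": model})
```

Passing the transport in as `send` keeps the sequencing logic separate from HTTP details, so it can be exercised with a stub before pointing it at a real server.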

## Related Articles

- [[wiki/claude-code/lmstudio-anthropic-compat|lmstudio-anthropic-compat]]: redirect Claude Code / Anthropic SDK to LM Studio via env vars
- [[wiki/claude-code/lmstudio-chat-completions|lmstudio-chat-completions]]: OpenAI `/v1/chat/completions` usage, params, debugging
- [[wiki/claude-code/lmstudio-embeddings|lmstudio-embeddings]]: `/v1/embeddings` for local RAG pipelines
- [[wiki/claude-code/lmstudio-idle-ttl-auto-evict|lmstudio-idle-ttl-auto-evict]]: memory management with TTL and auto-evict
- [[wiki/agent-sdk/overview|agent-sdk/overview]]: build multi-agent systems that call local models

## Sources

- `raw/LM Studio API.md` — clipped from lmstudio.ai/docs/developer/rest