obsidian/wiki/claude-code/lmstudio-responses-api.md
2026-04-30 14:42:43 +01:00

124 lines
3.6 KiB
Markdown

---
title: "LM Studio Responses API"
aliases: [lmstudio-responses, lm-studio-openai-responses]
tags: [lm-studio, openai-compat, responses-api, streaming, mcp, reasoning]
sources: [raw/Responses.md]
created: 2026-04-30
updated: 2026-04-30
---
# LM Studio Responses API
LM Studio exposes `/v1/responses` — an OpenAI Responses API-compatible endpoint with support for streaming, reasoning effort, stateful multi-turn via `previous_response_id`, and Remote MCP tools.
Base URL: `http://localhost:1234/v1/responses`
---
## Basic Request (non-streaming)
```bash
curl http://localhost:1234/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-oss-20b",
"input": "Provide a prime number less than 50",
"reasoning": { "effort": "low" }
}'
```
- `input` — plain string prompt (no messages array required)
- `reasoning.effort``"low"` | `"medium"` | `"high"` (model-dependent)
---
## Stateful Follow-up
Carry conversation state across calls using `previous_response_id`:
```bash
curl http://localhost:1234/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-oss-20b",
"input": "Multiply it by 2",
"previous_response_id": "resp_123"
}'
```
- The `id` field from any prior response becomes the `previous_response_id` of the next
- No need to replay the full message history client-side
---
## Streaming
```bash
curl http://localhost:1234/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-oss-20b",
"input": "Hello",
"stream": true
}'
```
SSE events emitted:
| Event | Description |
|-------|-------------|
| `response.created` | Response object initialised |
| `response.output_text.delta` | Incremental text chunk |
| `response.completed` | Final event, full response included |
---
## Remote MCP Tools (opt-in)
Enable in LM Studio: **Developer → Settings → Remote MCP**.
```bash
curl http://localhost:1234/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "What is the top trending model on hugging face?",
"tools": [
{
"type": "mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"]
}
]
}'
```
- `server_label` — arbitrary identifier for this MCP server
- `server_url` — remote MCP server URL
- `allowed_tools` — allowlist of tool names the model may call
---
## Key Takeaways
- `/v1/responses` is an OpenAI Responses API drop-in; swap base URL only
- `previous_response_id` enables multi-turn without replaying history — simpler than maintaining a messages array
- Streaming uses standard SSE; listen for `response.output_text.delta` for incremental chunks
- Remote MCP tools are per-request and opt-in — must enable the feature in LM Studio settings first
- `reasoning.effort` controls thinking depth; not all models support it
---
## Related
- [[wiki/claude-code/lmstudio-openai-compat-endpoints|LM Studio OpenAI Compat Endpoints]] — overview of all 5 OAI-compatible endpoints
- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — `/v1/chat/completions` with full param reference
- [[wiki/claude-code/lmstudio-messages-api|LM Studio Messages API]] — `/v1/messages` Anthropic-compat with streaming + tool-use
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — native endpoint feature comparison table
- [[wiki/claude-code/mcp-integration|MCP Integration]] — Claude Code MCP setup and server patterns
---
## Sources
- `raw/Responses.md` — LM Studio developer docs: `/v1/responses` endpoint