---
title: "LM Studio Responses API"
aliases: [lmstudio-responses, lm-studio-openai-responses]
tags: [lm-studio, openai-compat, responses-api, streaming, mcp, reasoning]
sources: [raw/Responses.md]
created: 2026-04-30
updated: 2026-04-30
---

# LM Studio Responses API

LM Studio exposes `/v1/responses`, an OpenAI Responses API-compatible endpoint with support for streaming, reasoning effort, stateful multi-turn via `previous_response_id`, and Remote MCP tools.

Endpoint URL: `http://localhost:1234/v1/responses`

---

## Basic Request (non-streaming)

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Provide a prime number less than 50",
    "reasoning": { "effort": "low" }
  }'
```

- `input`: plain string prompt (no messages array required)
- `reasoning.effort`: `"low"` | `"medium"` | `"high"` (model-dependent)

---

## Stateful Follow-up

Carry conversation state across calls using `previous_response_id`:

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Multiply it by 2",
    "previous_response_id": "resp_123"
  }'
```

- The `id` field from any prior response becomes the `previous_response_id` of the next request
- No need to replay the full message history client-side

---

## Streaming

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Hello",
    "stream": true
  }'
```

SSE events emitted:

| Event | Description |
|-------|-------------|
| `response.created` | Response object initialised |
| `response.output_text.delta` | Incremental text chunk |
| `response.completed` | Final event, full response included |

---

## Remote MCP Tools (opt-in)

Enable in LM Studio: **Developer → Settings → Remote MCP**.

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ibm/granite-4-micro",
    "input": "What is the top trending model on hugging face?",
    "tools": [
      {
        "type": "mcp",
        "server_label": "huggingface",
        "server_url": "https://huggingface.co/mcp",
        "allowed_tools": ["model_search"]
      }
    ]
  }'
```

- `server_label`: arbitrary identifier for this MCP server
- `server_url`: remote MCP server URL
- `allowed_tools`: allowlist of tool names the model may call

---

## Key Takeaways

- `/v1/responses` is a drop-in for the OpenAI Responses API; only the base URL changes
- `previous_response_id` enables multi-turn without replaying history, which is simpler than maintaining a messages array
- Streaming uses standard SSE; listen for `response.output_text.delta` to receive incremental chunks
- Remote MCP tools are per-request and opt-in; the feature must be enabled in LM Studio settings first
- `reasoning.effort` controls thinking depth; not all models support it

---

## Related

- [[wiki/claude-code/lmstudio-openai-compat-endpoints|LM Studio OpenAI Compat Endpoints]]: overview of all 5 OAI-compatible endpoints
- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]]: `/v1/chat/completions` with full param reference
- [[wiki/claude-code/lmstudio-messages-api|LM Studio Messages API]]: `/v1/messages` Anthropic-compat with streaming + tool-use
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]]: native endpoint feature comparison table
- [[wiki/claude-code/mcp-integration|MCP Integration]]: Claude Code MCP setup and server patterns

---

## Sources

- `raw/Responses.md`: LM Studio developer docs for the `/v1/responses` endpoint
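
---

## Python Client Sketch

The curl examples on this page show the request shapes but not how a client consumes the stream. The following is a minimal sketch using only the Python standard library, not an official client. The helper names (`build_request`, `post_stream`, `iter_sse_events`, `collect_text`) are hypothetical. It also assumes each SSE `data:` line carries a JSON object with a `type` field and, for `response.output_text.delta` events, a `delta` string, as in OpenAI's Responses API; LM Studio's payloads may differ in detail.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional

ENDPOINT_URL = "http://localhost:1234/v1/responses"  # endpoint from this page


def build_request(model: str, prompt: str,
                  previous_response_id: Optional[str] = None,
                  stream: bool = False) -> dict:
    """Build a request body using only fields shown on this page."""
    body = {"model": model, "input": prompt}
    if previous_response_id is not None:
        # Chain onto an earlier response instead of replaying history.
        body["previous_response_id"] = previous_response_id
    if stream:
        body["stream"] = True
    return body


def post_stream(body: dict) -> Iterator[str]:
    """POST the body and yield raw response lines as text."""
    req = urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            yield raw.decode("utf-8")


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE `data:` line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip `event:` lines and blank separators
        payload = line[len("data:"):].strip()
        # Defensive assumption: some OpenAI-style streams end with [DONE].
        if not payload or payload == "[DONE]":
            continue
        yield json.loads(payload)


def collect_text(events: Iterable[dict]) -> str:
    """Concatenate the text chunks from delta events (assumed shape)."""
    parts = []
    for event in events:
        if event.get("type") == "response.output_text.delta":
            parts.append(event.get("delta", ""))
    return "".join(parts)
```

With a model loaded and the server running, `collect_text(iter_sse_events(post_stream(build_request("openai/gpt-oss-20b", "Hello", stream=True))))` would return the streamed text as one string. For a follow-up turn, pass the `id` of the earlier response as `previous_response_id` to `build_request`.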