---
title: "LM Studio Responses API"
aliases: [lmstudio-responses, lm-studio-openai-responses]
tags: [lm-studio, openai-compat, responses-api, streaming, mcp, reasoning]
sources: [raw/Responses.md]
created: 2026-04-30
updated: 2026-04-30
---

# LM Studio Responses API

LM Studio exposes `/v1/responses`, an endpoint compatible with the OpenAI Responses API. It supports streaming, reasoning effort, stateful multi-turn conversations via `previous_response_id`, and Remote MCP tools.

Endpoint URL: `http://localhost:1234/v1/responses` (the base URL for OpenAI-compatible clients is `http://localhost:1234/v1`)

---

## Basic Request (non-streaming)

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Provide a prime number less than 50",
    "reasoning": { "effort": "low" }
  }'
```

- `input`: plain string prompt (no messages array required)
- `reasoning.effort`: one of `"low"`, `"medium"`, or `"high"` (support is model-dependent)

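
The same request can be issued from a script. The following is a minimal Python sketch using only the standard library, assuming the server listens on the default port shown on this page. `build_request` and `post_json` are hypothetical helper names chosen for illustration, not part of LM Studio.

```python
import json
import urllib.request
from typing import Optional

# Assumption: LM Studio's local server is running on the port used in this note.
RESPONSES_URL = "http://localhost:1234/v1/responses"


def build_request(model: str, prompt: str, effort: Optional[str] = None) -> dict:
    """Build the JSON body for a non-streaming /v1/responses call."""
    body = {"model": model, "input": prompt}
    if effort is not None:
        # Only send `reasoning` when requested, since support is model-dependent.
        body["reasoning"] = {"effort": effort}
    return body


def post_json(url: str, body: dict, timeout: float = 60.0) -> dict:
    """POST a JSON body and return the decoded JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the payload needs no running server.
payload = build_request("openai/gpt-oss-20b", "Provide a prime number less than 50", "low")
print(json.dumps(payload, indent=2))
```

With the server running, `post_json(RESPONSES_URL, payload)` should return the decoded response object.
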

---

## Stateful Follow-up

Carry conversation state across calls using `previous_response_id`:

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Multiply it by 2",
    "previous_response_id": "resp_123"
  }'
```

- Pass the `id` field of a prior response as `previous_response_id` in the next request (`resp_123` above is a placeholder)
- There is no need to replay the full message history client-side

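
Chaining turns amounts to copying one field from the previous reply into the next request. A minimal Python sketch of that step follows. `follow_up_request` is a hypothetical helper, and the `first` dictionary is a hand-written stand-in for a decoded reply rather than real server output.

```python
from typing import Optional


def follow_up_request(previous: dict, model: str, prompt: str) -> dict:
    """Build the next turn's body from a prior response object."""
    response_id: Optional[str] = previous.get("id")
    if not response_id:
        raise ValueError("prior response has no 'id' to chain from")
    return {
        "model": model,
        "input": prompt,
        "previous_response_id": response_id,
    }


# Stand-in for a decoded reply; only the `id` field matters for chaining.
first = {"id": "resp_123"}
next_body = follow_up_request(first, "openai/gpt-oss-20b", "Multiply it by 2")
print(next_body)
```
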

---

## Streaming

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Hello",
    "stream": true
  }'
```

SSE events emitted include:

| Event | Description |
|-------|-------------|
| `response.created` | Response object initialised |
| `response.output_text.delta` | Incremental text chunk |
| `response.completed` | Final event; includes the full response |

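
On the client side, the stream has to be split into events and the text deltas joined. The sketch below shows one way to do that in Python. It assumes each event arrives as an `event:` line plus a JSON `data:` line ending with a blank line, and that delta events carry their text in a `delta` field, as in OpenAI's Responses streaming format. `iter_sse_events` and `collect_text` are hypothetical helpers, and `SAMPLE` is a simplified, hand-written stream, not captured output.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, data) pairs from the lines of an SSE stream."""
    event, data_lines = "", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # A blank line terminates one event.
            yield event, json.loads("\n".join(data_lines))
            event, data_lines = "", []
    if data_lines:
        # Flush a final event that was not followed by a blank line.
        yield event, json.loads("\n".join(data_lines))


def collect_text(events: Iterable[Tuple[str, dict]]) -> str:
    """Concatenate the text of all response.output_text.delta events."""
    return "".join(
        data.get("delta", "")
        for name, data in events
        if name == "response.output_text.delta"
    )


# Simplified, hand-written sample; real payloads carry more fields.
SAMPLE = [
    "event: response.created",
    'data: {"type": "response.created"}',
    "",
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "Hel"}',
    "",
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "lo"}',
    "",
    "event: response.completed",
    'data: {"type": "response.completed"}',
    "",
]

print(collect_text(iter_sse_events(SAMPLE)))  # prints "Hello"
```
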

---

## Remote MCP Tools (opt-in)

Enable in LM Studio: **Developer → Settings → Remote MCP**.

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ibm/granite-4-micro",
    "input": "What is the top trending model on hugging face?",
    "tools": [
      {
        "type": "mcp",
        "server_label": "huggingface",
        "server_url": "https://huggingface.co/mcp",
        "allowed_tools": ["model_search"]
      }
    ]
  }'
```

- `server_label`: arbitrary identifier for this MCP server
- `server_url`: remote MCP server URL
- `allowed_tools`: allowlist of tool names the model may call

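
When requests are assembled in code, the tool entry can be built by a small helper so that the three fields above stay together. This is a sketch only: `mcp_tool` and `build_mcp_request` are hypothetical names, and the URL scheme check is a local sanity check added for illustration, not something LM Studio is known to require.

```python
import json
from typing import List
from urllib.parse import urlparse


def mcp_tool(server_label: str, server_url: str, allowed_tools: List[str]) -> dict:
    """Build one entry for the `tools` array, mirroring the curl example."""
    # Local sanity check (an assumption of this sketch, not an LM Studio rule).
    if urlparse(server_url).scheme not in ("http", "https"):
        raise ValueError(f"server_url must be http(s): {server_url!r}")
    return {
        "type": "mcp",
        "server_label": server_label,
        "server_url": server_url,
        "allowed_tools": list(allowed_tools),
    }


def build_mcp_request(model: str, prompt: str, tools: List[dict]) -> dict:
    """Build a /v1/responses body that declares Remote MCP tools."""
    return {"model": model, "input": prompt, "tools": tools}


request_body = build_mcp_request(
    "ibm/granite-4-micro",
    "What is the top trending model on hugging face?",
    [mcp_tool("huggingface", "https://huggingface.co/mcp", ["model_search"])],
)
print(json.dumps(request_body, indent=2))
```
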

---

## Key Takeaways

- `/v1/responses` is compatible with the OpenAI Responses API; an existing client should only need its base URL changed
- `previous_response_id` enables multi-turn conversations without replaying history, which is simpler than maintaining a messages array
- Streaming uses standard SSE; listen for `response.output_text.delta` events to receive incremental chunks
- Remote MCP tools are declared per request and are opt-in: the feature must first be enabled in LM Studio settings
- `reasoning.effort` controls thinking depth; not all models support it


---

## Related

- [[wiki/claude-code/lmstudio-openai-compat-endpoints|LM Studio OpenAI Compat Endpoints]]: overview of all 5 OpenAI-compatible endpoints
- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]]: `/v1/chat/completions` with full param reference
- [[wiki/claude-code/lmstudio-messages-api|LM Studio Messages API]]: `/v1/messages` Anthropic-compat with streaming + tool-use
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]]: native endpoint feature comparison table
- [[wiki/claude-code/mcp-integration|MCP Integration]]: Claude Code MCP setup and server patterns


---

## Sources

- `raw/Responses.md`: LM Studio developer docs for the `/v1/responses` endpoint