---
title: "LM Studio Responses API"
aliases: [lmstudio-responses, lm-studio-openai-responses]
tags: [lm-studio, openai-compat, responses-api, streaming, mcp, reasoning]
sources: [raw/Responses.md]
created: 2026-04-30
updated: 2026-04-30
---

# LM Studio Responses API

LM Studio exposes `/v1/responses` — an OpenAI Responses API-compatible endpoint with support for streaming, reasoning effort, stateful multi-turn via `previous_response_id`, and Remote MCP tools.

Endpoint: `http://localhost:1234/v1/responses` (for OpenAI-style clients, the base URL is `http://localhost:1234/v1`)

---

## Basic Request (non-streaming)

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Provide a prime number less than 50",
    "reasoning": { "effort": "low" }
  }'
```

- `input` — plain string prompt (no messages array required)
- `reasoning.effort` — `"low"` | `"medium"` | `"high"` (model-dependent)
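
The same request can be issued from Python with only the standard library. This is a minimal sketch, not an official client: `build_payload` and `post_json` are hypothetical helper names, and the URL assumes the local server address used in the curl example.

```python
import json
import urllib.request

# Assumed address, taken from the curl example; adjust if your server differs.
RESPONSES_URL = "http://localhost:1234/v1/responses"


def build_payload(model, prompt, effort=None):
    """Build the JSON body for a non-streaming /v1/responses call."""
    payload = {"model": model, "input": prompt}
    if effort is not None:
        # Only sent when requested, because support is model-dependent.
        payload["reasoning"] = {"effort": effort}
    return payload


def post_json(url, payload, timeout=120):
    """POST a JSON body and return the decoded JSON reply (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With the server running and the model loaded, `post_json(RESPONSES_URL, build_payload("openai/gpt-oss-20b", "Provide a prime number less than 50", effort="low"))` should return the decoded response object.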

---

## Stateful Follow-up

Carry conversation state across calls using `previous_response_id`:

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Multiply it by 2",
    "previous_response_id": "resp_123"
  }'
```

- The `id` field from any prior response becomes the `previous_response_id` of the next
- No need to replay the full message history client-side
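
Chaining turns then comes down to copying the `id` of one decoded response into the next request body. A minimal sketch, where `follow_up_payload` is a hypothetical helper and the previous response is assumed to be an already-decoded JSON dict:

```python
def follow_up_payload(model, prompt, previous_response):
    """Build the next request body from a prior decoded response dict."""
    return {
        "model": model,
        "input": prompt,
        # The prior response's `id` links this turn to the stored conversation.
        "previous_response_id": previous_response["id"],
    }


# Stand-in for a real reply; only the `id` field is needed here.
first_response = {"id": "resp_123"}
next_body = follow_up_payload("openai/gpt-oss-20b", "Multiply it by 2", first_response)
```

Only the new prompt and the ID travel over the wire; the earlier turns are resolved server-side.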

---

## Streaming

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Hello",
    "stream": true
  }'
```

SSE events emitted:

| Event | Description |
|-------|-------------|
| `response.created` | Response object initialised |
| `response.output_text.delta` | Incremental text chunk |
| `response.completed` | Final event, full response included |
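
To consume the stream, a client groups `event:` and `data:` lines into events and concatenates the text chunks. The sketch below is a simplified SSE reader, not a full implementation of the SSE specification. It assumes each `data:` payload is JSON and that the text chunk sits in a `delta` field, as in OpenAI's Responses API; neither detail is confirmed by the source. `iter_sse_events` and `collect_text` are hypothetical names.

```python
import json
from itertools import chain


def iter_sse_events(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event_name, data_parts = None, []
    # The trailing "" flushes a final event that lacks a closing blank line.
    for raw in chain(lines, [""]):
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
        elif line == "":
            if data_parts:
                try:
                    data = json.loads("\n".join(data_parts))
                except ValueError:
                    data = None  # Not JSON; skip this payload.
                if data is not None:
                    yield event_name, data
            event_name, data_parts = None, []


def collect_text(lines):
    """Join the text chunks until the completion event arrives."""
    parts = []
    for name, data in iter_sse_events(lines):
        if name == "response.output_text.delta":
            # Assumption: the chunk is carried in a "delta" field.
            parts.append(data.get("delta", ""))
        elif name == "response.completed":
            break
    return "".join(parts)
```

Against a live server, the lines could come from an open `urllib.request.urlopen` reply, decoded as `(raw.decode("utf-8") for raw in reply)`.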

---

## Remote MCP Tools (opt-in)

Enable in LM Studio: **Developer → Settings → Remote MCP**.

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ibm/granite-4-micro",
    "input": "What is the top trending model on Hugging Face?",
    "tools": [
      {
        "type": "mcp",
        "server_label": "huggingface",
        "server_url": "https://huggingface.co/mcp",
        "allowed_tools": ["model_search"]
      }
    ]
  }'
```

- `server_label` — arbitrary identifier for this MCP server
- `server_url` — remote MCP server URL
- `allowed_tools` — allowlist of tool names the model may call
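
When several requests share the same MCP server, the tool entry can be built once and reused. A minimal sketch that mirrors the three fields listed above; `mcp_tool` is a hypothetical helper, not part of any LM Studio SDK:

```python
import json


def mcp_tool(server_label, server_url, allowed_tools):
    """Build one entry for the `tools` array of a /v1/responses request."""
    return {
        "type": "mcp",
        "server_label": server_label,
        "server_url": server_url,
        # Copy into a list so tuples or sets serialise as a JSON array.
        "allowed_tools": list(allowed_tools),
    }


payload = {
    "model": "ibm/granite-4-micro",
    "input": "What is the top trending model on Hugging Face?",
    "tools": [mcp_tool("huggingface", "https://huggingface.co/mcp", ["model_search"])],
}
body = json.dumps(payload)  # Ready to send as the POST body.
```

Keeping `allowed_tools` a required argument forces every call site to state its allowlist explicitly.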

---

## Key Takeaways

- `/v1/responses` follows the OpenAI Responses API request shape; clients that use only the features listed above can typically switch by changing the base URL
- `previous_response_id` enables multi-turn without replaying history — simpler than maintaining a messages array
- Streaming uses standard SSE; listen for `response.output_text.delta` for incremental chunks
- Remote MCP tools are per-request and opt-in — must enable the feature in LM Studio settings first
- `reasoning.effort` controls thinking depth; not all models support it

---

## Related

- [[wiki/claude-code/lmstudio-openai-compat-endpoints|LM Studio OpenAI Compat Endpoints]] — overview of all 5 OAI-compatible endpoints
- [[wiki/claude-code/lmstudio-chat-completions|LM Studio Chat Completions]] — `/v1/chat/completions` with full param reference
- [[wiki/claude-code/lmstudio-messages-api|LM Studio Messages API]] — `/v1/messages` Anthropic-compat with streaming + tool-use
- [[wiki/claude-code/lmstudio-rest-api|LM Studio REST API]] — native endpoint feature comparison table
- [[wiki/claude-code/mcp-integration|MCP Integration]] — Claude Code MCP setup and server patterns

---

## Sources

- `raw/Responses.md` — LM Studio developer docs: `/v1/responses` endpoint