LM Studio Responses API

LM Studio exposes /v1/responses — an OpenAI Responses API-compatible endpoint with support for streaming, reasoning effort, stateful multi-turn via previous_response_id, and Remote MCP tools.

Endpoint: http://localhost:1234/v1/responses (1234 is LM Studio's default server port)

Basic Request (non-streaming)

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Provide a prime number less than 50",
    "reasoning": { "effort": "low" }
  }'
```

input — plain string prompt (no messages array required)
reasoning.effort — "low" | "medium" | "high" (model-dependent)
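The same body can be assembled by a small shell function instead of an inline literal. This is a sketch only: responses_body is a hypothetical helper (not part of LM Studio), its escaping handles backslashes and double quotes but not newlines or other control characters, and it leaves out the reasoning object when no effort is given, since not every model supports it.

```bash
# Hypothetical helper, not part of LM Studio: print a /v1/responses body.
# Usage: responses_body MODEL PROMPT [EFFORT]
responses_body() {
  local model=$1 prompt=$2 effort=${3:-}
  local reasoning=""
  if [ -n "$effort" ]; then
    case $effort in
      low|medium|high) ;;
      *) echo "effort must be low, medium or high" >&2; return 1 ;;
    esac
    reasoning=",\"reasoning\":{\"effort\":\"$effort\"}"
  fi
  # Escape backslashes and double quotes only; prompts containing newlines
  # or other control characters need a real JSON encoder (e.g. jq).
  prompt=$(printf '%s' "$prompt" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model":"%s","input":"%s"%s}\n' "$model" "$prompt" "$reasoning"
}
```

For example, `responses_body openai/gpt-oss-20b "Provide a prime number less than 50" low | curl http://localhost:1234/v1/responses -H "Content-Type: application/json" --data-binary @-` sends the same request as the literal body.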

Stateful Follow-up

Carry conversation state across calls using previous_response_id:

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Multiply it by 2",
    "previous_response_id": "resp_123"
  }'
```

Set previous_response_id to the id field returned by the prior response (resp_123 above is a placeholder)
The server keeps the earlier turns, so there is no need to replay the full message history client-side
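To chain turns from a script, the id has to be pulled out of the first response. The sketch below assumes response ids start with resp_ (as in the resp_123 placeholder) and that the first such value in the body is the top-level id; response_id is a hypothetical helper, and `jq -r .id` is the more robust choice where jq is installed.

```bash
# Hypothetical helper, not part of LM Studio: read a /v1/responses JSON body
# on stdin and print its id. Assumes the resp_ prefix; nested output items
# with other id prefixes are skipped by that filter.
response_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"resp_[^"]*"' |
    head -n 1 |
    sed 's/.*"\(resp_[^"]*\)"$/\1/'
}

# Example chain (needs a running LM Studio server):
#   first=$(curl -s http://localhost:1234/v1/responses \
#     -H "Content-Type: application/json" \
#     -d '{"model":"openai/gpt-oss-20b","input":"Provide a prime number less than 50"}')
#   prev=$(printf '%s' "$first" | response_id)
#   curl -s http://localhost:1234/v1/responses \
#     -H "Content-Type: application/json" \
#     -d "{\"model\":\"openai/gpt-oss-20b\",\"input\":\"Multiply it by 2\",\"previous_response_id\":\"$prev\"}"
```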

Streaming

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Hello",
    "stream": true
  }'
```

SSE events include:

| Event | Description |
| --- | --- |
| `response.created` | Response object initialised |
| `response.output_text.delta` | Incremental text chunk |
| `response.completed` | Final event; includes the full response |
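A rough way to print only the generated text is to filter the SSE lines. This sketch assumes each event arrives as an `event:` line naming it (as in the table) followed by a `data:` JSON line carrying a "delta" string, which is the OpenAI wire format; sse_text_deltas is a hypothetical helper, and JSON escapes such as \n inside a delta are printed literally rather than decoded.

```bash
# Hypothetical helper, not part of LM Studio: read an SSE stream on stdin and
# print the text of response.output_text.delta events as it arrives.
sse_text_deltas() {
  awk '
    { sub(/\r$/, "") }                       # tolerate CRLF line endings
    /^event:/ { ev = $0; sub(/^event:[ ]*/, "", ev); next }
    /^data:/ && ev == "response.output_text.delta" {
      if (match($0, /"delta":[ ]*"([^"\\]|\\.)*"/)) {
        s = substr($0, RSTART, RLENGTH)
        sub(/^"delta":[ ]*"/, "", s)         # drop the key and opening quote
        sub(/"$/, "", s)                     # drop the closing quote
        printf "%s", s                       # JSON escapes are left as-is
        fflush()                             # show text incrementally
      }
    }
    END { print "" }
  '
}
```

Usage: `curl -sN http://localhost:1234/v1/responses -H "Content-Type: application/json" -d '{"model": "openai/gpt-oss-20b", "input": "Hello", "stream": true}' | sse_text_deltas`, where `-N` (`--no-buffer`) stops curl from holding back output.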

Remote MCP Tools (opt-in)

Enable in LM Studio: Developer → Settings → Remote MCP.

```bash
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ibm/granite-4-micro",
    "input": "What is the top trending model on hugging face?",
    "tools": [
      {
        "type": "mcp",
        "server_label": "huggingface",
        "server_url": "https://huggingface.co/mcp",
        "allowed_tools": ["model_search"]
      }
    ]
  }'
```

server_label — arbitrary identifier for this MCP server
server_url — remote MCP server URL
allowed_tools — allowlist of tool names the model may call
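When several MCP servers or allowlists are in play, generating each tools entry from a function can be less error-prone than hand-editing JSON. mcp_tool is a hypothetical helper that emits one entry with the same fields as the example above; it does no escaping, so it assumes labels, URLs, and tool names contain no double quotes or backslashes.

```bash
# Hypothetical helper, not part of LM Studio: print one "mcp" tools entry.
# Usage: mcp_tool LABEL URL TOOL [TOOL ...]
mcp_tool() {
  local label=$1 url=$2
  shift 2
  local tools="" t
  for t in "$@"; do
    tools="${tools:+$tools,}\"$t\""   # comma-separated, quoted tool names
  done
  printf '{"type":"mcp","server_label":"%s","server_url":"%s","allowed_tools":[%s]}\n' \
    "$label" "$url" "$tools"
}

# Example request (needs Remote MCP enabled and a running server):
#   curl http://localhost:1234/v1/responses \
#     -H "Content-Type: application/json" \
#     -d "{\"model\":\"ibm/granite-4-micro\",\"input\":\"What is the top trending model on hugging face?\",\"tools\":[$(mcp_tool huggingface https://huggingface.co/mcp model_search)]}"
```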

Key Takeaways

/v1/responses is a drop-in for the OpenAI Responses API; point existing clients at http://localhost:1234/v1 and use a model available in LM Studio
previous_response_id enables multi-turn without replaying history — simpler than maintaining a messages array
Streaming uses standard SSE; listen for response.output_text.delta for incremental chunks
Remote MCP tools are specified per request and are opt-in; the feature must first be enabled in LM Studio settings
reasoning.effort controls thinking depth; not all models support it

wiki/claude-code/lmstudio-openai-compat-endpoints — overview of all 5 OAI-compatible endpoints
wiki/claude-code/lmstudio-chat-completions — /v1/chat/completions with full param reference
wiki/claude-code/lmstudio-messages-api — /v1/messages Anthropic-compat with streaming + tool-use
wiki/claude-code/lmstudio-rest-api — native endpoint feature comparison table
wiki/claude-code/mcp-integration — Claude Code MCP setup and server patterns

Sources

raw/Responses.md — LM Studio developer docs: /v1/responses endpoint
