obsidian/wiki/claude-code/lmstudio-responses-api.md
2026-04-30 14:42:43 +01:00

3.6 KiB

title aliases tags sources created updated
LM Studio Responses API
lmstudio-responses
lm-studio-openai-responses
lm-studio
openai-compat
responses-api
streaming
mcp
reasoning
raw/Responses.md
2026-04-30 2026-04-30

LM Studio Responses API

LM Studio exposes /v1/responses — an OpenAI Responses API-compatible endpoint with support for streaming, reasoning effort, stateful multi-turn via previous_response_id, and Remote MCP tools.

Base URL: http://localhost:1234/v1/responses


Basic Request (non-streaming)

curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Provide a prime number less than 50",
    "reasoning": { "effort": "low" }
  }'
  • input — plain string prompt (no messages array required)
  • reasoning.effort"low" | "medium" | "high" (model-dependent)

Stateful Follow-up

Carry conversation state across calls using previous_response_id:

curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Multiply it by 2",
    "previous_response_id": "resp_123"
  }'
  • The id field from any prior response becomes the previous_response_id of the next
  • No need to replay the full message history client-side

Streaming

curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Hello",
    "stream": true
  }'

SSE events emitted:

Event Description
response.created Response object initialised
response.output_text.delta Incremental text chunk
response.completed Final event, full response included

Remote MCP Tools (opt-in)

Enable in LM Studio: Developer → Settings → Remote MCP.

curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ibm/granite-4-micro",
    "input": "What is the top trending model on hugging face?",
    "tools": [
      {
        "type": "mcp",
        "server_label": "huggingface",
        "server_url": "https://huggingface.co/mcp",
        "allowed_tools": ["model_search"]
      }
    ]
  }'
  • server_label — arbitrary identifier for this MCP server
  • server_url — remote MCP server URL
  • allowed_tools — allowlist of tool names the model may call

Key Takeaways

  • /v1/responses is an OpenAI Responses API drop-in; swap base URL only
  • previous_response_id enables multi-turn without replaying history — simpler than maintaining a messages array
  • Streaming uses standard SSE; listen for response.output_text.delta for incremental chunks
  • Remote MCP tools are per-request and opt-in — must enable the feature in LM Studio settings first
  • reasoning.effort controls thinking depth; not all models support it


Sources

  • raw/Responses.md — LM Studio developer docs: /v1/responses endpoint