LM Studio Structured Output

Enforce a specific JSON shape on LLM responses by passing a JSON schema to /v1/chat/completions. The request format is compatible with OpenAI's Structured Outputs API.

How It Works

Add a response_format field to the chat completions request.
Provide a json_schema with a name, an optional strict flag, and a schema object.
The model is constrained to return valid JSON matching that schema.
The response arrives as a JSON string in choices[0].message.content; parse it with json.loads().
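
The steps above can be sketched end to end without a running server. The helper names build_response_format and parse_content are hypothetical, and fake_response is a hand-written stand-in for what the endpoint returns, not captured output.

```python
import json

def build_response_format(name, schema, strict=True):
    """Wrap a JSON schema in the response_format envelope (hypothetical helper)."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }

def parse_content(response):
    """Read choices[0].message.content and decode the JSON string it holds."""
    return json.loads(response["choices"][0]["message"]["content"])

schema = {
    "type": "object",
    "properties": {"joke": {"type": "string"}},
    "required": ["joke"],
}

# Steps 1 and 2: attach response_format to an ordinary chat completions body.
request_body = {
    "model": "your-model",  # placeholder, replace with a loaded model
    "messages": [{"role": "user", "content": "Tell me a joke."}],
    "response_format": build_response_format("joke_response", schema),
}

# Steps 3 and 4: the content field is a string, so it must be decoded.
# This dictionary is an assumption made for illustration, not real server output.
fake_response = {
    "choices": [
        {"message": {"role": "assistant",
                     "content": json.dumps({"joke": "placeholder joke text"})}}
    ]
}

parsed = parse_content(fake_response)
print(parsed["joke"])
```

Keeping the envelope in one helper means the schema itself stays a plain dictionary that can be reused or checked separately.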

Server Setup

lms server start
# or enable from Developer tab in LM Studio UI

Install the CLI first if needed:

npx lmstudio install-cli

response_format Shape

"response_format": {
  "type": "json_schema",
  "json_schema": {
    "name": "my_schema",
    "strict": true,
    "schema": {
      "type": "object",
      "properties": {
        "field": { "type": "string" }
      },
      "required": ["field"]
    }
  }
}
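
A small pre-flight check can catch a malformed envelope before the request is sent, for example a strict value given as a string where OpenAI's format defines a boolean. The function name envelope_problems and its rules are illustrative assumptions that mirror the shape above; they are not LM Studio's own validation, which may be more lenient.

```python
def envelope_problems(response_format):
    """Return a list of problems found; an empty list means the shape looks right."""
    problems = []
    if response_format.get("type") != "json_schema":
        problems.append('type must be "json_schema"')
    inner = response_format.get("json_schema")
    if not isinstance(inner, dict):
        problems.append("json_schema must be an object")
        return problems
    name = inner.get("name")
    if not isinstance(name, str) or not name:
        problems.append("json_schema.name must be a non-empty string")
    # strict is optional, but when present it should be a real boolean.
    if "strict" in inner and not isinstance(inner["strict"], bool):
        problems.append("json_schema.strict must be a boolean")
    if not isinstance(inner.get("schema"), dict):
        problems.append("json_schema.schema must be an object")
    return problems

good = {
    "type": "json_schema",
    "json_schema": {
        "name": "my_schema",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"field": {"type": "string"}},
            "required": ["field"],
        },
    },
}
# Same envelope, but with strict passed as a string.
bad = {"type": "json_schema",
       "json_schema": {**good["json_schema"], "strict": "true"}}

print(envelope_problems(good))
print(envelope_problems(bad))
```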

cURL Example

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "{{model}}",
    "messages": [
      {"role": "system", "content": "You are a helpful jokester."},
      {"role": "user", "content": "Tell me a joke."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "joke_response",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": { "joke": {"type": "string"} },
          "required": ["joke"]
        }
      }
    },
    "temperature": 0.7,
    "max_tokens": 50,
    "stream": false
  }'
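
The same request can also be built with Python's standard library alone, which may help where neither cURL nor the openai package is available. This is a sketch: it only constructs the request, the send function is defined but not called because it needs the server running, and "your-model" is a placeholder.

```python
import json
import urllib.request

payload = {
    "model": "your-model",  # placeholder, replace with a loaded model
    "messages": [
        {"role": "system", "content": "You are a helpful jokester."},
        {"role": "user", "content": "Tell me a joke."},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "joke_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"joke": {"type": "string"}},
                "required": ["joke"],
            },
        },
    },
    "temperature": 0.7,
    "max_tokens": 50,
    "stream": False,
}

# Mirror the cURL call: POST a JSON body with a JSON content type.
req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

def send(request):
    """Send the request and return the decoded joke object (needs the server up)."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return json.loads(body["choices"][0]["message"]["content"])
```

Note that json.dumps turns Python's True and False into the JSON literals true and false, so strict and stream are sent as booleans.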

Python Example

from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

character_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "characters",
        "schema": {
            "type": "object",
            "properties": {
                "characters": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "occupation": {"type": "string"},
                            "personality": {"type": "string"},
                            "background": {"type": "string"}
                        },
                        "required": ["name", "occupation", "personality", "background"]
                    },
                    "minItems": 1
                }
            },
            "required": ["characters"]
        }
    }
}

response = client.chat.completions.create(
    model="your-model",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Create 1-3 fictional characters"}
    ],
    response_format=character_schema,
)

results = json.loads(response.choices[0].message.content)
print(json.dumps(results, indent=2))
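
The json.loads() call above raises if the content is not complete JSON, which can happen when generation stops at max_tokens. A defensive wrapper might look like the sketch below. It assumes the server reports finish_reason the way OpenAI's API does ("length" when the token limit is hit); parse_structured is a hypothetical helper, and the SimpleNamespace objects are stand-ins for SDK response objects, not real output.

```python
import json
from types import SimpleNamespace

def parse_structured(response):
    """Decode choices[0].message.content, raising ValueError with context on failure."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        raise ValueError("output hit max_tokens; the JSON is probably truncated")
    try:
        return json.loads(choice.message.content)
    except (json.JSONDecodeError, TypeError) as exc:
        raise ValueError(f"content is not valid JSON: {exc}") from exc

def fake(content, finish_reason):
    """Build an object with the same attribute path as an SDK response (test double)."""
    message = SimpleNamespace(content=content)
    return SimpleNamespace(
        choices=[SimpleNamespace(message=message, finish_reason=finish_reason)]
    )

complete = fake('{"characters": [{"name": "Ada"}]}', "stop")
truncated = fake('{"characters": [{"na', "length")

print(parse_structured(complete)["characters"][0]["name"])
try:
    parse_structured(truncated)
except ValueError as err:
    print("rejected:", err)
```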

Structured Output Engines

Model Format	Engine
GGUF	`llama.cpp` grammar-based sampling
MLX	Outlines via lmstudio-ai/mlx-engine
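
Both engines restrict which tokens may be chosen while the text is being generated. The toy below illustrates that idea only: it is not llama.cpp's grammar sampler or Outlines, the "grammar" is just a list of two accepted strings, and fake_scores is a made-up stand-in for model logits.

```python
# Stand-in for a grammar derived from a schema: the only accepted outputs.
VALID_OUTPUTS = ['{"ok":true}', '{"ok":false}']
VOCAB = ['{', '}', '"ok"', ':', 'true', 'false', 'Sure!', ' ']

def allowed(prefix, token):
    """A token is legal if the extended text is still a prefix of some valid output."""
    candidate = prefix + token
    return any(valid.startswith(candidate) for valid in VALID_OUTPUTS)

def fake_scores(prefix):
    """Made-up scores: this 'model' would rather chat than emit JSON."""
    preference = ['Sure!', ' ', 'true', '{', '"ok"', ':', 'false', '}']
    return {tok: len(preference) - rank for rank, tok in enumerate(preference)}

def constrained_decode():
    """Greedy decoding where illegal tokens are masked out at every step."""
    out = ""
    while out not in VALID_OUTPUTS:
        scores = fake_scores(out)
        legal = [tok for tok in VOCAB if allowed(out, tok)]
        out += max(legal, key=scores.get)
    return out

def unconstrained_first_token():
    """What plain greedy decoding would pick first, with no mask applied."""
    scores = fake_scores("")
    return max(VOCAB, key=scores.get)

print(unconstrained_first_token())  # Sure!
print(constrained_decode())         # {"ok":true}
```

The mask never rewrites the model's preferences; it only removes choices that would leave the accepted language, which is why the result is valid by construction instead of being repaired afterwards.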

Key Takeaways

Use response_format.type = "json_schema" — same shape as OpenAI's Structured Outputs API
Works with any OpenAI-compatible client SDK (Python, TS, etc.) just by pointing base_url at localhost
The response content is always a JSON string in choices[0].message.content; call json.loads() on it before use
Not all models are capable of structured output, particularly models below 7B parameters; check the model card
GGUF uses grammar sampling; MLX uses Outlines — both constrain tokens at generation time, not post-hoc
All standard /v1/chat/completions params (temperature, max_tokens, stream, etc.) still apply

Related

wiki/claude-code/lmstudio-chat-completions — full parameter reference for the completions endpoint
wiki/claude-code/lmstudio-openai-compat-endpoints — overview of all OpenAI-compat endpoints
wiki/claude-code/lmstudio-responses-api — stateful responses with streaming and Remote MCP tools
wiki/claude-code/lmstudio-rest-api — native LM Studio API and endpoint feature comparison
