obsidian/wiki/claude-code/lmstudio-structured-output.md
2026-04-30 14:42:43 +01:00

title: LM Studio Structured Output
aliases: lmstudio-json-schema, structured-output-lmstudio
tags: lmstudio, structured-output, json-schema, openai-compat, local-llm
sources: raw/Structured Output.md
created: 2026-04-30
updated: 2026-04-30

LM Studio Structured Output

Enforce a specific JSON shape on LLM responses by passing a JSON schema to /v1/chat/completions. The request format is compatible with OpenAI's Structured Outputs API.

How It Works

  • Add a response_format field to the chat completions request
  • Provide a json_schema with a name, optional strict, and a schema object
  • The model is constrained to return valid JSON matching that schema
  • Response arrives as a string in choices[0].message.content — parse it with json.loads()

Server Setup

lms server start
# or enable from Developer tab in LM Studio UI

Install the CLI first if needed:

npx lmstudio install-cli

response_format Shape

"response_format": {
  "type": "json_schema",
  "json_schema": {
    "name": "my_schema",
    "strict": true,
    "schema": {
      "type": "object",
      "properties": {
        "field": { "type": "string" }
      },
      "required": ["field"]
    }
  }
}
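
The envelope above can be assembled programmatically so that a schema is written once and reused across requests. The following is a minimal sketch; the helper name `make_response_format` and its signature are hypothetical and are not part of LM Studio or the OpenAI SDK.

```python
import json


def make_response_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Wrap a plain JSON Schema in the response_format envelope.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": strict,  # a Python bool, serialized as a JSON boolean
            "schema": schema,
        },
    }


# The same single-field schema used in the shape above.
field_schema = {
    "type": "object",
    "properties": {"field": {"type": "string"}},
    "required": ["field"],
}

response_format = make_response_format("my_schema", field_schema)
print(json.dumps(response_format, indent=2))
```

The resulting dict can be passed as the `response_format` argument of an OpenAI-compatible client, or embedded in a raw request body.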

cURL Example

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "{{model}}",
    "messages": [
      {"role": "system", "content": "You are a helpful jokester."},
      {"role": "user", "content": "Tell me a joke."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "joke_response",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": { "joke": {"type": "string"} },
          "required": ["joke"]
        }
      }
    },
    "temperature": 0.7,
    "max_tokens": 50,
    "stream": false
  }'

Python Example

from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

character_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "characters",
        "schema": {
            "type": "object",
            "properties": {
                "characters": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "occupation": {"type": "string"},
                            "personality": {"type": "string"},
                            "background": {"type": "string"}
                        },
                        "required": ["name", "occupation", "personality", "background"]
                    },
                    "minItems": 1
                }
            },
            "required": ["characters"]
        }
    }
}

response = client.chat.completions.create(
    model="your-model",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Create 1-3 fictional characters"}
    ],
    response_format=character_schema,
)

results = json.loads(response.choices[0].message.content)
print(json.dumps(results, indent=2))
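
Calling json.loads() directly assumes the content is complete JSON. Generation still stops at max_tokens, so a low limit may cut the object off mid-string. The sketch below shows one way to fail with a readable error instead; `parse_structured` is a hypothetical helper, and the two strings are stand-ins for `response.choices[0].message.content`, not real model output.

```python
import json


def parse_structured(content, required_keys):
    """Parse message content and check that the top-level keys are present.

    Hypothetical helper: raises ValueError with a readable message
    instead of letting a bare decode error propagate.
    """
    if not content:
        raise ValueError("empty response content")
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        # Typical cause: output truncated before the closing braces.
        raise ValueError(f"response is not complete JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data


# Stand-ins for response.choices[0].message.content (invented for illustration).
complete = (
    '{"characters": [{"name": "Ada", "occupation": "engineer", '
    '"personality": "curious", "background": "London"}]}'
)
truncated = '{"characters": [{"name": "Ada", "occupa'

print(parse_structured(complete, ["characters"])["characters"][0]["name"])
try:
    parse_structured(truncated, ["characters"])
except ValueError as err:
    print("rejected:", err)
```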

Structured Output Engines

Model Format | Engine
GGUF | llama.cpp grammar-based sampling
MLX | Outlines via lmstudio-ai/mlx-engine
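
To give a feel for what constraining tokens during generation means, here is a toy sketch. It is not taken from llama.cpp or Outlines: the vocabulary, the hand-written state table, and the scores are all invented for illustration, and real engines derive their constraints from the schema rather than from a table like this.

```python
import json

# Invented scores standing in for model logits. Left unconstrained,
# this "model" would rather start with chatty text than with JSON.
SCORES = {
    "Sure!": 9.0, "Here": 8.0,
    "{": 2.0, '"joke"': 1.5, ":": 1.0, '"why?"': 0.5, "}": 0.2,
}

# Toy grammar: for each state, the tokens that keep the output valid
# and the state that each token leads to.
GRAMMAR = {
    "start": {"{": "key"},
    "key": {'"joke"': "colon"},
    "colon": {":": "value"},
    "value": {'"why?"': "close"},
    "close": {"}": "done"},
}


def constrained_decode(scores, grammar):
    """Greedy decoding restricted to tokens the grammar allows."""
    state, output = "start", []
    while state != "done":
        allowed = grammar[state]
        # Masking step: disallowed tokens are never candidates,
        # however high the model scores them.
        token = max(allowed, key=lambda tok: scores[tok])
        output.append(token)
        state = allowed[token]
    return "".join(output)


unconstrained_first = max(SCORES, key=SCORES.get)
result = constrained_decode(SCORES, GRAMMAR)
print("unconstrained first token:", unconstrained_first)
print("constrained output:", result)
print("parsed:", json.loads(result))
```

Because invalid tokens are excluded before selection, the output is valid by construction rather than repaired afterwards.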

Key Takeaways

  • Use response_format.type = "json_schema" — same shape as OpenAI's Structured Outputs API
  • Works with any OpenAI-compatible client SDK (Python, TS, etc.) just by pointing base_url at localhost
  • Response is always a string in choices[0].message.content — always call json.loads() on it
  • Not every model can follow a schema reliably: models below 7B parameters in particular often fail at structured output, so check the model card
  • GGUF uses grammar sampling; MLX uses Outlines — both constrain tokens at generation time, not post-hoc
  • All standard /v1/chat/completions params (temperature, max_tokens, stream, etc.) still apply
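
When stream is enabled, the content arrives in fragments, so the JSON can only be parsed after the fragments have been joined. The sketch below simulates this with a plain list; `collect_json` is a hypothetical helper, and the fragments are invented. With the OpenAI Python SDK and stream=True, each fragment would be read from `chunk.choices[0].delta.content`, which can be None.

```python
import json


def collect_json(fragments):
    """Join streamed text fragments, then parse once the stream has ended."""
    buffer = []
    for piece in fragments:
        if piece:  # skip None or empty deltas
            buffer.append(piece)
    return json.loads("".join(buffer))


# Simulated deltas (invented); no single fragment is valid JSON on its own.
simulated = ['{"jo', 'ke": "Why did', None, ' the chicken cross the road?"}']

result = collect_json(simulated)
print(result["joke"])
```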