---
title: "LM Studio Structured Output"
aliases: [lmstudio-json-schema, structured-output-lmstudio]
tags: [lmstudio, structured-output, json-schema, openai-compat, local-llm]
sources: [raw/Structured Output.md]
created: 2026-04-30
updated: 2026-04-30
---

# LM Studio Structured Output

Enforce a specific JSON shape on LLM responses by passing a JSON schema to `/v1/chat/completions`. The request format is compatible with OpenAI's Structured Outputs API.

## How It Works

- Add a `response_format` field to the chat completions request
- Provide a `json_schema` with a `name`, an optional `strict` flag, and a `schema` object
- The model is constrained to return valid JSON matching that schema
- The response arrives as a string in `choices[0].message.content`; parse it with `json.loads()`

## Server Setup

```bash
lms server start
# or enable from Developer tab in LM Studio UI
```

Install the CLI first if needed:

```bash
npx lmstudio install-cli
```

## response_format Shape

```json
"response_format": {
  "type": "json_schema",
  "json_schema": {
    "name": "my_schema",
    "strict": true,
    "schema": {
      "type": "object",
      "properties": {
        "field": { "type": "string" }
      },
      "required": ["field"]
    }
  }
}
```

## cURL Example

```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model",
    "messages": [
      {"role": "system", "content": "You are a helpful jokester."},
      {"role": "user", "content": "Tell me a joke."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "joke_response",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "joke": {"type": "string"}
          },
          "required": ["joke"]
        }
      }
    },
    "temperature": 0.7,
    "max_tokens": 50,
    "stream": false
  }'
```

## Python Example

```python
from openai import OpenAI
import json

# The local server does not check the key, but the SDK requires a non-empty value.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

character_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "characters",
        "schema": {
            "type": "object",
            "properties": {
                "characters": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "occupation": {"type": "string"},
                            "personality": {"type": "string"},
                            "background": {"type": "string"}
                        },
                        "required": ["name", "occupation", "personality", "background"]
                    },
                    "minItems": 1
                }
            },
            "required": ["characters"]
        }
    }
}

response = client.chat.completions.create(
    model="your-model",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Create 1-3 fictional characters"}
    ],
    response_format=character_schema,
)

# message.content is a JSON string, not a dict
results = json.loads(response.choices[0].message.content)
print(json.dumps(results, indent=2))
```

## Structured Output Engines

| Model Format | Engine |
|---|---|
| GGUF | `llama.cpp` grammar-based sampling |
| MLX | [Outlines](https://github.com/dottxt-ai/outlines) via [lmstudio-ai/mlx-engine](https://github.com/lmstudio-ai/mlx-engine) |

## Key Takeaways

- Use `response_format.type = "json_schema"`, the same shape as OpenAI's Structured Outputs API
- Works with any OpenAI-compatible client SDK (Python, TS, etc.) just by pointing `base_url` at localhost
- The response is always a **string** in `choices[0].message.content`, so always call `json.loads()` on it
- Not all models support this: **models below 7B parameters often cannot do structured output**, so check the model card
- GGUF uses grammar sampling and MLX uses Outlines; both constrain tokens at generation time, not post-hoc
- All standard `/v1/chat/completions` params (temperature, max_tokens, stream, etc.) still apply

## Related

- [[wiki/claude-code/lmstudio-chat-completions|lmstudio-chat-completions]]: full parameter reference for the completions endpoint
- [[wiki/claude-code/lmstudio-openai-compat-endpoints|lmstudio-openai-compat-endpoints]]: overview of all OpenAI-compat endpoints
- [[wiki/claude-code/lmstudio-responses-api|lmstudio-responses-api]]: stateful responses with streaming and Remote MCP tools
- [[wiki/claude-code/lmstudio-rest-api|lmstudio-rest-api]]: native LM Studio API and endpoint feature comparison
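
## Appendix: Helper Sketch

The `response_format` envelope and the `json.loads()` step described in this note can be wrapped in two small helpers. This is a minimal sketch, not part of LM Studio or the OpenAI SDK: `build_response_format` and `parse_structured_content` are hypothetical names, `strict` is passed as a boolean on the assumption that the server follows OpenAI's format, and the sample string stands in for `choices[0].message.content` so the block runs without a server.

```python
import json


def build_response_format(name, schema, strict=True):
    """Wrap a JSON schema in the response_format envelope (hypothetical helper)."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }


def parse_structured_content(content, required_keys):
    """Parse the content string and check top-level required keys (hypothetical helper)."""
    # json.loads raises json.JSONDecodeError if the text is not complete JSON,
    # which may happen when generation stops early (for example at max_tokens).
    data = json.loads(content)
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"response is missing required keys: {missing}")
    return data


joke_schema = {
    "type": "object",
    "properties": {"joke": {"type": "string"}},
    "required": ["joke"],
}
response_format = build_response_format("joke_response", joke_schema)

# Stand-in for response.choices[0].message.content; no server is contacted here.
sample_content = '{"joke": "Why did the chicken cross the road?"}'
parsed = parse_structured_content(sample_content, joke_schema["required"])
print(parsed["joke"])
```

In real use, `response_format` would be passed to `client.chat.completions.create(...)` and `sample_content` replaced by the returned message content. The required-key check is only a light sanity test, not full schema validation.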