obsidian/wiki/claude-code/lmstudio-structured-output.md

---
title: "LM Studio Structured Output"
aliases: [lmstudio-json-schema, structured-output-lmstudio]
tags: [lmstudio, structured-output, json-schema, openai-compat, local-llm]
sources: [raw/Structured Output.md]
created: 2026-04-30
updated: 2026-04-30
---

# LM Studio Structured Output

Enforce a specific JSON shape on LLM responses by passing a JSON schema to `/v1/chat/completions`. The request format matches OpenAI's Structured Outputs API.

## How It Works

- Add a `response_format` field to the chat completions request
- Provide a `json_schema` with a `name`, optional `strict`, and a `schema` object
- The model is constrained to return valid JSON matching that schema
- Response arrives as a string in `choices[0].message.content` — parse it with `json.loads()`

## Server Setup

```bash
lms server start
# or enable from Developer tab in LM Studio UI
```

Install the CLI first if needed:
```bash
npx lmstudio install-cli
```

## response_format Shape

```json
"response_format": {
  "type": "json_schema",
  "json_schema": {
    "name": "my_schema",
    "strict": true,
    "schema": {
      "type": "object",
      "properties": {
        "field": { "type": "string" }
      },
      "required": ["field"]
    }
  }
}
```
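
The envelope above can be built programmatically so that only the inner schema changes between requests. The following is a minimal sketch: `make_response_format` is a hypothetical helper (not part of LM Studio or the OpenAI SDK), and it passes `strict` as a JSON boolean, which is how OpenAI's format defines the field.

```python
import json


def make_response_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Wrap a JSON Schema in the response_format envelope.

    Hypothetical helper for illustration; not part of any SDK.
    """
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": strict,  # serialized as a JSON boolean, not a string
            "schema": schema,
        },
    }


# Inner schema only: one required string field named "joke".
joke_schema = {
    "type": "object",
    "properties": {"joke": {"type": "string"}},
    "required": ["joke"],
}

response_format = make_response_format("joke_response", joke_schema)
print(json.dumps(response_format, indent=2))
```

The resulting dict can be passed as the `response_format` argument of an OpenAI-compatible client, or embedded in a raw request body.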

## cURL Example

```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "{{model}}",
    "messages": [
      {"role": "system", "content": "You are a helpful jokester."},
      {"role": "user", "content": "Tell me a joke."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "joke_response",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": { "joke": {"type": "string"} },
          "required": ["joke"]
        }
      }
    },
    "temperature": 0.7,
    "max_tokens": 50,
    "stream": false
  }'
```

## Python Example

```python
from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

character_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "characters",
        "schema": {
            "type": "object",
            "properties": {
                "characters": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "occupation": {"type": "string"},
                            "personality": {"type": "string"},
                            "background": {"type": "string"}
                        },
                        "required": ["name", "occupation", "personality", "background"]
                    },
                    "minItems": 1
                }
            },
            "required": ["characters"]
        }
    }
}

response = client.chat.completions.create(
    model="your-model",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Create 1-3 fictional characters"}
    ],
    response_format=character_schema,
)

results = json.loads(response.choices[0].message.content)
print(json.dumps(results, indent=2))
```
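
Calling `json.loads()` directly can still fail if generation stops early, for example when `max_tokens` cuts the output mid-object. The sketch below shows one defensive way to parse the content string. `parse_structured` is a hypothetical helper, and it assumes the server reports `finish_reason == "length"` on truncation, as OpenAI's API does. The sample string is made up for illustration and is not real model output.

```python
import json


def parse_structured(content: str, finish_reason: str, required_keys: list) -> dict:
    """Parse message content and check top-level keys.

    Hypothetical helper; raises ValueError with a readable reason on failure.
    """
    # Assumption: "length" means the token limit was hit, as in OpenAI's API.
    if finish_reason == "length":
        raise ValueError("generation hit the token limit; JSON is probably truncated")
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"content is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data


# Illustrative content string standing in for response.choices[0].message.content.
sample = (
    '{"characters": [{"name": "Ada", "occupation": "engineer", '
    '"personality": "curious", "background": "raised in a port town"}]}'
)
data = parse_structured(sample, "stop", ["characters"])
print(data["characters"][0]["name"])
```

With a live response, the arguments would be `response.choices[0].message.content` and `response.choices[0].finish_reason`.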

## Structured Output Engines

| Model Format | Engine |
|---|---|
| GGUF | `llama.cpp` grammar-based sampling |
| MLX | [Outlines](https://github.com/dottxt-ai/outlines) via [lmstudio-ai/mlx-engine](https://github.com/lmstudio-ai/mlx-engine) |

## Key Takeaways

- Use `response_format.type = "json_schema"` — same shape as OpenAI's Structured Outputs API
- Works with any OpenAI-compatible client SDK (Python, TS, etc.) just by pointing `base_url` at localhost
- The response is a JSON **string** in `choices[0].message.content`; call `json.loads()` on it before using the data
- Not all models support this: **models below 7B parameters often cannot do structured output** — check the model card
- GGUF uses grammar sampling; MLX uses Outlines — both constrain tokens at generation time, not post-hoc
- All standard `/v1/chat/completions` params (temperature, max_tokens, stream, etc.) still apply
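
When `stream` is enabled, the JSON arrives in fragments and cannot be parsed until the stream ends. The sketch below accumulates the fragments first, assuming chunks follow the OpenAI Python SDK shape (`chunk.choices[0].delta.content`). `collect_stream` and `fake_chunk` are hypothetical names, and the simulated chunks stand in for a real stream so the example runs without a server.

```python
import json
from types import SimpleNamespace


def collect_stream(chunks) -> dict:
    """Join streamed delta fragments into one string, then parse it as JSON."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        piece = chunk.choices[0].delta.content
        if piece:  # the content of a delta can be None or empty
            parts.append(piece)
    return json.loads("".join(parts))


def fake_chunk(text):
    """Build an object shaped like an SDK stream chunk (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream: no single fragment is valid JSON on its own.
simulated = [
    fake_chunk('{"jo'),
    fake_chunk('ke": "'),
    fake_chunk('knock knock"}'),
    fake_chunk(None),
]
result = collect_stream(simulated)
print(result["joke"])
```

Against a running server, the same function could take the iterator returned by `client.chat.completions.create(..., stream=True)`.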

## Related

- [[wiki/claude-code/lmstudio-chat-completions|lmstudio-chat-completions]] — full parameter reference for the completions endpoint
- [[wiki/claude-code/lmstudio-openai-compat-endpoints|lmstudio-openai-compat-endpoints]] — overview of all OpenAI-compat endpoints
- [[wiki/claude-code/lmstudio-responses-api|lmstudio-responses-api]] — stateful responses with streaming and Remote MCP tools
- [[wiki/claude-code/lmstudio-rest-api|lmstudio-rest-api]] — native LM Studio API and endpoint feature comparison