June 19, 2026 8 min read

LLM structured outputs: how to force reliable JSON with schemas and tool calling

LLM structured outputs explained: force reliable JSON from OpenAI with JSON mode, tool calling, and json_schema strict decoding, then validate with Zod or Pydantic.

LLMs · AI · OpenAI

On this page

The reliability ladder, bottom to top
Why schema-constrained decoding beats regex-parsing a blob
Tool calling: when the model should act, not just answer
When plain JSON mode is still the right tool
Validate anyway — Zod or Pydantic, every time
Handle refusals, truncation, and the retry
What I'd tell my past self
FAQ

You asked the model for JSON. It handed you a friendly paragraph that contains JSON, wrapped in a markdown fence, with a trailing comment and one key spelled differently than you asked. Your JSON.parse throws, the request 500s, and you reach for a regex to claw the object out of the prose. Stop. Getting structured output from an LLM is a largely solved problem now, and the fix is not better prompting. It's constraining the model's decoding so it cannot emit anything but the shape you specified.

Here's the thesis: there's a ladder of reliability for getting structured data out of an LLM, and most teams are stuck on the bottom rung. Climb it. Each rung trades a little flexibility for a lot of determinism, and the top rung — schema-constrained decoding — is the strong default in 2026.

The reliability ladder, bottom to top

Here's the whole progression in one place, worst to best:

Prompt-and-pray — "respond only with JSON." The model usually complies and occasionally narrates, fences, or apologizes. You parse a blob and hope.
JSON mode — response_format: { type: "json_object" }. The output is now guaranteed valid JSON. But valid isn't the same as correct shape — keys, types, and enums are still up to the model's mood.
Function / tool calling — you describe a tool with a JSON-Schema parameter spec; the model returns arguments meant to fit it. Strongly shaped, and the natural fit when the model should choose an action.
Structured Outputs (json_schema with strict: true) — the decoder is constrained to your schema. Not "asked nicely" — constrained. Wrong keys and wrong types become literally impossible to emit.

The jump that matters is from "parse a blob" to "the model can't emit a non-conforming blob in the first place." Let me explain why that jump is real and not marketing.

Why schema-constrained decoding beats regex-parsing a blob

An LLM generates one token at a time, sampling from a probability distribution over the vocabulary. Constrained decoding (also called grammar-constrained or structured decoding) masks that distribution at every step: given your schema, only tokens that keep the output a valid prefix of a conforming document are allowed. Right after the opening {", the sampler can only pick tokens that begin one of your declared keys. After a key typed as integer, it can only emit tokens that continue a valid integer. The model never gets the chance to write prose, a markdown fence, or a misspelled key.

Contrast that with regex-parsing a free-form blob. You're doing error detection after the fact, on the one output you got, with no second chance short of a retry. Constrained decoding is error prevention during generation. That's the difference between a type system and a runtime try/catch — and it's why "force JSON from an LLM" stopped meaning "write a sterner prompt."
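The token masking described above can be made concrete with a toy sketch. This is not how any provider implements it. The "schema" is a single enum field, the vocabulary is a hand-picked list of string fragments, and isValidPrefix stands in for the grammar engine a real decoder compiles from your JSON Schema. All names here are illustrative.

```typescript
// Toy vocabulary: real tokenizers have on the order of 100k sub-word tokens.
const VOCAB = ['{"', "priority", '":"', "low", "med", "high", "urgent", '"}', "Sure! ", "```"];

// The only complete documents this one-field "schema" accepts.
const VALID_DOCS = ['{"priority":"low"}', '{"priority":"med"}', '{"priority":"high"}'];

// Stand-in for a grammar engine: text is allowed if it can still grow into a valid doc.
function isValidPrefix(text: string): boolean {
  return VALID_DOCS.some((doc) => doc.startsWith(text));
}

// The mask: keep only tokens that leave the output a valid prefix.
function allowedTokens(soFar: string): string[] {
  return VOCAB.filter((tok) => isValidPrefix(soFar + tok));
}

// Greedy decoding over fake model scores; the mask overrides the model's preferences.
function constrainedDecode(scores: Record<string, number>): string {
  let out = "";
  while (!VALID_DOCS.includes(out)) {
    const candidates = allowedTokens(out);
    if (candidates.length === 0) throw new Error(`dead end at: ${out}`);
    // pick the highest-scoring token among the allowed ones only
    candidates.sort((a, b) => (scores[b] ?? 0) - (scores[a] ?? 0));
    out += candidates[0];
  }
  return out;
}

// The fake model would rather chat ("Sure! ") or invent "urgent", but the mask wins.
const decoded = constrainedDecode({ "Sure! ": 9, urgent: 8, high: 5, low: 1 });
// decoded === '{"priority":"high"}'
```

Notice that the model's favorite tokens never get sampled. They are masked out before the choice is made, rather than scrubbed afterwards.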

Here's json_schema with strict mode on the OpenAI Chat Completions API:

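import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment
const text = "Checkout has returned a 500 since this morning's deploy."; // example input

const res = await client.chat.completions.create({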
  model: "gpt-4.1",
  messages: [{ role: "user", content: text }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "ticket",
      strict: true, // turns on schema-constrained decoding
      schema: {
        type: "object",
        additionalProperties: false, // no surprise keys
        required: ["priority", "summary", "needs_human"],
        properties: {
          priority: { type: "string", enum: ["low", "med", "high"] },
          summary: { type: "string" },
          needs_human: { type: "boolean" },
        },
      },
    },
  },
});

Two non-negotiables for strict mode: every property must be listed in required, and additionalProperties must be false on every object. (Need an optional field? Keep it in required and make its type a union with null, such as type: ["string", "null"]. Strict mode won't let you simply leave it out of required.) That's the API forcing you to fully specify the shape, which is exactly what you want.
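You can catch violations of both rules before the API rejects your request by linting the schema yourself. This is a minimal sketch, not an official validator. strictViolations is a hypothetical helper that only walks object properties, array items, and anyOf branches. The nullable due_date field is an illustrative addition showing the union-with-null pattern.

```typescript
type JsonSchema = {
  type?: string | string[];
  properties?: Record<string, JsonSchema>;
  required?: string[];
  additionalProperties?: boolean;
  items?: JsonSchema;
  anyOf?: JsonSchema[];
  enum?: unknown[];
};

// Hypothetical lint: returns strict-mode violations (an empty array means OK).
function strictViolations(schema: JsonSchema, path = "$"): string[] {
  const problems: string[] = [];
  if (schema.properties) {
    if (schema.additionalProperties !== false) {
      problems.push(`${path}: additionalProperties must be false`);
    }
    const required = new Set(schema.required ?? []);
    for (const [key, child] of Object.entries(schema.properties)) {
      if (!required.has(key)) problems.push(`${path}.${key}: missing from required`);
      problems.push(...strictViolations(child, `${path}.${key}`));
    }
  }
  if (schema.items) problems.push(...strictViolations(schema.items, `${path}[]`));
  schema.anyOf?.forEach((s, i) => problems.push(...strictViolations(s, `${path}.anyOf[${i}]`)));
  return problems;
}

// The ticket schema plus one "optional" field done the strict way:
// due_date is still required, but its value may be null.
const ticketSchema: JsonSchema = {
  type: "object",
  additionalProperties: false,
  required: ["priority", "summary", "needs_human", "due_date"],
  properties: {
    priority: { type: "string", enum: ["low", "med", "high"] },
    summary: { type: "string" },
    needs_human: { type: "boolean" },
    due_date: { type: ["string", "null"] }, // nullable, not omittable
  },
};
```

The ticket schema lints clean. If you drop due_date from required, the helper reports `$.due_date: missing from required`, which is the same mistake a strict request would be rejected for.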

Tool calling: when the model should act, not just answer

Tool calling is the same constrained-decoding machinery pointed at a different job: deciding which function to call and with what arguments. With strict: true on the function definition, the arguments are constrained to the tool's JSON-Schema parameters just as a strict response_format is. Without it, the schema is a strong hint rather than a guarantee. Reach for tool calling when the model's output should drive your code (render UI, query a database, trigger a workflow) rather than just return data to a caller.

This is where I'll get concrete, because my portfolio's AI chat uses exactly this pattern in production. The model can call a suggest_actions tool, and the trick is that its argument isn't a free-text string — it's an enum of allowed action ids. The app renders each returned id as a clickable chip.

const tools = [{
  type: "function",
  function: {
    name: "suggest_actions",
    description: "Suggest 1–3 follow-up action chips alongside the reply.",
    strict: true, // constrain the arguments to the schema below
    parameters: {
      type: "object",
      additionalProperties: false,
      required: ["actions"],
      properties: {
        actions: {
          type: "array",
          maxItems: 3,
          items: {
            type: "string",
            // the model can ONLY pick from these ids:
            enum: ["book-call", "see-work", "ask-pricing", "resume"],
          },
        },
      },
    },
  },
}];

The point isn't the chips — it's the enum. By constraining the output space to a closed set of known ids, the model cannot invent download-brochure or hand me an id my router doesn't recognize. There's no "did it hallucinate a route?" branch to write, because the space it can emit from is closed. That's the whole philosophy of structured outputs in one move: shrink the output space until going off-script is impossible. A free-text "what should the user do next?" would have been a parsing-and-validation chore forever; an enum makes the bad states unrepresentable.
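On the response side, the tool call arrives on message.tool_calls, and its arguments are a JSON string you still have to parse. The sketch below is minimal and assumes the Chat Completions message shape, mirrored here as local types so the snippet runs without the SDK. readSuggestedActions is a hypothetical helper. Returning an empty list when the model skips the tool is a design choice, not API behavior.

```typescript
// Minimal local mirror of the relevant Chat Completions response fields.
type ToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON *string*
};
type AssistantMessage = { content: string | null; tool_calls?: ToolCall[] };

// Hypothetical helper: pull the suggest_actions ids out of an assistant message.
function readSuggestedActions(message: AssistantMessage): string[] {
  const call = message.tool_calls?.find((c) => c.function.name === "suggest_actions");
  if (!call) return []; // the model replied without suggesting actions
  const args = JSON.parse(call.function.arguments) as { actions?: unknown };
  return Array.isArray(args.actions)
    ? args.actions.filter((a): a is string => typeof a === "string")
    : [];
}

// Usage with a response-shaped fixture (in real code: res.choices[0].message).
const message: AssistantMessage = {
  content: "Happy to help!",
  tool_calls: [{
    id: "call_1",
    type: "function",
    function: { name: "suggest_actions", arguments: '{"actions":["book-call","see-work"]}' },
  }],
};
const chips = readSuggestedActions(message); // ["book-call", "see-work"]
```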

For the agent-picks-its-own-strategy angle, my post on metadata-filtered RAG digs into letting a model choose its retrieval path. It's the same instinct, one layer up.

When plain JSON mode is still the right tool

Not every job needs strict schema decoding, and I want to be honest about where I actually land. My autonomous news-desk agent runs its LLM layer in JSON mode (json_object, not json_schema) because the layer is provider-agnostic across OpenAI, Gemini, and Claude, and plain JSON mode is the common denominator that behaves the same everywhere. It scores, de-duplicates, and ranks candidates into JSON, with per-job token accounting so a pathological scan can't burn the month's budget. I cover the full agent build in a separate post.
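Per-job token accounting can be as small as a counter that every LLM call reports into. This is a hypothetical sketch, not the news desk's actual code. TokenBudget, the 50k cap, and the usage numbers are made up for illustration. In practice you would feed it the usage.total_tokens value that each Chat Completions response reports.

```typescript
class BudgetExceededError extends Error {}

// Hypothetical per-job token budget: one instance per scan job.
class TokenBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Call before each LLM request with a rough estimate of its cost.
  assertCanSpend(estimate: number): void {
    if (this.used + estimate > this.limit) {
      throw new BudgetExceededError(
        `budget exceeded: ${this.used} used + ${estimate} estimated > ${this.limit}`,
      );
    }
  }

  // Call after each response with the provider-reported count (e.g. usage.total_tokens).
  record(tokens: number): void {
    this.used += tokens;
  }

  get remaining(): number {
    return Math.max(0, this.limit - this.used);
  }
}

// Usage: cap a scan job at 50k tokens, then record one call's reported usage.
const budget = new TokenBudget(50_000);
budget.assertCanSpend(2_000);
budget.record(1_850);
// budget.remaining is now 48_150
```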

The trade-off I accepted: JSON mode guarantees valid JSON but not my shape, so I validate every payload after parsing. Which brings up the rule that survives no matter how high you climb the ladder.

Validate anyway — Zod or Pydantic, every time

Schema-constrained decoding makes malformed output unlikely, not theoretically impossible across every provider, model snapshot, and edge case. The parsed object still crosses a trust boundary into your code, so you validate it like any other untrusted input. In TypeScript that's Zod; in Python it's Pydantic.

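import { z } from "zod";

// Mirrors the json_schema sent to the API above.
const Ticket = z.object({
  priority: z.enum(["low", "med", "high"]),
  summary: z.string().min(1),
  needs_human: z.boolean(),
});

// Tells the caller to retry with the validation error attached.
class RetryableError extends Error {
  constructor(readonly issues: z.ZodError) {
    super("LLM output failed schema validation");
  }
}

function parseTicket(content: string): z.infer<typeof Ticket> {
  const parsed = Ticket.safeParse(JSON.parse(content));
  if (!parsed.success) {
    // feed parsed.error back to the model and retry once; don't 500
    throw new RetryableError(parsed.error);
  }
  return parsed.data; // fully typed, safe to use
}

const ticket = parseTicket(res.choices[0].message.content!);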

Even with the enum-constrained suggest_actions tool above, my server still runs the parsed ids through a sanitizeActions() filter that drops anything not in the catalog and caps the list at three. Belt and suspenders: the decoder plus a runtime check. Define the schema once and derive the other representation from it. In TypeScript, write the Zod schema and generate the API's json_schema from it with zod-to-json-schema (the OpenAI Node SDK also ships a zodResponseFormat helper). In Python, Pydantic's model_json_schema() plays the same role. That way the model's contract and your runtime check can never drift apart. One source of truth, two consumers.
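A filter like sanitizeActions() is simple enough to sketch. This is a guess at its shape, based only on the description above: drop unknown ids and cap the list at three. The catalog is the enum from the tool definition. De-duplicating is an extra I added, not something the real filter is described as doing.

```typescript
// The same closed set the tool's enum allows.
const ACTION_CATALOG = ["book-call", "see-work", "ask-pricing", "resume"] as const;
type ActionId = (typeof ACTION_CATALOG)[number];
const MAX_ACTIONS = 3;

function isActionId(value: unknown): value is ActionId {
  return typeof value === "string" && (ACTION_CATALOG as readonly string[]).includes(value);
}

// Sketch of a sanitizeActions()-style filter: trust nothing the model sent.
function sanitizeActions(raw: unknown): ActionId[] {
  if (!Array.isArray(raw)) return [];
  const kept: ActionId[] = [];
  for (const item of raw) {
    // drop unknown ids and (an added assumption) duplicates
    if (isActionId(item) && !kept.includes(item)) kept.push(item);
    if (kept.length === MAX_ACTIONS) break; // cap the chip row
  }
  return kept;
}

// Usage: an invented id and a duplicate are dropped, then the list is capped.
const safe = sanitizeActions(["see-work", "download-brochure", "see-work", "resume", "book-call", "ask-pricing"]);
// safe is ["see-work", "resume", "book-call"]
```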

Handle refusals, truncation, and the retry

Two failure modes survive even strict mode, and ignoring them is how "reliable" demos turn flaky in production:

Refusals. The model can decline rather than fill your schema. With Structured Outputs the refusal arrives as a separate string on message.refusal instead of garbage shoehorned into your shape, so check it before you parse. Don't feed a refusal to JSON.parse.
Truncation. If you hit max_tokens mid-object, you get a syntactically broken JSON document. Check finish_reason === "length" and treat it as an error, not a parse bug — raise the token cap or shrink the task.
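Both checks belong in one guard that runs before JSON.parse ever sees the content. The sketch below is minimal. It assumes the Chat Completions choice shape (finish_reason, message.refusal, message.content), mirrored as local types. readStructured and the Outcome union are hypothetical names.

```typescript
// Local mirror of the relevant fields on res.choices[0].
type Choice = {
  finish_reason: string; // "stop", "length", "tool_calls", ...
  message: { content: string | null; refusal?: string | null };
};

// Hypothetical result type: callers must handle each case explicitly.
type Outcome =
  | { kind: "ok"; value: unknown }
  | { kind: "refused"; reason: string }
  | { kind: "truncated" };

function readStructured(choice: Choice): Outcome {
  // 1. Refusal: a separate field holding prose, never JSON, so don't parse it.
  if (choice.message.refusal) return { kind: "refused", reason: choice.message.refusal };
  // 2. Truncation: the object was cut off mid-way, so parsing would fail anyway.
  if (choice.finish_reason === "length") return { kind: "truncated" };
  // 3. Only now is the content expected to be a complete JSON document.
  if (choice.message.content === null) throw new Error("no content and no refusal");
  return { kind: "ok", value: JSON.parse(choice.message.content) };
}
```

Only the "ok" branch should go on to schema validation, such as the Zod parser in the previous section.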

The retry should be surgical, not blind. On a validation failure, send the model its own broken output plus the specific error and ask it to fix that — don't just re-roll the same prompt and hope the dice land better. And cap retries at one, maybe two: a model that can't satisfy your schema twice usually means the schema or the prompt is wrong, not the weather.
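Here is a sketch of that surgical retry, kept provider-agnostic. callModel and validate are hypothetical parameters standing in for your client call and your Zod or Pydantic check. The message roles mirror Chat Completions. The key move is appending the broken output and the exact error to the conversation instead of re-sending the original prompt.

```typescript
type Msg = { role: "system" | "user" | "assistant"; content: string };
type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical helper: one initial attempt plus at most `maxRetries` repairs.
async function generateWithRepair<T>(
  messages: Msg[],
  callModel: (msgs: Msg[]) => Promise<string>,
  validate: (raw: string) => Validation<T>,
  maxRetries = 1,
): Promise<T> {
  const convo = [...messages];
  for (let attempt = 0; ; attempt++) {
    const raw = await callModel(convo);
    const result = validate(raw);
    if (result.ok) return result.value;
    if (attempt >= maxRetries) {
      // Repeated failure usually means the schema or prompt is wrong: surface it.
      throw new Error(`still invalid after ${attempt + 1} attempts: ${result.error}`);
    }
    // Surgical repair: show the model its own output plus the specific error.
    convo.push(
      { role: "assistant", content: raw },
      { role: "user", content: `That output failed validation: ${result.error}\nReturn the corrected JSON only.` },
    );
  }
}
```

Capping maxRetries at 1 matches the advice above. A second failure throws instead of looping.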

What I'd tell my past self

Constrain the output space; don't sanitize the output. An enum the model picks from beats a regex that scrubs what it wrote. Make bad states unrepresentable.
Use json_schema + strict: true as the default; drop to json_object JSON mode only when you need provider portability, as I did on the news desk.
Tool calling is for actions; Structured Outputs is for data. If the model's job is "choose and do," that's a tool with enum-constrained args.
Validate with Zod/Pydantic regardless, from a schema you defined once, and check refusal and finish_reason before you ever call JSON.parse.

The mental shift is small but total: stop treating the model's output as text you have to interpret, and start treating it as a value you constrained into existence. The most reliable JSON is the JSON the model was never able to break.

Frequently asked questions

How do I force an LLM to return valid JSON?: Use schema-constrained decoding rather than prompting. OpenAI's response_format with json_schema and strict: true masks the token sampler at every step so the model can only emit output that conforms to your schema. Plain JSON mode (json_object) guarantees valid JSON but not a specific shape, so prefer json_schema when you need exact keys and types.
What is the difference between JSON mode and Structured Outputs?: JSON mode (json_object) guarantees the output is syntactically valid JSON but lets the model choose the keys, types, and structure. Structured Outputs (json_schema with strict: true) constrains decoding to a schema you define, so wrong keys, missing fields, and bad enum values become impossible to emit. Use Structured Outputs when shape matters; fall back to JSON mode only for provider portability.
When should I use OpenAI tool calling instead of structured outputs?: Use tool calling when the model's output should drive an action in your code, like rendering UI, querying a database, or triggering a workflow, and the model needs to choose which function to call. Use Structured Outputs when you just need data in a fixed shape returned to the caller. Both rely on the same constrained-decoding machinery and JSON-Schema validation.
Do I still need to validate LLM output if I use a strict schema?: Yes. Schema-constrained decoding makes malformed output unlikely, not impossible across every provider and model snapshot, and the parsed object still crosses a trust boundary into your code. Validate with Zod or Pydantic from a schema you define once, and check the refusal and finish_reason fields before calling JSON.parse.
