For Best Results with LLMs, Use JSON Prompt Outputs

Written by andrewproton | Published 2025/04/22
Tech Story Tags: prompt-engineering | llm-outputs | debug-llm-outputs | ai-prompt-debugging | llm-json-responses | openai-structured-output | json-llm-prompt-outputs | json-vs-custom-prompt-format

TL;DR: Structured outputs (like JSON mode) have solved one of the oldest LLM prompt headaches: inconsistent responses. Use formatting to make outputs reliable, testable, and production-ready.

This is the fourth part of an ongoing series. See parts 1, 2, and 3.

AI Principle IV: Use Structured Prompt Outputs

There was a time, a long, long time ago, when LLM APIs had just come out and no one yet knew for sure how to properly interact with them. One of the top problems was extracting multiple outputs from a single prompt response. When LLMs didn’t consistently return JSON (and they failed often), you tried persuading the LLM to cooperate by using your best prompt engineering oratory.

Those were ancient times. Back then, we traveled on horseback and wrote prompts by candlelight, as electricity hadn’t yet been invented. Debugging prompts meant long nights spent squinting at parchment scrolls, hoping the model would return a list instead of a haiku. And if it failed, you had no choice but to sigh deeply, dip your quill in ink, and try again.

Ok, I made that last part up. But LLM APIs that couldn’t consistently return a JSON response were a real thing and caused loads of issues. It all began to change with JSON mode in November 2023: you could now ask the OpenAI API to return well-formed JSON. In 2024, OpenAI added support for strict structured outputs, which guarantee that the response conforms to a supplied schema. Anthropic and Google have since added similar API features. The time for unstructured prompt outputs has passed, and we are never going back.

Benefits

Why is it better to use JSON-structured prompt outputs as opposed to other formats or inventing a custom format?

Reduced Error Rate

Modern LLMs are fine-tuned to output valid JSON when requested; it is rare for them to fail even with very complex responses. In addition, many platforms have software-level protections against incorrectly formatted outputs. For example, the OpenAI API throws an exception if a non-JSON response is produced while in strict structured output mode.
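Here is a minimal sketch of what strict structured output looks like with the OpenAI Python SDK; the model name, schema fields, and prompt are illustrative rather than anything prescribed by this post:

```python
# Sketch: requesting a strict, schema-validated JSON response via the
# OpenAI Python SDK's parse helper.
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class ReviewVerdict(BaseModel):
    sentiment: str     # e.g. "positive" / "negative" / "neutral"
    confidence: float  # model's self-reported confidence, 0.0-1.0

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Classify the sentiment of the user's review."},
        {"role": "user", "content": "The battery died after two days. Disappointed."},
    ],
    response_format=ReviewVerdict,  # strict mode: the output must match this schema
)

verdict = completion.choices[0].message.parsed  # a ReviewVerdict instance
print(verdict.sentiment, verdict.confidence)
```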

If you use a custom format to return multiple output variables, you will not benefit from this fine-tuning, and the error rate will be much higher. Time will be spent re-engineering the prompt and adding retries.

Decoupled Prompts and Code

With a JSON output, it's trivial to add another output field, and doing so shouldn’t break your existing code. This decouples adding fields to the prompt from changes to the code processing logic. Decoupling can save you time and effort, particularly in cases where prompts are loaded from outside Git; see Principle II: Load LLM Prompts Safely (If You Really Have To).
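As a quick illustration (the field names here are made up), code that consumes the JSON only reads the keys it already knows about, so a field newly added to the prompt passes through harmlessly:

```python
# Sketch: existing code keeps working when the prompt gains a new JSON field.
import json

response_text = (
    '{"sentiment": "negative", "confidence": 0.92,'
    ' "explanation": "Battery failed quickly."}'
)
data = json.loads(response_text)

# Original consumers only read the fields they were written for; the new
# "explanation" field added to the prompt is simply ignored here.
sentiment = data["sentiment"]
confidence = data.get("confidence", 0.0)
```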

Simplified System

Is there a practical reason to use an output format without built-in platform support? Rarely. JSON responses are easier to produce, parse, and maintain, both for you and for subsequent code contributors. Don’t reinvent the wheel unless you have to.

When NOT to Use Structured Output

Single Field Output

If your prompt outputs a single field in response, there are no benefits to outputting a JSON. Or are there?

Single-variable responses today may become complex responses tomorrow. After spending hours converting single-field prompts into multi-field ones, I now use JSON by default even when only a single field is returned. This saves time later while adding minimal extra complexity upfront.

Even when the program logic doesn’t need multiple outputs, there are prompt engineering and debugging benefits to adding additional fields. Adding a field that provides an explanation for a response (or cites a source in the documentation) can often significantly improve prompt performance (1). It can also be logged as an explanation for the model’s decisions. Having the response be JSON from the start makes adding such a field far easier.
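For instance, a single-answer schema might look like the sketch below (the names and fields are hypothetical); placing the explanation field before the answer is a common way to nudge the model to reason before committing:

```python
from pydantic import BaseModel

# Hypothetical single-answer schema with an extra "explanation" field.
class RefundDecision(BaseModel):
    explanation: str  # the model's reasoning or cited policy; useful for logging and debugging
    approved: bool    # the single value the application logic actually consumes
```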

So even if your prompt has a single output variable, consider JSON format as an option.

Streaming Response

For applications in which latency is critical, streaming LLM endpoints are often used. These allow parts of the response to be acted on before the entire response is received. This pattern doesn’t work well with a single JSON object, which usually can’t be parsed until the closing brace arrives, so you should use a simple, stream-friendly format instead.

For example, if your prompt decides on the action taken by a video game character and the words that the character says, you can encode it as “ACTION|SPEECH_TO_READ” and then stream the response with a streaming API, such as the OpenAI streaming API. This will give you far better latency.

Example Output:

WAVE_AT_HERO|Hello, Adventurer! Welcome to my shop.

As soon as the action is received, the character begins waving, and text is output as it streams in.
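A minimal sketch of consuming such a stream with the OpenAI Python SDK might look like this; the model name, prompt, and handler functions are illustrative assumptions:

```python
# Sketch: acting on the "ACTION|SPEECH" format as tokens stream in.
from openai import OpenAI

client = OpenAI()

def perform_action(action: str) -> None:
    print(f"[game] performing action: {action}")

def speak(fragment: str) -> None:
    print(fragment, end="", flush=True)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Reply as ACTION|SPEECH, e.g. WAVE_AT_HERO|Hello, Adventurer!"},
        {"role": "user", "content": "The hero enters the shop."},
    ],
    stream=True,
)

buffer = ""
action_dispatched = False
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    if not action_dispatched:
        buffer += delta
        if "|" in buffer:
            action, speech_start = buffer.split("|", 1)
            perform_action(action.strip())  # act as soon as the action arrives
            speak(speech_start)             # start reading the speech immediately
            action_dispatched = True
    else:
        speak(delta)                        # keep streaming the rest of the speech
```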

JSON lines and other stream-friendly formats can also be used effectively.

Conclusion

Don’t reject the benefits of civilization: use JSON-structured prompt outputs. There are hardly any downsides, and because LLMs are heavily optimized to return valid JSON, it will make your life much easier. Consider a JSON output even if the extracted data is currently a single field. For streaming endpoints, use JSON lines or a simple custom format.

If you’ve enjoyed this post, subscribe to the series for more.


Written by andrewproton | Generative AI Engineer. Founder and CEO of Blobfish AI.
Published by HackerNoon on 2025/04/22