OpenAI recently introduced a powerful feature called Predicted Outputs, which quietly entered the scene without much attention from the technical media. I noticed it mentioned on their developer account on X, but it got little publicity, so I decided to draw attention to it: it's a genuinely cool and useful feature.
Predicted Outputs significantly reduce latency for model responses, especially when much of the output is known ahead of time. This feature is particularly beneficial for applications that involve regenerating text documents or code files with minor modifications.
Predicted Outputs allow developers to speed up API responses from Chat Completions when the expected output is largely predictable. By providing a prediction of the expected response using the prediction
parameter in Chat Completions, the model can generate the required output more efficiently. This functionality is currently available with the latest gpt-4o and gpt-4o-mini models.
When you have a response where most of the content is already known, you can supply that expected content as a prediction to the model. The model then uses this prediction to expedite the generation of the response, reducing latency and improving performance.
Imagine you have a JSON configuration file that needs a minor update. Here's an example of such a file:
{
  "appName": "MyApp",
  "version": "1.0.0",
  "settings": {
    "enableFeatureX": false,
    "maxUsers": 100
  }
}
Suppose you want to update "enableFeatureX" to true. Instead of generating the entire file from scratch, you can provide the original file as a prediction and instruct the model to make the necessary change.
import OpenAI from "openai";

const config = `
{
  "appName": "MyApp",
  "version": "1.0.0",
  "settings": {
    "enableFeatureX": false,
    "maxUsers": 100
  }
}
`.trim();

const openai = new OpenAI();

const updatePrompt = `
Change "enableFeatureX" to true in the following JSON configuration. Respond only with the updated JSON, without any additional text.
`;

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "user", content: updatePrompt },
    { role: "user", content: config }
  ],
  prediction: {
    type: "content",
    content: config
  }
});

// Output the updated configuration
console.log(completion.choices[0].message.content);
In this example, the model quickly generates the updated configuration file, leveraging the prediction to minimize response time.
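After a call like the one above, the response's usage object reports how much of your prediction the model actually kept, via completion_tokens_details.accepted_prediction_tokens and rejected_prediction_tokens. Here is a small sketch of summarizing those numbers; the helper name and the sample usage payload are made up for illustration.

```javascript
// Sketch: summarize how much of a prediction was accepted.
// Field names follow the Chat Completions usage object; the
// sample numbers in the payload below are hypothetical.
function predictionStats(usage) {
  const details = usage.completion_tokens_details ?? {};
  const accepted = details.accepted_prediction_tokens ?? 0;
  const rejected = details.rejected_prediction_tokens ?? 0;
  const total = accepted + rejected;
  return {
    accepted,
    rejected,
    // Share of predicted tokens the model actually kept.
    hitRate: total > 0 ? accepted / total : 0
  };
}

// Example with a hypothetical usage payload
// (in real code this would be completion.usage):
const stats = predictionStats({
  completion_tokens: 40,
  completion_tokens_details: {
    accepted_prediction_tokens: 35,
    rejected_prediction_tokens: 5
  }
});
console.log(stats); // { accepted: 35, rejected: 5, hitRate: 0.875 }
```

A low hit rate is a signal that your prediction diverges too much from the actual output, which costs money rather than saving time.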
For applications that require streaming responses, Predicted Outputs offer even greater latency reductions. Here's how you can implement the previous example using streaming:
import OpenAI from "openai";

const config = `...`; // Original JSON configuration

const openai = new OpenAI();

const updatePrompt = `...`; // Prompt as before

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [ /* ... */ ],
  prediction: {
    type: "content",
    content: config
  },
  stream: true
});

for await (const chunk of completion) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
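Since the model rewrites the whole file, it is worth checking that the streamed result is still valid JSON before you use it. A minimal sketch, where the chunk strings are a made-up stand-in for what the loop above would accumulate:

```javascript
// Sketch: accumulate streamed chunks and validate the final text is
// still valid JSON before writing it anywhere. The chunk strings
// below are hypothetical sample data, not real API output.
function collectJson(chunks) {
  const text = chunks.join("").trim();
  return JSON.parse(text); // throws if the model broke the JSON
}

const updated = collectJson([
  '{ "appName": "MyApp", "version": "1.0.0",',
  ' "settings": { "enableFeatureX": true, "maxUsers": 100 } }'
]);
console.log(updated.settings.enableFeatureX); // true
```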
At the time of writing, no competitor offers an equivalent. OpenAI's Predicted Outputs feature appears to be a unique offering that addresses the specific need for latency reduction when regenerating known content with minor modifications.
The cool thing is how little it takes to start using it: you add just one new parameter to the API request. This makes it very easy for developers to adopt the feature in their existing applications.
While Predicted Outputs offer significant advantages, there are important considerations:

Prediction tokens that do not appear in the final completion are still billed at completion token rates, so monitor rejected_prediction_tokens in the usage data to manage costs.

Several parameters are not supported when a prediction is provided:
- n (values higher than 1)
- logprobs
- presence_penalty (values greater than 0)
- frequency_penalty (values greater than 0)
- max_completion_tokens
- tools (function calling is not supported)
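One pragmatic way to handle these restrictions is to sanitize a request before attaching a prediction. The helper below is a hypothetical sketch, not part of the OpenAI SDK; for simplicity it drops the listed parameters unconditionally, even though, for example, n: 1 would actually be allowed.

```javascript
// Sketch: drop parameters that conflict with Predicted Outputs
// before sending a request. buildPredictedRequest is a hypothetical
// helper; the list mirrors the limitations above.
const UNSUPPORTED_WITH_PREDICTION = [
  "n",
  "logprobs",
  "presence_penalty",
  "frequency_penalty",
  "max_completion_tokens",
  "tools"
];

function buildPredictedRequest(base, predictionContent) {
  const request = {
    ...base,
    prediction: { type: "content", content: predictionContent }
  };
  for (const key of UNSUPPORTED_WITH_PREDICTION) {
    delete request[key]; // removed unconditionally for simplicity
  }
  return request;
}

// Hypothetical usage: incompatible options are stripped,
// everything else passes through.
const req = buildPredictedRequest(
  { model: "gpt-4o", n: 2, presence_penalty: 0.5, messages: [] },
  "{}"
);
console.log("n" in req); // false
```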
OpenAI's Predicted Outputs is a groundbreaking feature that addresses a common challenge in AI applications: reducing latency when the response is largely predictable. By allowing developers to supply expected outputs, it accelerates response times and enhances user experience.
In my opinion, OpenAI's models are not as strong as Anthropic's. However, OpenAI produces many cool and genuinely needed solutions in other areas. Features like Predicted Outputs set OpenAI apart from other AI providers, offering unique solutions that meet specific needs in the developer community.