Welcome back to this series, where we've been building an app with AI tooling.

Intro & Setup · Your First AI Prompt · Streaming Responses · How Does AI Work · Prompt Engineering · AI-Generated Images · Security & Reliability · Deploying

In the previous post, we got our app responding to prompts, but only after the entire response was generated.

A better experience, as you'll know if you've used any AI chat tools, is to render text as soon as each bit of it is generated. It becomes a sort of teletype effect.

That's what we're going to build today using streams.
https://www.youtube.com/watch?v=GkyHBwUA0EQ
Prerequisites
Before we get into streams, we need to cover a Qwik quirk related to HTTP requests.
If we examine the current POST request being sent by the form, we can see that the returned payload isn't just the plain text we returned from our action handler. Instead, it's this sort of serialized data.

This is the result of how routeAction$ works under the hood. So while routeAction$ and the Form components are super handy, we'll have to do something else.

To their credit, the Qwik team does provide a server$ function for running code on the server, but it wasn't quite the right fit here either.
Refactor Server Logic
It sucks that we can't use route actions because they're great. So what can we use?

Qwik City offers a few options. The best one I found is middleware.

Middleware is essentially a set of functions that we can inject at various points within the request lifecycle of our route handler. We can define them by exporting named constants for the hooks we want to target (onRequest, onGet, onPost, onPut, onDelete).

So, instead of relying on a route action, we can hook into any POST request by exporting an onPost middleware. In order to support streaming, we'll want to respond with the standard requestEvent.send() method, which accepts a standard Response object.
Hereās a basic (non-streaming) example:
/** @type {import('@builder.io/qwik-city').RequestHandler} */
export const onPost = (requestEvent) => {
  requestEvent.send(new Response('Hello Squirrel!'))
}
Before we tackle streaming, let's get the same functionality from the old route action implemented with middleware. We can copy most of the code into the onPost middleware, but we won't have access to formData.

Fortunately, we can recreate that data with the requestEvent.parseBody() method. We'll also want to use requestEvent.send() to respond with the OpenAI data instead of a return statement.
/** @type {import('@builder.io/qwik-city').RequestHandler} */
export const onPost = async (requestEvent) => {
  const OPENAI_API_KEY = requestEvent.env.get('OPENAI_API_KEY')
  const formData = await requestEvent.parseBody()
  const prompt = formData.prompt

  const body = {
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }]
  }

  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    // ... fetch options
  })
  const data = await response.json()
  const responseBody = data.choices[0].message.content

  requestEvent.send(new Response(responseBody))
}
Refactor Client Logic
Replacing the route actions has the unfortunate side effect that we also can't use the <Form> component anymore. We'll have to use a regular <form> element and recreate all the benefits we had before, including sending HTTP requests with JavaScript.

Let's refactor our client side to support those features again.

We can break these requirements down to two things: a JavaScript solution for submitting forms, and reactive state for managing loading states and results.
I've covered submitting HTML forms with JavaScript in depth several times in the past:

- Make Beautifully Resilient Apps With Progressive Enhancement
- How to Upload Files with JavaScript
- Building Super Powered HTML Forms with JavaScript

So today, I'll just share the snippet, which I put inside a utils.js file at the root of my project.
This jsFormSubmit function accepts an HTMLFormElement, constructs a fetch request based on the form attributes, and returns the resulting promise.
/**
 * @param {HTMLFormElement} form
 */
export function jsFormSubmit(form) {
  const url = new URL(form.action)
  const formData = new FormData(form)
  const searchParameters = new URLSearchParams(formData)

  /** @type {Parameters<typeof fetch>[1]} */
  const fetchOptions = {
    method: form.method
  }

  if (form.method.toLowerCase() === 'post') {
    fetchOptions.body =
      form.enctype === 'multipart/form-data' ? formData : searchParameters
  } else {
    url.search = searchParameters
  }

  return fetch(url, fetchOptions)
}
This generic function can be used to submit any HTML form, so it's handy to use in a submit event handler. Sweet!
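As a quick sanity check of the GET branch in jsFormSubmit, here's a standalone sketch (using a made-up example.com URL) of how form fields end up as query parameters:

```javascript
// Mirrors the GET branch of jsFormSubmit: form fields become query parameters
const url = new URL('https://example.com/search')
const searchParameters = new URLSearchParams([['prompt', 'Tell me a joke']])

url.search = searchParameters
console.log(url.toString()) // https://example.com/search?prompt=Tell+me+a+joke
```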
As for the reactive data, Qwik provides two options, useStore and useSignal. I prefer useStore, which allows us to create an object whose properties are reactive.

We can use useStore to create a "state" object in our component to track the loading state of the HTTP request as well as the text response.
import { $, component$, useStore } from "@builder.io/qwik";

// other setup logic

export default component$(() => {
  const state = useStore({
    isLoading: false,
    text: '',
  })

  // other component logic
})
Next, we can update the template. Since we can no longer use the action object we had before, we can replace references from action.isRunning and action.value to state.isLoading and state.text, respectively (don't ask me why I changed the property names 🤷‍♂️). I'll also add a "submit" event handler to the form called handleSubmit, which we'll look at shortly.
<main>
  <form
    method="post"
    preventdefault:submit
    onSubmit$={handleSubmit}
  >
    <div>
      <label for="prompt">Prompt</label>
      <textarea name="prompt" id="prompt">
        Tell me a joke
      </textarea>
    </div>

    <button type="submit" aria-disabled={state.isLoading}>
      {state.isLoading ? 'One sec...' : 'Tell me'}
    </button>
  </form>

  {state.text && (
    <article>
      <p>{state.text}</p>
    </article>
  )}
</main>
Note that the <form> does not explicitly provide an action attribute. By default, an HTML form will submit data to the current URL, so we only need to set the method to POST and submit this form to trigger the onPost middleware we defined earlier.

Now, the last step to get this refactor working is defining handleSubmit. As before, we'll need to wrap the event handler in Qwik's $() function.
Inside the event handler, we'll want to clear out any previous data from state.text, set state.isLoading to true, then pass the form's DOM node to our fancy jsFormSubmit function.

This should submit the HTTP request for us. Once it comes back, we can update state.text with the response body and return state.isLoading to false.
const handleSubmit = $(async (event) => {
  state.text = ''
  state.isLoading = true

  /** @type {HTMLFormElement} */
  const form = event.target

  const response = await jsFormSubmit(form)
  state.text = await response.text()
  state.isLoading = false
})
OK! We should now have a client-side form that uses JavaScript to submit an HTTP request to the server while tracking the loading and response states, and updating the UI accordingly.
That was a lot of work to get the same solution we had before but with fewer features. BUT the key benefit is we now have direct access to the platform primitives we need to support streaming.
Enable Streaming on the Server
Before we start streaming responses from OpenAI, I think it's helpful to start with a very basic example to get a better grasp of streams. Streams allow us to send small chunks of data over time.

So as an example, let's print out some iconic David Bowie lyrics in tempo with the song, "Space Oddity".

When we construct our Response object, instead of passing plain text, we'll want to pass a stream. We'll create the stream shortly, but here's the idea:
/** @type {import('@builder.io/qwik-city').RequestHandler} */
export const onPost = (requestEvent) => {
requestEvent.send(new Response(stream))
}
We'll create a very rudimentary ReadableStream using the ReadableStream constructor, and pass it a start method that's called when the stream is constructed.

The start method is responsible for the stream's logic and has access to the stream controller, which is used to send data and close the stream.
const stream = new ReadableStream({
  start(controller) {
    // Stream logic goes here
  }
})
OK, let's plan out that logic. We'll have an array of song lyrics and a function to "sing" them (pass them to the stream). The sing function will take the first item in the array and pass that to the stream using the controller.enqueue() method.

If it's the last lyric in the list, we can close the stream with controller.close(). Otherwise, the sing method can call itself again after a short pause.
const stream = new ReadableStream({
  start(controller) {
    const lyrics = ['Ground', ' control', ' to major', ' Tom.']

    function sing() {
      const lyric = lyrics.shift()
      controller.enqueue(lyric)

      if (lyrics.length < 1) {
        controller.close()
      } else {
        setTimeout(sing, 1000)
      }
    }
    sing()
  }
})
So each second, for four seconds, this stream will send out the lyrics "Ground control to major Tom." Slick!

Because this stream will be used in the body of the Response, the connection will remain open for four seconds until the response completes. But the frontend will have access to each chunk of data as it arrives, rather than waiting the full four seconds.

This doesn't speed up the total response time (in some cases, streams can increase response times), but it does allow for a faster perceived response, and that makes a better user experience.
Here's what my code looks like:

/** @type {import('@builder.io/qwik-city').RequestHandler} */
export const onPost = async (requestEvent) => {
  const stream = new ReadableStream({
    start(controller) {
      const lyrics = ['Ground', ' control', ' to major', ' Tom.']

      function sing() {
        const lyric = lyrics.shift()
        controller.enqueue(lyric)

        if (lyrics.length < 1) {
          controller.close()
        } else {
          setTimeout(sing, 1000)
        }
      }
      sing()
    }
  })

  requestEvent.send(new Response(stream))
}
Unfortunately, as it stands right now, the client will still wait four seconds before seeing the entire response, and that's because we weren't expecting a streamed response.

Let's fix that.
Enable Streaming on the Client
Even when dealing with streams, the default browser behavior when receiving a response is to wait for it to complete. In order to get the behavior we want, we'll need to use client-side JavaScript to make the request and process the streaming body of the response.

We've already tackled that first part inside our handleSubmit function. Let's start processing that response body.
We can grab a reader for the response body's ReadableStream from its getReader() method. This reader has its own read() method that we can use to access the next chunk of data, as well as a flag telling us whether the response is done streaming or not.

The only "gotcha" is that the data in each chunk doesn't come in as text; it comes in as a Uint8Array, which is "an array of 8-bit unsigned integers." It's basically the representation of the binary data, and you don't really need to understand any deeper than that unless you want to sound very smart at a party (or boring).

The important thing to understand is that on their own, these data chunks aren't very useful. To get something we can use, we'll need to decode each chunk of data using a TextDecoder.
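To see that decoding step in isolation, here's a tiny standalone example (the byte values are just the UTF-8 codes for "Hello"):

```javascript
// Stream chunks arrive as raw bytes; TextDecoder turns them back into text
const decoder = new TextDecoder()
const chunk = new Uint8Array([72, 101, 108, 108, 111]) // UTF-8 bytes for 'Hello'

console.log(decoder.decode(chunk)) // Hello
```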
Ok, that's a lot of theory. Let's break down the logic and then look at some code.

When we get the response back, we need to:

1. Grab the reader from the response body using response.body.getReader().
2. Set up a decoder using TextDecoder and a variable to track the streaming status.
3. Process each chunk until the stream is complete, with a while loop that does this:
   - Grab the next chunk's data and stream status.
   - Decode the data and use it to update our app's state.text.
   - Update the streaming status variable, terminating the loop when complete.
4. Update the loading state of the app by setting state.isLoading to false.
The new handleSubmit function should look something like this:
const handleSubmit = $(async (event) => {
  state.text = ''
  state.isLoading = true

  /** @type {HTMLFormElement} */
  const form = event.target

  const response = await jsFormSubmit(form)

  // Parse streaming body
  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let isStillStreaming = true

  while (isStillStreaming) {
    const { value, done } = await reader.read()
    const chunkValue = decoder.decode(value)

    state.text += chunkValue
    isStillStreaming = !done
  }

  state.isLoading = false
})
Now, when I submit the form, I see something like:

"Ground
control
to major
Tom."

Hell yeah!!!

OK, most of the work is done. Now, we just need to replace our demo stream with the OpenAI response.
Stream OpenAI Response
Looking back at our original implementation, the first thing we need to do is modify the request to OpenAI to let them know that we would like a streaming response. We can do that by setting the stream property in the fetch payload to true.
const body = {
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: prompt }],
  stream: true
}

const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'post',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${OPENAI_API_KEY}`,
  },
  body: JSON.stringify(body)
})
UPDATE 2023/11/15: I used fetch and custom streams because, at the time of writing, the openai module on NPM did not properly support streaming responses.

That issue has been fixed, and I think a better solution would be to use that module and pipe their data through a TransformStream to send it to the client. That version is not reflected here.
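That alternative isn't covered in this post, but as a rough sketch of the idea (my own illustration, assuming the chunks have already been parsed into OpenAI-shaped objects), a TransformStream could extract just the delta text before it reaches the client:

```javascript
// Rough sketch only: a TransformStream that pulls the delta text out of
// parsed, OpenAI-shaped chunk objects. The input shape is an assumption.
const extractDelta = new TransformStream({
  transform(chunk, controller) {
    controller.enqueue(chunk.choices?.[0]?.delta?.content ?? '')
  }
})
```

A stream of parsed chunk objects could then be piped through it with .pipeThrough(extractDelta), yielding plain text on the other side.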
Next, we could pipe the response from OpenAI directly to the client, but we might not want to do that. The data they send doesn't really align with what we want to send to the client because it looks like this (two chunks, one with data and one representing the end of the stream):

data: {"id":"chatcmpl-4bJZRnslkje3289REHFEH9ej2","object":"chat.completion.chunk","created":1690319476,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"Because"},"finish_reason":null}]}

data: [DONE]
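To make that format concrete, here's how a single data: line can be picked apart (a standalone sketch; the payload is made up to mirror the shape above):

```javascript
// A made-up line mirroring OpenAI's streaming format (not a real response)
const line = 'data: {"choices":[{"delta":{"content":"Because"}}]}'

const match = /data:\s*(.*)/.exec(line)
const payload = match[1]
const text = payload === '[DONE]' ? '' : JSON.parse(payload).choices[0].delta.content

console.log(text) // Because
```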
Instead, what we'll do is create our own stream, similar to the David Bowie lyrics, that will do some setup, enqueue chunks of data into the stream, and close the stream. Let's start with an outline:
const stream = new ReadableStream({
  async start(controller) {
    // Any setup before streaming
    // Send chunks of data
    // Close stream
  }
})
Since we're dealing with a streaming fetch response from OpenAI, a lot of the work we need to do here can actually be copied from the client-side stream handling. This part should look familiar:
const reader = response.body.getReader()
const decoder = new TextDecoder()
let isStillStreaming = true

while (isStillStreaming) {
  const { value, done } = await reader.read()
  const chunkValue = decoder.decode(value)

  // Here's where things will be different

  isStillStreaming = !done
}
This snippet was taken almost directly from the frontend stream processing example. The only difference is that we need to treat the data coming from OpenAI slightly differently. As we saw, the chunks of data they send will look something like "data: [JSON data or DONE]".
Another gotcha is that every once in a while, they'll actually slip in TWO of these data strings in a single streaming chunk. So, here's what I came up with for processing the data:

1. Create a Regular Expression to grab the rest of the string after "data: ".
2. For the unlikely event there is more than one data string, use a while loop to process every match in the string.
3. If the current match is the closing condition ("[DONE]"), close the stream.
4. Otherwise, parse the data as JSON and enqueue the first piece of text from the list of choices (json.choices[0].delta.content). Fall back to an empty string if none is present.
5. Lastly, in order to move to the next match, if there is one, we can use RegExp.exec().
The logic is quite abstract without looking at code, so here's what the whole stream looks like now:
const stream = new ReadableStream({
  async start(controller) {
    // Do work before streaming
    const reader = response.body.getReader()
    const decoder = new TextDecoder()
    let isStillStreaming = true

    while (isStillStreaming) {
      const { value, done } = await reader.read()
      const chunkValue = decoder.decode(value)

      /**
       * Captures any string after the text `data: `
       * @see https://regex101.com/r/R4QgmZ/1
       */
      const regex = /data:\s*(.*)/g
      let match = regex.exec(chunkValue)

      while (match !== null) {
        const payload = match[1]

        // Close stream
        if (payload === '[DONE]') {
          controller.close()
          break
        } else {
          try {
            const json = JSON.parse(payload)
            const text = json.choices[0].delta.content || ''

            // Send chunk of data
            controller.enqueue(text)
            match = regex.exec(chunkValue)
          } catch (error) {
            const nextChunk = await reader.read()
            const nextChunkValue = decoder.decode(nextChunk.value)
            match = regex.exec(chunkValue + nextChunkValue)
          }
        }
      }

      isStillStreaming = !done
    }
  }
})
UPDATE 2023/11/15: I discovered that the OpenAI API sometimes splits a JSON payload across two stream chunks. The solution is to use a try/catch block around the JSON.parse, and in the case that it fails, re-run the match against the current chunk value plus the next chunk value. The code above includes the updated snippet.
Review
That should be everything we need to get streaming working. Hopefully, it all makes sense, and you got it working on your end.

I think it's a good idea to review the flow to make sure we've got it:
1. The user submits the form, which gets intercepted and sent with JavaScript. This is necessary to process the stream when it returns.
2. The request is received by the onPost middleware, which forwards the data to the OpenAI API along with the setting to return the response as a stream.
3. The OpenAI response is sent back as a stream of chunks, some of which contain JSON, with the last one being "[DONE]".
4. Instead of passing the OpenAI stream along directly, we create a new stream to use in our response.
5. Inside this stream, we process each chunk of data from the OpenAI response and convert it to something more useful before enqueuing it for our response stream.
6. When the OpenAI stream closes, we also close our stream.
7. The JavaScript handler on the client side processes each chunk of data as it comes in and updates the UI accordingly.
Conclusion
The app is working. Itās pretty cool. We covered a lot of interesting things today. Streams are very powerful, but also challenging, and especially when working within Qwik, there are a couple of little gotchas.
But because we focused on low-level fundamentals, these concepts should apply across any framework.
As long as you have access to the platform and primitives like streams, requests, and response objects then this should work. Thatās the beauty of fundamentals.
I think we got a pretty decent application going now. The only problem is right now weāre using a generic text input and asking users to fill in the entire prompt themselves. In fact, they can put in whatever they want. Weāll want to fix that in a future post, but the next post is going to step away from code and focus on understanding how the AI tools actually work.
I hope youāve been enjoying this series and come back for the rest of it.
Thank you so much for reading.
First published here.