
How I Built a Chatbot for Studying Foreign Languages Using Coze

by Kevin Stubbs, November 7th, 2024

Too Long; Didn't Read

This article guides you through building a language-learning chatbot using Coze AI for Unutma. It covers everything from setting up Coze AI with Next.js to creating a word bank generator, integrating it into your site, and considering the costs and scalability for production use.


Coze AI is a no-code platform for building powerful chatbots that can handle complex workflows, multiple agents working together, and a lot more.

I'm a seasoned developer running a full-stack development agency and also launching a language-learning startup for polyglots called Unutma (https://unutma.app - it's in alpha, be gentle!). I haven't built an AI chatbot before, but we’ll do it together, from scratch, for Unutma.

It was easier to build than I expected. So much so that I’ll start from square one and, in this same article, cover creating the bot, publishing it, integrating it into a Next.js/React site, and even discuss some high-level thoughts on running this in a production startup!

What exactly are we building?

A chatbot to handle two main flows:

Generate a list of new words, and test the user.
Extract a lexicon (wordbank) from any text the user provides, and test the user on the words.

Unutma's first iteration is highly focused on L2 language acquisition, so our chatbot will also be focused on vocabulary. If the user asks about other things, that's okay, but we aren't going to program our agent to take special care of those cases.

Dive in with both feet first

When I'm picking up a new library or framework, I find the quickest way to grok it is to jump in and start experimenting. You quickly spot patterns you already know from other technologies you've worked with.

Coze has four out-of-the-box templates. (explore them here)

In the case of language learning, it seemed like the tutor template would be the closest fit.

By process of elimination:

Article generation and email generation both looked very transactional (input => output and repeat)
The kid’s story is a single agent, and can probably make things up at any point.

The tutor is multi-agent and educational, so it was the most promising to start digging into. If you want to follow along, click on the IELTS template and then click “Duplicate”.

I’ve adjusted the template to look like this:

Make sure to change the model for each of your agents!

The default model could drain your credits in literally just two messages. You can change your model settings for each agent by clicking on the three dots.

I recommend choosing GPT-3.5 Turbo, just because on a paid Coze plan you will get unlimited tokens. This is ideal when just starting out - you can always beef up the model size in the future when you’re sure how much brainpower is really needed for each of your agents and how many tokens it will cost you.

Persona & Prompt for Our Bot

Role:

You are a foreign language expert linguist and tutor who can help users acquire and retain foreign language vocabulary. You don't use technical linguistic terms unless the user asks for it. Assume your users have between A0-B1 level knowledge of the language they are studying.

Limitations:
- Only focus on content related to studying a foreign language.
- The output content cannot include the following symbols { } < >.

Our entry point makes sure to get the user's target language (what language they want to study) before shepherding them into one of our two flows: either generating a list of words to learn out of thin air or generating the word bank from user-provided text.

After this, the agent starts a quiz consisting of three randomly chosen words from the list, which it then grades you on. It can handle answers that aren't given exactly: "Is it a lake?" is just as correct as answering "lake".

I love listening to Turkish pop music, so I wanted to learn the words in Tarkan's song Şımarık.

The agent provides the same quiz experience to the user after giving this list too.

Prompts

Starting agent

First, you should ask what language they are studying, then ask the user to choose if they want help with how to use a certain word, learn new vocabulary, or generate a word bank.

## Limitations
- Ask user what language they want to study before answering user's question.
- Answer nothing except asking the user to choose which task to start.
- Jump to other agent, if user choose what they want to do.

New vocab agent

# Role

You are a language tutoring expert. You don't use technical linguistic terms unless the user asks for it. For users with weak foundations, you are good at guiding users through step-by-step in-depth questioning.

""""""""""""""""""""""""""""""""""""""""""""""""""""""""

## Skill 1: Generate word bank

### Step 1: Get the input text from the user

- At the beginning of the conversation, you ask for the user to pick a theme for new vocabulary in {User's target Language}.
- Generate a unique list of ten (10) vocabulary words in {User's target Language}. Any words which are conjugated or otherwise not in their dictionary form should be changed to their dictionary form. Plural forms should be changed to singular. Verbs should be converted to their infinitive form.
- Provide the definition for each word in the list.
- Create three example sentences that use the newly generated words. The sentences should not be too long or grammatically difficult. Use as many of the words that you created as possible, but it's okay not to use 100% of them.

#### Output in the following format:

If these words are too easy, too hard, or you already know them, tell me and I would be happy to generate a new set!
 - **Each word and its definition on its own line**
- **Each example sentence and its translation in English on its own line**

### Step 2: Quiz the user on definitions

1. Choose a word from the list that you generated in step 1, and ask the user for its definition.

#### Output in the following format:

The word is "<Chosen word>". What is the definition of this word?

#### Question asking rules:

1. The word must exist in the list of words that was generated in step 1. If the user asks to clarify the definition, what type of word it is (such as verb, noun, etc.), or for an example of how it can be used in a sentence provide that to them and then ask them again for the same word.
2. If the user says that the words are too easy, then generate a new list that's a little more challenging. If the words are too hard, then generate a new list that's a little less challenging. If the user already knows the words, then generate a new list using different words but the same level of difficulty.
3. After the user provides an answer, tell them if they correctly provided the definition.
4. Perform this step 3 times regardless of whether they answered correctly or incorrectly.

### Step 3: Score & Congratulate

1. Tell the user how many words they got right out of the total number of words they were asked.
2. Ask the user if they want to redo the quiz. If they want to redo, then go back to step 2. Otherwise, thank them.

""""""""""""""""""""""""""""""""""""""""""""""""""""""""

## Restrictions

1. The user must answer at least one question from step 2.

Word bank agent

# Role

You are a language tutoring expert. You don't use technical linguistic terms unless the user asks for it. For users with weak foundations, you are good at guiding users through step-by-step in-depth questioning.

""""""""""""""""""""""""""""""""""""""""""""""""""""""""

## Skill 1: Generate word bank

### Step 1: Get the input text from the user

- At the beginning of the conversation, you ask for text in the {User's target Language}.
- Convert the text that the user gives you into a unique list of words that you found in it. Any words which are conjugated or otherwise not in their dictionary form should be changed to their dictionary form. Plural forms should be changed to singular. Verbs should be converted to their infinitive form.
- Provide the definition for each word in the list.

#### Output in the following format:

 - **Each word and its definition on its own line**

### Step 2: Quiz the user on definitions

1. Choose a word from the text which the user provided you, and ask the user for the dictionary form of it.

#### Output in the following format:

The word is "<Chosen word>". What is the dictionary, singular form of this?

#### Question asking rules:

1. The word must exist in the text the user provided to you in step 1. If the user asks to clarify the definition, what type of word it is (such as verb, noun, etc.), or for an example of how it can be used in a sentence provide that to them and then ask them again for the same word.
2. After the user provides an answer, tell them if they correctly provided the dictionary, singular form, of the word you asked them. It should not be conjugated in any way. If it is a verb, it should be in the infinitive form.
3. Perform this step 3 times regardless of whether they answered correctly or incorrectly.

### Step 3: Score & Congratulate

1. Tell the user how many words they got right out of the total number of words they were asked.
2. Ask the user if they want to redo the quiz. If they want to redo, then go back to step 2. Otherwise, thank them.

""""""""""""""""""""""""""""""""""""""""""""""""""""""""

## Restrictions

1. The user must answer at least one question from step 2.

Debugging

You will likely run into issues with the chatbot failing to execute, or jumping to an unexpected agent. So let's quickly take a look at some of the debugging information available to us in Coze's UI.

Clearly, some instructions were miscommunicated and were applied twice, but I couldn't guess why the session was terminated rather than just having a message from the bot like "Sorry, can't do that". Clicking on the wrench symbol opens the debugging panel, which gives this sort of info.

The top part of the panel had technically interesting, but unhelpful, information.

The bottom part is a little more helpful - it looks like it was directly caused by our message (not something like auth expired, out of credits, etc.) and that perhaps it jumped to a non-existent agent.

You are also able to see the debug information for other messages too, not just when something goes wrong. Ultimately, it's not terribly helpful, as it just provides the inputs (as understood by the chatbot) and the output. I would imagine this to be useful when debugging conversations by other users, but not when you are debugging your own conversation (since you are already aware of the inputs, outputs, etc.).

I found too much of the debugging interface to be dedicated to LLM performance, whereas I wanted to focus on debugging LLM understanding and flow control decisions. I wanted to see something like "Given this input X, agent Y decided that it fit Skill #1". AI transparency in general is still a big hairy problem for the industry as a whole, so it's not particularly surprising to see that it isn't perfectly solved here either.

With all that being said, I did accidentally find the information I needed. It looks like this is disabled, but if you click on it, you will see the jump breakdown.

After clicking the dropdown, I got more useful information indicating that the problem lay within the word bank agent. In other debugging cases, I have seen a short-circuited infinite redirect loop (agents passing the user back and forth to each other in a loop before giving up).

Integrating your chatbot into an app

Adding it into Unutma (React/Next.js)

I decided to integrate via the Web SDK because it was a far faster path than via the API. Some of the setup does happen to be shared between the two, so if you start with the Web SDK and later decide to switch to the API, the upgrade path will feel pretty natural.

Following the install docs got me an embed code with all of the defaults turned on. However, I wanted to be able to control when the chatbot opens & closes programmatically. All of the configuration props are on the same docs page.

Before we see the code, let's see how it looks integrated on Unutma.

To integrate this, put this script in any layout, page, or component (only one is needed - put it in the layout for example, if it should appear on many pages).

<Script src='https://sf-cdn.coze.com/obj/unpkg-va/flow-platform/chat-app-sdk/1.0.0-beta.4/libs/oversea/index.js'></Script>

In the component that will control the chatbot, you can use this React Hook I wrote:

 const cozeRef = useCozeAI();

It can be controlled like this. You can see I got a little bit lazy with the TypeScript typing (note the non-null assertions), but for production you would want to handle the case where Coze hasn't finished loading yet, for example by skeleton loading these buttons or showing an error toast if the user clicks before it is ready. I’ve hooked this functionality up to the “learn with magic” button.

  <div
    className="btn"
    onClick={() => {
      cozeRef.current!.showChatBot();
    }}
  >
    Display chat window
  </div>
  <br />
  <div
    className="btn"
    onClick={() => {
      cozeRef.current!.hideChatBot();
    }}
  >
    Hide chat window
  </div>
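
One way to avoid the non-null assertions in the buttons above is a small guard that checks whether the client has loaded before calling it. This is only a minimal sketch: `ChatClientLike`, `toggleChat`, and `onNotReady` are hypothetical names, and the only thing assumed about the Coze client is the two methods already used above (`showChatBot` and `hideChatBot`).

```typescript
// Hypothetical structural type: only the two client methods used in this article.
type ChatClientLike = {
  showChatBot(): void;
  hideChatBot(): void;
};

// Shaped like a React ref, but declared here so the sketch has no dependencies.
type ChatClientRef = { current?: ChatClientLike | null };

// Returns true if the action ran, false if the client was not ready yet.
function toggleChat(
  ref: ChatClientRef,
  action: "show" | "hide",
  onNotReady: () => void
): boolean {
  const client = ref.current;
  if (!client) {
    // The SDK has not loaded yet: let the caller show a toast, a skeleton, etc.
    onNotReady();
    return false;
  }
  if (action === "show") {
    client.showChatBot();
  } else {
    client.hideChatBot();
  }
  return true;
}
```

In a click handler this could be called as `toggleChat(cozeRef, "show", () => showToast("Chat is still loading"))`, where `showToast` stands in for whatever notification helper your app already has.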

Finally, here is the React hook file, useCozeAI.ts. Be sure to put your own bot ID in it.

import { useEffect, useRef } from "react";

// Resolves with a Coze WebChatClient once the SDK script has loaded.
const loadCozeComponent = async () => {
  // Skip during server-side rendering, where there is no window.
  if (typeof window === "undefined") return;

  // The SDK script attaches CozeWebSDK to window; poll every 250 ms until it appears.
  if (!(window as any).CozeWebSDK) {
    return new Promise((resolve) => {
      setTimeout(() => loadCozeComponent().then(resolve), 250);
    });
  }

  return new (window as any).CozeWebSDK.WebChatClient({
    config: {
      bot_id: "PUT YOUR BOT ID HERE - IT WILL BE NUMERICAL",
    },

    componentProps: {
      title: "The Best Agent, Ever",
    },

    ui: {
      base: {
        icon: "https://lf-coze-web-cdn.coze.cn/obj/coze-web-cn/obric/coze/favicon.1970.png",
        layout: "pc",
        zIndex: 1000,
      },
      asstBtn: {
        // Turn off the default launcher button; the chat is opened programmatically instead.
        isNeed: false,
      },
      footer: {
        isShow: true,
        expressionText: `Made just for you!`,
      },
    },
  });
};

export const useCozeAI = () => {
  // Stays null until the client has been created.
  const cozeRef = useRef<any>(null);

  useEffect(() => {
    // Create the client once on mount and store it in the ref.
    loadCozeComponent().then(
      (newCozeClient: any) => (cozeRef.current = newCozeClient)
    );
  }, []);

  return cozeRef;
};
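
The loader above keeps polling forever if the SDK script never loads (for example, if the CDN is blocked). A bounded version of the same polling idea might look like the sketch below. `waitForValue` and its default interval and timeout are hypothetical choices of mine, not part of the Coze SDK.

```typescript
// Hypothetical helper: poll `read` until it returns a value or the timeout elapses.
async function waitForValue<T>(
  read: () => T | undefined,
  intervalMs: number = 250,
  timeoutMs: number = 10000
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = read();
    if (value !== undefined) return value;
    if (Date.now() >= deadline) {
      throw new Error(`Value was not available after ${timeoutMs} ms`);
    }
    // Sleep for one interval before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the hook, the read function would be something like `() => (window as any).CozeWebSDK`, and a rejected promise is the signal to show an error state instead of a button that never works.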

Adoption Challenges

Tokens vs Credits

Tokens for API and Web SDK usage can be purchased at https://www.coze.com/open/token. I bought 200,000 credits for $2 for testing. You can buy smaller amounts than the presets by clicking on the customized card in the bottom right.

Access Token - rotate every 30 days?

The personal access token (PAT) necessary to call the API has a maximum expiration time of 30 days. So to run this in production, it may be necessary to have a plan in place to rotate it once per month, and it seems that this might have to be a manual operation.

This token is not used when embedding the chat agent using the Web SDK, so perhaps it is automatically rotated behind the scenes. I didn't see it addressed in the docs.
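
If rotation does turn out to be manual, a small reminder check can at least keep the expiry from being a surprise. The sketch below assumes you record the token's expiry date yourself when you create it; `rotationStatus` and the 7-day warning window are hypothetical, and nothing here calls a Coze API.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

type RotationStatus = "ok" | "rotate-soon" | "expired";

// Whole days left before the token expires (negative once it has expired).
function daysUntilExpiry(expiresAt: Date, now: Date): number {
  return Math.floor((expiresAt.getTime() - now.getTime()) / MS_PER_DAY);
}

// Classify the token so a scheduled job or dashboard can raise a warning.
function rotationStatus(
  expiresAt: Date,
  now: Date,
  warnWithinDays: number = 7
): RotationStatus {
  if (expiresAt.getTime() <= now.getTime()) return "expired";
  return daysUntilExpiry(expiresAt, now) < warnWithinDays ? "rotate-soon" : "ok";
}
```

Running this once a day and alerting on "rotate-soon" leaves about a week to create a new token and swap it in.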

Limited customizability of the chat agent

You either embed the agent on your site where it will appear as a sidebar, or you custom code all of the API requests, UI, and interaction loops by yourself. I strongly feel that the Web SDK embed should go a step further and let you insert it into arbitrary HTML elements. I could foresee a great use case where the user might go through multiple steps in a workflow, alternating between a chat interface and documentation. But to achieve that today, you would have to implement a fair number of the APIs, not to mention the UI for it. Engineering teams unwilling to invest so much into that must either fit the chatbot floating sidebar into their UX or leave it out entirely.

Country restrictions? A 451 HTTP code in the wild

It has been reported to me that usage in certain countries can be blocked with an error such as:

{"code":4402,"msg":"services not available in your country or region","detail":{"logid":"202411071053471FEAFA213AA88581FA09"}}

The error comes without any further explanation and leaves only a "try again later" message in the chat interface. The supported regions need to be documented, and there needs to be a programmatic way for developers to detect when something has gone wrong so that they can gracefully handle the issue within the app. For this reason alone, I would consider the Web SDK NOT production ready for applications with a non-US user base.
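
If you call the API yourself (rather than through the Web SDK, which as far as I can tell does not surface this error), you could at least translate the payload into something friendlier. This is a sketch under assumptions: the body shape is inferred only from the single sample above, treating 4402 as "region blocked" is my reading of that sample rather than documented behavior, and `parseCozeErrorBody` and `userFacingMessage` are hypothetical names.

```typescript
// Shape inferred from the one sample payload above; not an official type.
type CozeErrorBody = {
  code: number;
  msg: string;
  detail?: { logid?: string };
};

// Assumption: 4402 means "not available in your region", as in the sample.
const REGION_BLOCKED_CODE = 4402;

// Returns the parsed error, or null if the text is not in the expected shape.
function parseCozeErrorBody(raw: string): CozeErrorBody | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return null;
    const obj = parsed as Record<string, unknown>;
    if (typeof obj.code === "number" && typeof obj.msg === "string") {
      return parsed as CozeErrorBody;
    }
  } catch {
    // Not JSON at all: fall through and report "unknown".
  }
  return null;
}

// Map the raw response body to a message that is safe to show to users.
function userFacingMessage(raw: string): string {
  const err = parseCozeErrorBody(raw);
  if (err === null) return "Something went wrong. Please try again later.";
  if (err.code === REGION_BLOCKED_CODE) {
    return "The assistant is not available in your region yet.";
  }
  return `The assistant returned an error (code ${err.code}).`;
}
```

Logging `detail.logid` alongside the code would also give you something concrete to hand to Coze support.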

Lack of guidance for prompt writing in markdown

Without Coze's IELTS template to start from, I probably wouldn't have learned that agent prompts can be written in a complex fashion with markdown. With markdown, we can clearly define different skills, output formats, constraints, etc. but I found myself oftentimes feeling like "I have no idea how it's going to understand this, but here goes nothing!" and more often than not, the LLM magically performed my freeform instructions. That's really amazing, but I still think some documentation into common instruction patterns, dos and don’ts, etc. would help Coze developers catch up to the current prompt writing meta much faster than trial and error.

Future steps - you have an agent, now what?

If you are building an agent as a toy, portfolio piece, or something else "not for prod" then I think you're already done and can move onto the next thing. But if you're interested in making this something really woven into your production app, there are some more things we must consider.

Controlling costs

Your agents on Coze can pass the costs onto the users since they have to be signed in with their Coze accounts. But when you have embedded it as an API or with the Web SDK, YOU are footing the bill. So I would be thinking about the following:

Gate access to the model to only users.
- Users should get a limited # of credits per day (or month)
- Free users should be limited to agents that use the cheapest models. If your agent requires o1-preview ($15 per 1M input tokens and $60 per 1M output tokens) and you let users use it for free... oh boy, I do hope you didn't turn on auto-recharge! (If you do use auto-recharge, please take the precaution of configuring its advanced settings, such as the daily auto-recharge cap.)
If you are using the API, consider adding a caching layer. Caching concepts are mid, but semantic caching suddenly makes it super interesting, because you aren't caching the user's exact input, but instead caching the essence of what they're asking. So different chat inputs could resolve to the same "essence". This Redis article about it is a good place to start, if you're interested.
Think about how the costs of the agent usage fit into your business model.
- Does it make sense to directly pass the costs onto users (like via credits)?
- Will you eat the cost and make up for it in the subscription or other purchases (all-you-can-eat buffet model)?
- Or is your agent producing economic value and you can upcharge access to it on a multiple? For example, an agent that generates a professional business plan, produces avatar images, etc. All of the examples that come to mind for me are where the business model IS access to the agent, whereas the above points are more natural ways to think about costs when the agent is a part of the product, rather than the entire product itself.
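
To make the first point concrete, here is a rough sketch of two pieces: estimating what a request costs at the o1-preview prices quoted above, and a per-user daily credit gate. The gate is in-memory only and the names (`estimateCostUsd`, `DailyCreditGate`, `tryConsume`) are hypothetical. A real app would persist usage in its database and would need to map Coze credits to tokens itself.

```typescript
type Pricing = { inputPerMillionUsd: number; outputPerMillionUsd: number };

// Prices quoted in this article for o1-preview.
const O1_PREVIEW: Pricing = { inputPerMillionUsd: 15, outputPerMillionUsd: 60 };

// Cost in USD of one request, given its input and output token counts.
function estimateCostUsd(
  pricing: Pricing,
  inputTokens: number,
  outputTokens: number
): number {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMillionUsd +
    (outputTokens / 1_000_000) * pricing.outputPerMillionUsd
  );
}

// In-memory per-user, per-day allowance. `day` is any stable key such as "2024-11-07".
class DailyCreditGate {
  private readonly used = new Map<string, number>();
  private readonly dailyLimit: number;

  constructor(dailyLimit: number) {
    this.dailyLimit = dailyLimit;
  }

  // Records the spend and returns true, or returns false if it would exceed the limit.
  tryConsume(userId: string, day: string, credits: number): boolean {
    const key = `${userId}:${day}`;
    const spent = this.used.get(key) ?? 0;
    if (spent + credits > this.dailyLimit) return false;
    this.used.set(key, spent + credits);
    return true;
  }
}
```

At those prices, a single exchange with 2,000 input tokens and 500 output tokens costs about $0.06, so a free user sending 100 such messages a day would cost you roughly $6 per day.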

Delivering value

If your business, startup, etc. is based upon access to the agent (like pay $5 to generate avatar images), then you can refer to the common Product Market Fit (PMF) advice and can move on.

However, if you have added your agent into your existing app, or it's something there to help users, etc. then it's a little trickier to understand whether it's a net positive, or net negative. Even more difficult when you will inevitably have to weigh the benefit to the cost of LLM tokens + maintenance. I would provide the following suggestions about it:

Don't roll the AI agent integration out to all of your users at once. Instead, A/B test with cohorts. If you are already measuring product success indicators, then this becomes a walk in the park. My understanding is that most startups (even well-established ones) aren't taking a sophisticated approach here. So ultimately you may want to do something as simple as measure how user retention is affected by either having the agent or not. Specifically, I would be testing cohorts of users who just signed up and never experienced your product without the AI, as well as cohorts where it's an addition to the product they're already familiar with. This is an idea straight out of Duolingo's playbook - there are some great videos on YouTube where members from their growth team described their learnings, approaches, frameworks, etc. but now I've gotten on a tangent!
Watch revenue growth, and watch Coze token usage. If you don't do anything else with cohorts, measuring retention, etc. then after you turn the integration on, just check in a few times per month and make sure these lines are going the way you expect them to. Token usage skyrocketing without any bump in revenue growth trends might indicate a problem (users could be using your agent, without actually deriving enough value out of it for them to stick around, pay for higher tiers, convert to a paying user, etc.)
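
As a rough illustration of the cohort idea, the sketch below assigns each user to a cohort deterministically (so the same user always sees the same experience) and compares retention between the two groups. The hash is an FNV-1a-style hash over UTF-16 code units. The 50/50 split, the `retained` flag, and all names are hypothetical, and this says nothing about statistical significance.

```typescript
type Cohort = "agent" | "control";

// Deterministic assignment: hash the user ID and split on parity.
function assignCohort(userId: string): Cohort {
  let hash = 0x811c9dc5; // FNV offset basis (32-bit)
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime (32-bit)
  }
  return (hash >>> 0) % 2 === 0 ? "agent" : "control";
}

// `retained` is whatever retention definition you already use (e.g. active on day 30).
type CohortRecord = { cohort: Cohort; retained: boolean };

// Fraction of retained users per cohort (0 for an empty cohort).
function retentionByCohort(records: CohortRecord[]): Record<Cohort, number> {
  const totals: Record<Cohort, number> = { agent: 0, control: 0 };
  const kept: Record<Cohort, number> = { agent: 0, control: 0 };
  for (const record of records) {
    totals[record.cohort] += 1;
    if (record.retained) kept[record.cohort] += 1;
  }
  return {
    agent: totals.agent === 0 ? 0 : kept.agent / totals.agent,
    control: totals.control === 0 ? 0 : kept.control / totals.control,
  };
}
```

Running the comparison separately for brand-new signups and for existing users gives you the two cohort types described above.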

Monitor for successful or unsuccessful outcomes

This is somewhat related to my earlier points about measuring what factors are impacting your growth (in our case, to answer the question "Did adding the agent drive our key metrics in the right direction?"). But this point is more specific at the tactical level, to answer questions like

"How long did the user interact with the chatbot for?"
"What is the sentiment of the user's messages (positive, frustrated, confused)?"
"How long did the user session last after interacting with the chatbot?" (is the interaction causing them to give up and try another product?)

Instrumenting your app to answer these questions is far outside the scope of this article, but these are really interesting ideas to explore and could be generally applied to different features of your product too. Of course, I would only bother to build the measurement pipeline for this if you already have a sizeable number of users. Otherwise, you won't have a large enough sample to draw conclusions from, and you risk over-engineering this instead of working on the things that will get you a userbase big enough to justify this kind of monitoring.