Many developers need to build apps with LLMs but find that creating even a simple abstraction on top of something like Gemini/ChatGPT/etc. is challenging. Additionally, there aren't many established resources on this topic, the way there are for system design, leaving new developers unsure of where to begin.
Usually the simplest possible “simple wrapper” architecture looks like this:
This article offers fundamental definitions that have proven useful in my own “simple wrappers” development. I'll illustrate these definitions with practical examples (mostly built on top of
A Tool is a self-contained piece of code designed to perform a specific action; it must meet two key requirements to be used effectively by an LLM:
Tools are, more or less, the part of an agent that is classical software. Here is an example of a tool that executes a query against a BigQuery table:
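The original code block isn't reproduced here; as a minimal sketch (the function name, docstring wording, and error handling are mine, using the google-cloud-bigquery client), such a tool can be just a well-documented Python function: the docstring is the documentation the LLM reads, and errors come back as text the model can react to:

```python
from google.cloud import bigquery


def execute_bigquery_query(query: str) -> str:
    """Executes a SQL query against BigQuery and returns the rows as text.

    Args:
        query: A valid BigQuery Standard SQL query.

    Returns:
        One line per result row, or an error message describing the failure.
    """
    client = bigquery.Client()
    try:
        rows = client.query(query).result()  # blocks until the job finishes
        return "\n".join(str(dict(row)) for row in rows)
    except Exception as e:  # surface the error to the LLM instead of crashing
        return f"Query failed: {e}"
```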
Tools can easily be tested independently (since, well, they are just classical code). It is even possible to test the intellectual requirements on the LLM, i.e., whether a given LLM is able to understand the tool's prompt/documentation. Tools can also be distributed separately from the agent. In fact, I have long believed that at some point we will see many marketplaces/hubs of agent tools that provide you with small tools that have very well-defined docs/prompts and are well tested against the major LLMs. Indeed, one of my favorite agent frameworks, from HuggingFace, is going in exactly this direction:
🤗 Hub integrations: you can share and load tools to/from the Hub, and more is to come!
So yes, the first hubs with tools are already here for us to use. But even in the simplest form you can already distribute your tool as a pip package, since, again, it is just classical software, no more and no less.
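For example, with HuggingFace's smolagents (assuming its current API; the tool id below is the one used in their own docs), pulling a community-published tool from the Hub is a one-liner:

```python
from smolagents import load_tool

# Load a community-published tool straight from the HuggingFace Hub.
# trust_remote_code=True is required because the tool ships its own code.
image_tool = load_tool("m-ric/text-to-image", trust_remote_code=True)
```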
Let me give you another, more complex, example: tools designed for real-world investing (trading stocks/options/etc.):
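The original snippet isn't reproduced here; as a hedged sketch of one such tool using the alpaca-py SDK (the function name and docstring are illustrative, and the keys are placeholders):

```python
from alpaca.trading.client import TradingClient
from alpaca.trading.requests import MarketOrderRequest
from alpaca.trading.enums import OrderSide, TimeInForce

# paper=True keeps the example on Alpaca's paper-trading environment.
trading_client = TradingClient("API_KEY", "SECRET_KEY", paper=True)


def buy_stock(symbol: str, qty: float) -> str:
    """Submits a market buy order for the given stock symbol.

    Args:
        symbol: Ticker symbol, e.g. "AAPL".
        qty: Number of shares to buy.

    Returns:
        The id of the submitted order, or an error message.
    """
    try:
        order = trading_client.submit_order(
            order_data=MarketOrderRequest(
                symbol=symbol,
                qty=qty,
                side=OrderSide.BUY,
                time_in_force=TimeInForce.DAY,
            )
        )
        return f"Submitted order {order.id}"
    except Exception as e:
        return f"Order failed: {e}"
```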
A small note: these days people often talk about
An Agent is a combination of tools, a prompt, and (optionally) other agents that allows one to do one atomic task end to end with a predictable level of stability.
Let's start right away with the example of a BigQuery agent that uses the tools from the example above. Its prompt:
You are a helpful assistant that helps to query data from a BigQuery database.
You have access to the tools required for this.
When using the tool to query:
When generating SQL queries, be concise and avoid unnecessary clauses or joins unless explicitly requested by the user.
Always return results in a clear and human-readable format. If the result is a table, format it nicely.
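As a hedged sketch of how this prompt and the tools combine into an agent, here is one way to wire them together using Gemini's automatic function calling from the google-generativeai SDK (the article does not prescribe a specific framework; the model name is a placeholder and the prompt is abbreviated):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

AGENT_PROMPT = """You are a helpful assistant that helps to query data
from a BigQuery database. ..."""  # the full prompt from above

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    tools=[execute_bigquery_query],  # the tool from the earlier example
    system_instruction=AGENT_PROMPT,
)

# The SDK inspects the function's signature and docstring and lets the
# model call the tool automatically during the chat.
chat = model.start_chat(enable_automatic_function_calling=True)
print(chat.send_message("How many rows does my_table have?").text)
```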
One can argue that some parts of the prompt could go into the tools' documentation. That is true. However, some parts of the prompt can NOT: for example, requirements on how data should be presented back to the user, or actions the LLM needs to take before calling certain tools, etc.
Now, imagine that you are building a specialized BigQuery agent on top of the generic one (and, to be fair, that is the main point of the generic one being generic: so you can build on top of it). Let's say you are building a TODO agent for yourself (the most common example people build for educational purposes when creating their first agent on top of our generic BigQuery agent), with a BigQuery table that stores the TODOs. The agent has full edit access and is a single-user agent.
In this example, the agent prompt is the only place where you can state that the agent is a TODO agent whose goal is to manage the user's TODOs (plus whatever else you want to put in the prompt). It can not go into the tools (since the tools are provided to you); it has to go into the agent prompt, and thus the agent has to have a prompt.
Finally, since an agent is an entity that can do one task end to end, it is the agent that defines permissions. It makes no sense to set permissions at the tool level, since at that level you do not know who will use the tool and in which way. At the agent level, however, you know exactly which tools are used (and for which of them you need to set permissions).
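To make both points concrete, here is a hedged sketch building on the Gemini example above (the table name and the execute_bigquery_dml write tool are hypothetical): the specialization lives entirely in the prompt, and permissions are expressed by which tools each agent receives:

```python
# Specialization: only the prompt changes; the generic tools are reused.
TODO_AGENT_PROMPT = AGENT_PROMPT + """
You are a TODO agent. Your goal is to manage the user's TODOs stored in
the BigQuery table `my_project.todos.items` (hypothetical table name).
"""

# Full edit access: this agent receives both the read and the write tool.
todo_agent = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    tools=[execute_bigquery_query, execute_bigquery_dml],  # write tool is hypothetical
    system_instruction=TODO_AGENT_PROMPT,
)

# Permissions live at the agent level: a read-only agent simply
# does not receive the write tool.
readonly_agent = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    tools=[execute_bigquery_query],
    system_instruction=AGENT_PROMPT,
)
```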
Let's take another example: an agent that allows one to trade on the financial markets. Its prompt:
You are a tool for carrying out transactions on the stock market. You act as an interface to interact with the Alpaca markets API.
Key instructions:
Rules for performing functions for interaction with Alpaca:
Rules for working with data:
As you can see, many items here do not belong in the documentation of any specific tool call (or would have to be replicated in ALL of them).
One of the key parts of how we defined the Agent is “...to do one atomic task end to end...”. How to define “one atomic task” I will leave up to you, since this is very similar to the question of how to define one task when you write a new class in Python or Java. I believe the majority of developers have a good intuition for it (and if you do not, I probably will not be able to fix that in the space of a few sentences).
So what do you do if you need a more complex sequence of events? Finally, we can close the loop on our example of the investor bot. A real-world example: we had a customer who needed to instantiate a very simple trading strategy that can be expressed in the following way:
Simple strategy, right? But because it is actually multi-step, you would be surprised how much trouble this pipeline causes if you try to execute it in full with one message to the investing agent (even if you are using the most advanced model available). In short, in production it will not reach the required stability/reliability even with the most expensive model. Furthermore, making it more reliable requires unreasonable investment (like using the most advanced model) when that is actually not necessary.
To solve this problem we need to introduce a pipeline. An LLM Pipeline is a classical DAG where each step is:
There are several specific aspects of pipelines that are not strictly required, but I would not consider using any pipeline framework that does not allow them:
With this definition, here is an example of the pipeline that we defined earlier:
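The original example isn't reproduced here; as a hedged sketch, such a pipeline can be plain Python where the DAG is classical code and only the individual steps talk to the LLM (ask_investing_agent is a hypothetical helper that sends one message to the investing agent and returns its raw text reply; the strategy steps are illustrative):

```python
import json


def run_step(prompt: str, expected_type: type):
    """One pipeline step: a single atomic agent call with a typed result.

    Bounded retries make the step self-sufficient: it either returns a
    value of the expected type or fails loudly.
    """
    for _ in range(3):
        raw = ask_investing_agent(prompt)  # hypothetical: one message, one reply
        try:
            value = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)  # a JSON "190" parses as int; coerce for numeric steps
        if isinstance(value, expected_type):
            return value
    raise RuntimeError(f"Step did not produce a {expected_type.__name__}: {prompt!r}")


# The DAG itself is classical code; each step is one atomic agent call.
price = run_step("Get the current AAPL price. Reply with a single JSON number.", float)
should_buy = run_step(
    f"AAPL trades at {price}. Does our entry condition hold? Reply with JSON true or false.",
    bool,
)
if should_buy:
    order_id = run_step("Buy 10 AAPL shares. Reply with the order id as a JSON string.", str)
```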
As you can see from the example, each step is atomic and self-sufficient, and it has a predictable final type (boolean/float/etc.). As a result, in our experiments, the same task on the same agent with the same model went from a <50% success rate (for full end-to-end pipeline execution) when done via one mega-prompt to a 99+% success rate when executed as a proper pipeline.