Welcome back to this series where we are learning how to integrate AI into our web applications.
So far, we've built a very basic UI with a text area that takes whatever the user writes and sends it over HTTP to OpenAI's API. When the streaming response returns, it updates the page with each bit of text as it arrives.
That's well and good, but it's really not much more than a glorified HTTP client. There's still a lot we can do to make the app much nicer for users, but before we continue building it, I thought it would be a good idea to learn more about how these AI tools actually work.
https://www.youtube.com/watch?v=Ox2QrrCOgyg
What Is AI?
AI stands for artificial intelligence, and it's basically the idea that computers can think, reason, and solve problems without having the mechanism for solving those problems hard-coded in their software. Instead, they learn how to solve problems through special training.
AI is the focus of the field of machine learning, which uses different tools, techniques, and methods to train computers to "think."
One of these methodologies is the "artificial neural network."
What Are Artificial Neural Networks?
Inspired by the biology of the human brain, an artificial neural network is a computing system made up of layers of interconnected nodes ("neurons") that pass signals between one another, with connections that strengthen or weaken as the network trains.
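To make that a little more concrete, here's a heavily simplified sketch of a single artificial "neuron" in TypeScript. The inputs, weights, and bias are made-up numbers for illustration; in a real network, the weights and bias are learned during training rather than hand-picked.

```typescript
// A single artificial "neuron": multiply each input by a weight, add a
// bias, then squash the result with an activation function.
function neuron(inputs: number[], weights: number[], bias: number): number {
  const weightedSum = inputs.reduce((sum, x, i) => sum + x * weights[i], bias);
  // Sigmoid activation: maps any number into the range (0, 1)
  return 1 / (1 + Math.exp(-weightedSum));
}

// A real network stacks thousands (or billions) of these, layer after layer.
console.log(neuron([0.5, 0.9], [0.4, -0.6], 0.1)); // ~0.44
```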
Within the taxonomy of neural networks, there is a subset called the Large Language Model (LLM).
What Are Large Language Models?
A Large Language Model is a type of neural network that has been trained on massive amounts of text so that it can understand and generate human language.
The "large" in Large Language Model is a bit of an understatement, because many of these LLMs are trained on data collected from the open internet, which can amount to petabytes of text-based information.
As a result of training on this much information, these LLMs can end up with "parameters" numbering on the order of billions or even trillions.
What Are Parameters?
Parameters are what the LLM ultimately uses to decide what word to generate based on whatever input it's received.
That an LLM could have billions of parameters is impressive when you consider that the English language has only about 500,000 distinct words.
So when you ask a question to an LLM, it will use its parameters to come up with an answer based on the context you provide as well as the context of the data that it was trained on.
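Here's a toy TypeScript sketch of that idea. The candidate words and their scores are invented for the example; in a real LLM, billions of parameters produce a score for every token in the vocabulary, and a softmax turns those scores into probabilities.

```typescript
// "Parameters" are the learned weights that score each candidate next
// word. A softmax converts those raw scores into probabilities.
function softmax(scores: number[]): number[] {
  const exps = scores.map((s) => Math.exp(s));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const candidates = ["bread", "pudding", "cream pie", "hammock"];
const scores = [2.1, 1.4, 1.0, -1.5]; // invented model scores

softmax(scores).forEach((p, i) => {
  console.log(`${candidates[i]}: ${(p * 100).toFixed(1)}%`);
});
// "bread" gets the highest probability; "hammock" is unlikely but not impossible
```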
The answer it comes up with is determined by the parameters it has and by the strength of association between words, which is captured using something called "embeddings."
What Are Embeddings?
Embeddings are interesting because they are essentially a list of numbers that represents a thing. When we're dealing with language models, those things are words.
So inside the LLM, instead of dealing with words, it's dealing with lists of numbers. This makes it easier for it to determine the semantic similarity between two words using math.
Let's look at an oversimplified example to get the hang of this concept. Say we wanted to put words onto a two-dimensional chart using X and Y coordinates. We would take a word and assign it an X coordinate and a Y coordinate based on our arbitrary understanding of the word. Then we'd take another word and assign it its own X and Y coordinates.
We'd do that for all the words the model is trained on and end up with a chart where all the semantically similar words (like "cat" and "kitten") would have similar X and Y coordinates, ending up close to each other.
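Here's that two-dimensional example expressed as a quick TypeScript sketch. The coordinates are completely arbitrary; the point is just that similar words end up a short distance apart.

```typescript
// Toy 2D "embeddings": hand-picked coordinates for a few words.
type Point = { x: number; y: number };

const words: Record<string, Point> = {
  cat: { x: 1.0, y: 2.0 },
  kitten: { x: 1.1, y: 2.2 },
  tractor: { x: 8.5, y: 0.3 },
};

// Euclidean distance: smaller means "more similar" in this toy model.
function distance(a: Point, b: Point): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

console.log(distance(words.cat, words.kitten)); // small (~0.22)
console.log(distance(words.cat, words.tractor)); // large (~7.69)
```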
Again, that's an extreme oversimplification, but I hope it gets the idea across.
When we're dealing with neural networks, we aren't dealing with just two-dimensional charts. These embeddings can be made up of thousands of numbers, so the LLM's understanding of semantically similar things is multidimensional.
We need embeddings because it's not possible to store and compute every word, its relationship to every other word, and the way context changes relationships between words.
By converting the words to groups of numbers, it's possible for computers to store them and determine their semantic similarity.
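As a rough sketch of how that comparison works, here's a cosine similarity function in TypeScript, a common way to measure how "close" two embeddings are. The vectors below are invented four-dimensional stand-ins for real embeddings, which have hundreds or thousands of dimensions.

```typescript
// Cosine similarity: ~1 means the vectors point the same way
// (semantically close), values near 0 mean they're unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Invented embeddings, just for illustration
const cat = [0.8, 0.1, 0.9, 0.3];
const kitten = [0.75, 0.2, 0.85, 0.35];
const hammock = [0.1, 0.9, 0.05, 0.8];

console.log(cosineSimilarity(cat, kitten)); // ~0.99, very similar
console.log(cosineSimilarity(cat, hammock)); // ~0.30, not so much
```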
Okay, that's about as deep as I want to go into the conceptual stuff. Let's bring it back to something more closely related to our application, and that's "GPT."
What Is a GPT?
GPT stands for "Generative Pre-trained Transformer." It's a type of LLM that can understand language and generate things like text or images (I'll focus on text). You may already be familiar with tools like ChatGPT.
What it generates is determined by which outcome it predicts is most probable, based on its training data and the input.
So when you give a GPT tool an input, it can process that information with its parameters and embeddings and predict the next word, and the next word, and the next word, continuing until it reaches what it thinks is the end of the thought.
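Here's a minimal sketch of that loop in TypeScript. The predictNextWord function is a hypothetical stand-in for the actual model, which would score every word in its vocabulary at each step; here it just follows a hard-coded lookup table.

```typescript
const END = "<end>"; // marker the model emits when the thought is finished

// Hypothetical stand-in for the model's next-word prediction
function predictNextWord(context: string[]): string {
  const continuations: Record<string, string> = {
    I: "love",
    love: "banana",
    banana: "bread",
    bread: END,
  };
  return continuations[context[context.length - 1]] ?? END;
}

// The generation loop: predict, append, repeat until the end marker
function generate(prompt: string[]): string {
  const output = [...prompt];
  while (true) {
    const next = predictNextWord(output);
    if (next === END) break;
    output.push(next);
  }
  return output.join(" ");
}

console.log(generate(["I"])); // "I love banana bread"
```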
GPTs Are Nondeterministic
Now, we have to talk about a very important point that I want to drive home. The output from these models is nondeterministic. That means it's based on a probability curve for predicting what the next word should be.
So, for the same input, you could get many completely different outputs.
For example, if I provide an input like, "I really love a good banana…" a GPT model may respond with something like "bread" or "pudding" or "cream pie" because, based on the data it has been trained on, those are semantically similar terms commonly found alongside "banana."
But because the answer is based on probability, there is a chance that the GPT returns something like "hammock."
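Here's a small TypeScript sketch of that sampling behavior. The candidate words and probabilities are invented; run it a few times and you'll mostly get "bread," but every so often a "hammock" sneaks in.

```typescript
// Instead of always picking the single most likely word, the model
// samples from the probability distribution over candidates.
function sample(words: string[], probs: number[]): string {
  let r = Math.random();
  for (let i = 0; i < words.length; i++) {
    r -= probs[i];
    if (r <= 0) return words[i];
  }
  return words[words.length - 1];
}

const nextWords = ["bread", "pudding", "cream pie", "hammock"];
const probs = [0.55, 0.25, 0.18, 0.02]; // invented probabilities

// The "same input," five times: usually bread, occasionally a surprise
for (let i = 0; i < 5; i++) {
  console.log(`I really love a good banana ${sample(nextWords, probs)}`);
}
```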
Anyway, this is important to keep in mind, especially when building applications that rely on accuracy. LLMs have no concept of true and false, right and wrong, or fact and fiction. They are just producing what they think is the most likely output for a given input, based on the data they've been trained on.
So when a GPT returns a response like "I love banana bread," it has no idea what the concept of banana bread even is. It has no idea what a banana is, or what bread is, or the fact that banana bread is amazing.
All it knows is that, according to the data it's been trained on, it's pretty common to find "banana" and "bread" together. Occasionally, it may also find "banana" and "hammock" together.
GPTs Hallucinate
An interesting thing can happen when an LLM is trained on data: it may develop associations between words and terms that a human never would, because it lacks any understanding of what those words and terms actually mean.
As a result, when you ask it a question, it might come up with an output that is strange, ridiculous, or categorically false.
We call these strange behaviors hallucinations (which is cute). And they can lead to some pretty funny results that you may have encountered.
Conclusion
Okay, that is about as far down the AI rabbit hole as I want to go. We covered AI, neural networks, LLMs, parameters, embeddings, GPTs, nondeterminism, and hallucinations. It was a lot!
I hope you now have a better understanding of what these things are and how they work. If you learned something, let me know!
In the next post, we'll explore some of the concepts we learned today through prompt engineering. It's a fascinating way to change the behavior of our application without actually changing the logic in our code.
Hopefully, that sounds interesting. And if you have an idea for an AI application you might want to build, this will be the time to really start differentiating our apps from each other's. I think it's going to be a fun one.
Thank you so much for reading. If you liked this article and want to support me, the best ways to do so are to…