For a long time, IT specialists worked without a care in the world. They developed, built, and deployed software smoothly. Then the era of isolation hit, and suddenly, they got bored (of course, this is a playful take on the actual events). IT folks wanted to create something that could handle their work while they stayed home: answer routine questions, generate cool avatars, and analyze vast amounts of data in minutes. They dreamed of traveling to a fantastic place, and so, you guessed it, they revolutionized AI.
AI is now functioning, providing answers, and improving lives. As skilled an assistant as it is, AI is truly effective only when used in the right context.
We’re witnessing rapid progress in AI applications, from image and video generation to stock market forecasting and cryptocurrency analysis. Yet, AI may offer information we don’t ask for or provide blatantly false answers. Its behavior is very much like that of household cats — you know, the kind that sits quietly and then suddenly pounces on you?
Our cats, as well as AI, enjoy being unpredictable.
You might wonder what determinism and stochasticity mean — let’s find out.
A deterministic system always produces the same result given the same input — think idempotency if you're a DevOps engineer. A real-world example: a cat that always finishes exactly the food you put in its bowl. That's determinism. But when the cat only sniffs the food and eats half, it's no longer deterministic.
A stochastic process includes an element of randomness: with the same input, the result can vary. For example, a machine learning model often uses stochastic algorithms, like Stochastic Gradient Descent (SGD), which trains the model by picking random chunks of data rather than the entire dataset.
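Here is a minimal Python sketch of the difference; the cat-feeding functions are invented purely for illustration:

```python
import random

# Deterministic: the same input always produces the same output.
def portion_eaten_deterministic(food_grams):
    return food_grams                              # the cat finishes the bowl every single time

# Stochastic: the same input can produce different outputs.
def portion_eaten_stochastic(food_grams):
    return food_grams * random.uniform(0.5, 1.0)   # sometimes it sniffs and eats only half

print(portion_eaten_deterministic(100))  # always 100
print(portion_eaten_stochastic(100))     # anywhere between 50 and 100 on every call
```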
These definitions don’t fully explain why our AIs sometimes hallucinate and behave chaotically. If we look a little closer, we’ll see other mechanisms that contribute to the unpredictable behavior of AI models.
You probably know that the AIs everyone uses rely on various neural network architectures: recurrent networks such as LSTM and GRU, and the newer Transformer-based models.
We need that context to understand why the most widely used model, ChatGPT, so often hallucinates.
ChatGPT runs on the Transformer architecture, first introduced in the 2017 paper, “Attention Is All You Need.” This is the very mechanism that revolutionized text processing. Transformers operate on the self-attention mechanism, which allows them to consider the global context rather than just the nearest words, as older recurrent neural networks (LSTM and GRU) do. The model belongs to the GPT (Generative Pre-trained Transformer) family, which means it is generative (it produces new text), pre-trained (on massive text corpora), and built on the Transformer architecture.
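To make self-attention a little less abstract, here is a single-head, scaled dot-product attention sketch in plain NumPy. The random weights stand in for trained ones; real Transformers add multiple heads, masking, and positional information on top of this:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token embeddings X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project tokens into queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # every token scores every other token
    weights = softmax(scores, axis=-1)        # attention weights: a probability distribution per token
    return weights @ V                        # each output mixes information from the whole sequence

# 4 tokens with 8-dimensional embeddings; random weights stand in for trained ones
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```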
ChatGPT’s answers result from a stochastic process rather than a rigid rule. It doesn’t memorize or reproduce texts but generates responses using a probabilistic model.
When ChatGPT responds, it doesn’t choose the single correct word; it computes a probability distribution over the possible next words:
P(wi | w1, w2, ..., wi-1), where:
wi — the next word to be generated;
w1, w2, ..., wi-1 — the previous words (the context).
For example, if you ask, “What day is it today?” ChatGPT assigns different probabilities to candidate answers such as “Monday,” “Tuesday,” or “I can’t know the current date.”
It will most often choose the word with the highest probability, but due to the generation temperature (a parameter that controls randomness), it might sometimes pick a less likely option based on context.
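Here is a small sketch of how temperature-based sampling works; the candidate words and their scores are invented for illustration:

```python
import numpy as np

def sample_next_word(logits, temperature=1.0, seed=None):
    """Turn raw scores into a probability distribution and sample one word from it."""
    rng = np.random.default_rng(seed)
    scaled = np.array(logits, dtype=float) / temperature  # temperature rescales the scores
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                                   # softmax: probabilities that sum to 1
    return rng.choice(len(probs), p=probs), probs

# Hypothetical candidate answers and scores for "What day is it today?"
words = ["Monday", "Tuesday", "I can't know the current date"]
logits = [2.0, 1.5, 1.0]

idx, probs = sample_next_word(logits, temperature=0.7)
print(dict(zip(words, probs.round(2))), "->", words[idx])
# A low temperature sharpens the distribution (the top word wins almost every time);
# a high temperature flattens it, so less likely words get picked more often.
```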
ChatGPT works with a limited context window, meaning it only “remembers” the last N tokens. For newer GPT-4 models, the context window is up to about 128k tokens (around 300 pages of text). If important information falls outside that window, the model simply no longer sees it and has to guess at, or invent, the missing details.
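To picture what a limited context window means in practice, here is a toy sketch of how a chat application might trim its history. Real systems count tokens with a tokenizer rather than words, and the window below is shrunk so the effect is visible:

```python
def build_prompt(history, max_tokens=128_000):
    """Keep only the most recent messages that fit into the context window.
    Token counts are approximated by word counts purely for illustration."""
    kept, used = [], 0
    for message in reversed(history):   # walk backwards from the newest message
        cost = len(message.split())
        if used + cost > max_tokens:
            break                       # everything older than this is silently dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))

history = [f"message number {i}" for i in range(10)]
print(build_prompt(history, max_tokens=9))  # only the last three messages survive
```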
Yet, ChatGPT can often correct its answer if you ask whether it’s sure.
When you ask ChatGPT, “Are you sure?” it reanalyzes its answer within a new context where doubt is present. As a result, it recalculates the probabilities and may switch to a different answer, even when the original one was correct.
This process can be explained by Bayesian probability.
P(A|B) = P(B|A)P(A) / P(B), where:
P(A|B) — the probability that answer A is correct, considering your follow-up question B.
P(B|A) — the probability that you would have asked if ChatGPT was initially right.
P(A) — the initial probability of ChatGPT's answer.
P(B) — the overall probability that you would ask.
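A tiny worked example of this update, with all probabilities invented purely for illustration:

```python
# All numbers here are hypothetical, chosen only to show how the update works.
p_a = 0.8              # P(A): prior probability that the original answer is correct
p_b_given_a = 0.3      # P(B|A): chance you ask "Are you sure?" when the answer is right
p_b_given_not_a = 0.9  # chance you ask "Are you sure?" when the answer is wrong

# P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 2))  # 0.57 -- the model now trusts its original answer less
```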
Too much information? Brain overheating? AIs, too, get overwhelmed by large amounts of information.
Massive amounts of text data flow into ChatGPT’s training, and that data inevitably contains noise and contradictions: conflicting facts, outdated information, and outright myths. Blending those conflicting signals produces model hallucinations, which occur because ChatGPT’s weights are trained on probabilistic word associations rather than strict logic.
Here is what we can take away from all this. ChatGPT hallucinates because it:
Predicts probabilistically, not deterministically.
Has a limited memory (context window).
Recalculates probabilities when questioned.
Has training data that includes noise and contradictions.
It’s that straightforward. Hope you didn’t get tired. If you did, that’s a good sign because it means you’re thinking critically, which is exactly what we should do when working with AI.