
AI’s Non-Determinism, Hallucinations, And... Cats?

by Alexander SimonovFebruary 16th, 2025

Too Long; Didn't Read

AI is like a cat: sometimes it eats what you give it, sometimes it ignores the bowl, and sometimes it scratches you. ChatGPT’s answers come from a stochastic process rather than a rigid rule. It tends to make up answers and is reliable only when used in the right context.

For a long time, IT specialists worked without a care in the world. They developed, built, and deployed software smoothly. Then the era of isolation hit, and suddenly, they got bored (of course, this is a playful take on the actual events). IT folks wanted to create something that could handle their work while they stayed home: answer routine questions, generate cool avatars, and analyze vast amounts of data in minutes. They dreamed of traveling to a fantastic place, and so, you guessed it, they revolutionized AI.


AI is now functioning, providing answers, and improving lives. As skilled an assistant as it is, AI is truly effective only when used in the right context.


We’re witnessing rapid progress in AI applications, from image and video generation to stock market forecasting and cryptocurrency analysis. Yet, AI may offer information we didn’t ask for or give blatantly false answers. Its behavior is much like that of a household cat: you know, the kind that sits quietly and then suddenly pounces on you.


ChatGPT when you ask it a simple question


Our cats, as well as AI, enjoy being unpredictable:


  • You give them the same food (or data) — sometimes they eat, sometimes they ignore it.
  • You train them to respond, but they only occasionally react when you call them.
  • The bigger and wilder the cat or the larger the AI model, the harder it is to predict its behavior.
  • In the morning, cats might be calm; by evening, they turn hyperactive (just like dynamic data).
  • Cats might be friendly (deterministic) but can scratch you without warning (stochastic).


You might wonder what determinism and stochasticity mean — let’s find out.

Determinism and Stochasticity

A deterministic system always produces the same result given the same input — think idempotency if you're a DevOps engineer. A real-world example: a cat that eats all the food you put in its bowl, every single time — that's determinism. But when the cat sniffs the bowl and eats only half, it's no longer deterministic.
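A minimal Python sketch of the difference (the function names and numbers here are purely illustrative):

```python
import random

def deterministic_cat(grams_served: float) -> float:
    """Deterministic: the bowl ends up empty, every single time."""
    return 0.0  # grams left over

def stochastic_cat(grams_served: float) -> float:
    """Stochastic: the cat sniffs and eats some random fraction."""
    eaten = random.uniform(0, grams_served)
    return grams_served - eaten  # grams left over, varies per run

print(deterministic_cat(100), deterministic_cat(100))  # always: 0.0 0.0
print(stochastic_cat(100), stochastic_cat(100))        # different every run
```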


Expected output (empty bowl) vs. Actual output


A stochastic process includes an element of randomness: given the same input, the result can vary. For example, machine learning models are often trained with stochastic algorithms like Stochastic Gradient Descent (SGD), which updates the model using randomly picked mini-batches of data rather than the entire dataset at once.
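A rough sketch of that idea, assuming plain NumPy and a toy linear model; the only randomness is which mini-batch gets drawn at each step, yet it makes every training run slightly different:

```python
import numpy as np

rng = np.random.default_rng()  # deliberately unseeded: every run differs
X = rng.normal(size=(1000, 3))                                # toy features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)                # model weights we are trying to learn
lr, batch_size = 0.1, 32

for step in range(500):
    idx = rng.choice(len(X), size=batch_size, replace=False)  # random chunk
    grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch_size  # MSE gradient
    w -= lr * grad

print(w)  # near [2, -1, 0.5], but never exactly the same twice
```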


These definitions don’t fully explain why our AIs sometimes hallucinate and behave chaotically. There are other contributing factors, including the following:


  • Determinism
  • Stochasticity
  • Rounding errors and floating-point arithmetic (a quick demo follows this list)
  • Multithreading and parallel computations
  • Continuously updating data
  • Chaos and the “butterfly effect”
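
Here is the quick demo promised above. Floating-point addition is not associative, so merely changing the order of a summation (exactly what multithreaded and parallel reductions do) can change the result:

```python
import random

values = [random.uniform(-1e10, 1e10) for _ in range(100_000)]

total_forward = sum(values)
total_reverse = sum(reversed(values))

print(total_forward == total_reverse)      # often False
print(abs(total_forward - total_reverse))  # tiny, but not zero
```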


If we look a little closer, we'll see other mechanisms that influence the unpredictable behavior of AI models.

A Glimpse of Neural Networks

You probably know that the AIs everyone uses rely on various neural network architectures. Here are some of the main types:


  • Fully Connected Neural Networks (FCNN): A classic architecture where each neuron connects to every neuron in the next layer (a minimal sketch follows this list).


  • Convolutional Neural Networks (CNNs): These networks use convolutions or filters that highlight image features like edges, textures, and shapes.


  • Recurrent Neural Networks (RNNs): These networks have feedback loops that allow them to remember previous steps (namely, they remember sequences).


  • Long Short-Term Memory (LSTM): An enhanced version of RNNs with mechanisms for selectively forgetting and remembering important data.


  • Transformers: The most powerful class for text processing. They use multi-head attention, allowing them to consider the entire context simultaneously.


  • Generative Adversarial Networks (GANs): They consist of two networks, one of which generates data and the other evaluates its quality. Their competition leads to better results.


  • Autoencoders: Networks designed to compress (encode) information and then reconstruct (decode) it.


  • Graph Neural Networks (GNNs): They work with graphs (nodes and edges) rather than grid-like data such as images or sequences.
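
To ground the first item in that list, here is a minimal NumPy sketch of a fully connected network's forward pass. The layer sizes and random weights are arbitrary; a real network would learn its weights through training:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)  # a common activation function

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)  # input dim 4 -> hidden dim 8
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)  # hidden dim 8 -> output dim 2

def forward(x):
    h = relu(x @ W1 + b1)  # every hidden neuron sees all 4 inputs
    return h @ W2 + b2     # every output neuron sees all 8 hidden values

print(forward(np.array([1.0, 0.5, -0.3, 2.0])))
```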


We need all that context to understand why the most widely used model, ChatGPT, often hallucinates.

How AI Hallucinations Happen

ChatGPT runs on the Transformer architecture, first introduced in the 2017 paper “Attention Is All You Need.” This is the architecture that revolutionized text processing. Transformers are built around the self-attention mechanism, which allows them to consider the global context rather than just the nearest words, as older recurrent neural networks (LSTM and GRU) do. The model belongs to the GPT (Generative Pre-trained Transformer) series, which means:


  • Pre-trained: It was initially trained on enormous amounts of text (books, articles, websites, and code).
  • Generative: Its task is to generate text, not just classify or extract facts.


ChatGPT’s answers result from a stochastic process rather than a rigid rule. It doesn’t memorize or reproduce texts but generates responses using a probabilistic model.

Word Prediction as a Probabilistic Process

When ChatGPT responds, it doesn’t choose the single correct word; it computes a probability distribution over possible next words:


P(wi | w1, w2, ..., wi-1), where:

  • wi — the next word in the sentence
  • w1, w2, ..., wi-1 — the previous words
  • P(wi | w1, ..., wi-1) — the probability that wi will be the next word


For example, if you ask, “What day is it today?” ChatGPT might have different probabilities:


  • “Monday” — P=0.7
  • “Wednesday” — P=0.2
  • “42” — P=0.0001


It will most often choose the word with the highest probability, but due to the generation temperature (a parameter that controls randomness), it sometimes picks a less likely option instead.
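
Here is a hedged sketch of how temperature reshapes that choice, reusing the toy probabilities above. (Real models apply temperature to raw logits before the softmax; dividing log-probabilities by T is the same arithmetic.)

```python
import numpy as np

words = ["Monday", "Wednesday", "42"]
probs = np.array([0.7, 0.2, 0.0001])
probs = probs / probs.sum()  # renormalize; the remaining mass belongs to other words

def sample(temperature: float) -> str:
    logits = np.log(probs)                 # recover log-probabilities
    scaled = np.exp(logits / temperature)  # T < 1 sharpens, T > 1 flattens
    scaled /= scaled.sum()
    return np.random.choice(words, p=scaled)

print([sample(0.2) for _ in range(5)])  # almost always "Monday"
print([sample(2.0) for _ in range(5)])  # "Wednesday" shows up far more often
```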

Context Influence and Information Forgetting

ChatGPT works with a limited context window, meaning it only "remembers" the last N tokens. For GPT-4, the context window is about 128k tokens (around 300 pages of text). If important information falls outside this window, the model may:


  • Forget details (context clipping effect)
  • Make up information (stochastic process)
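
A toy illustration of context clipping, assuming a naive whitespace tokenizer and a made-up cat name (real tokenizers are subword-based and real windows are enormous, but the principle is the same):

```python
MAX_TOKENS = 6  # GPT-4's window is ~128k tokens; the mechanism is identical

history = "My cat is named Barsik .".split() + "What is my cat's name ?".split()

visible = history[-MAX_TOKENS:]  # the model only sees the most recent tokens
print(visible)  # ['What', 'is', 'my', "cat's", 'name', '?']
# "Barsik" fell outside the window, so the model must guess or make something up
```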


Yet, ChatGPT can often correct its answer if you ask whether it’s sure.

AI Sometimes Corrects Itself, But Why?

When you ask ChatGPT, “Are you sure?” it reanalyzes its answer using a new context where doubt is present. This results in:


  • Recalculating answer probabilities.
  • Choosing a more plausible option if one exists.


This process can be explained by Bayesian probability.


P(A|B) = P(B|A)P(A) / P(B), where:


  • P(A|B) — the probability that answer A is correct, given your follow-up question B.

  • P(B|A) — the probability that you would ask that follow-up if answer A were actually correct.

  • P(A) — the prior probability that ChatGPT's answer is correct.

  • P(B) — the overall probability that you would ask the follow-up at all.
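
A toy numeric walk-through of that update, with made-up numbers purely to show the mechanics:

```python
# Hypothetical numbers: a prior belief that the original answer A is correct,
# and how likely you are to ask "Are you sure?" in each case.
p_a = 0.6              # P(A): prior probability the answer was right
p_b_given_a = 0.2      # P(B|A): you might double-check even a correct answer
p_b_given_not_a = 0.7  # P(B|not A): a wrong answer is more likely to draw doubt

# P(B) via the law of total probability, then Bayes' rule.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_a_given_b, 3))  # 0.3: your doubt lowers the answer's credibility
```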


Too much information? Brain overheating? Well, AIs also get overwhelmed by large amounts of information.

Errors Due to Overfitting and Noisy Data

Massive amounts of text data flow into ChatGPT's training, and that data includes noise and contradictory information. For example, some sources say the Earth is round, while others claim it's flat. When conflicting claims appear with varying probabilities, the AI can't always determine which one is true.


ChatGPT processing contradictory data be like


This is how model hallucinations arise: ChatGPT’s weights encode probabilistic word associations rather than strict logic.

The Bottom Line

Here is what we can take away from all this. ChatGPT hallucinates because it:


  • Predicts probabilistically, not deterministically.

  • Has a limited memory (context window).

  • Recalculates probabilities when questioned.

  • Has training data that includes noise and contradictions.


It’s that straightforward. I hope you didn’t get too tired. And if you did, that’s actually a good sign: it means you were thinking critically, which is exactly what we should do when working with AI.