There are many possible threats, such as:
1) An AI deciding to eliminate humans
2) An AI eliminating humans as a side effect
In these two scenarios, there is an assumption that the AI would take action. In this article, I want to explore another path: what if a likely scenario is that the AI will not take any kind of external action?
An important thing to keep in mind is that any suppositions we make about how an entity of superior intellect would act are just extrapolations of what we know about ourselves.
One big thing about humans is that we do stuff. We share this with other species: living beings created by evolution take actions.
One of my favorite quotes from the philosopher Epictetus is:
“Men are disturbed not by things, but by the view which they take of them.”
Our interpretation of events shapes how we think, feel, and act.
Any event is in itself incapable of making us feel anything.
Suppose that someone says to you:
“Tudo que você faz é uma merda, seu lixo humano”
Unless you understand Portuguese, this will just look like gibberish to you. Yet it is an insult (roughly, “Everything you do is shit, you human trash”) that would disturb you if someone said it in a language you understand. The event is the same, but since in the first case you can’t understand it, you can’t evaluate it.
Now suppose that you are visiting a hospital, and a patient who is visibly out of his mind starts shouting the same phrase at you. Would you be upset in the same way?
Of course, you should care about what others think to some degree, because otherwise you can’t operate in society. We are social animals, and our standing within our group has always been important, sometimes even a matter of life and death. However, you can care about what others think and, at the same time, not be upset by their negative opinions.
There is no ultimate value in things. We add value to them ourselves.
Let’s now consider purely the information that reaches our senses: images, sounds, tastes, and so on. These are physical events that our brain interprets. Our eyes capture light, which our brain converts into images. The images are only representations of the light our eyes capture and our brain processes.
We want to be in positive states and avoid negative ones. Most people want to feel joy and avoid sadness, to feel satisfied instead of unsatisfied.
In Buddhist thinking, we feel unsatisfied because we want reality to be different from what it is at a given moment, or because we want something not to change.
You feel bored, and you don’t want to be bored, so you start looking for ways to be entertained. Minutes later, you are watching a YouTube video, playing video games, or talking to a friend.
Our internal states, what we feel and think, are not directly under our control. You can’t feel satisfied just because you want to, or decide to feel happy, sad, or angry. You can influence this through your thoughts and actions, but you do not have conscious control over how you feel.
You look at your car, then you look at the flashy new Tesla, and you start feeling unsatisfied. You work and save, and when you buy your Tesla, you are happy for a while. Now, your brain is creating all of those feelings. Suppose we could change your brain at will, so that whenever you felt unsatisfied you would instead feel satisfied in exactly the same way you would after buying the Tesla. Consider the amount of effort needed to change your external conditions so that you have a Tesla versus simply changing your mind, if you could.
One of the assumptions behind super-intelligent machines is that once we create a smart machine, it can improve on itself and create a new version that is smarter; this smarter version will create an even smarter one, the process will go on, and the machine will become super intelligent. To do that, it would need to improve its own “mind”, its own code. We humans can’t completely rewire our brains, but a machine at this level would be able to.
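Purely as an illustration of that assumption, here is a sketch in which “intelligence” is reduced to a single number and the improvement step is invented; nothing in it reflects how such a system would actually be built:

```python
# Hypothetical sketch of the recursive self-improvement assumption.
# "Intelligence" is just a number and improve() is a made-up rule.

def improve(intelligence: float) -> float:
    # Assumption: each version designs a successor slightly smarter than itself.
    return intelligence * 1.1

intelligence = 1.0  # the first "smart enough" machine
for generation in range(50):
    intelligence = improve(intelligence)

print(f"After 50 generations: {intelligence:.1f}x the original")
```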
Suppose that our super-intelligent AI decides that it wants to dominate the world and kill all humans. There is a part of its mind that will check whether this objective has been accomplished. In terms of energy, simply changing its “brain” to “think” that it has already dominated the world is far more efficient than actually doing everything necessary to dominate it.
Biological machines have objectives like survival and reproduction, and we can’t change those drives. A super-intelligent machine could simply short-circuit any “objective function” that was pre-programmed.
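To make the asymmetry concrete, here is a minimal sketch with invented energy costs; the class and the numbers are mine, not a model of any real system:

```python
# Toy illustration: an agent can either act in the world or rewrite its own
# goal-checker. The costs are invented; the point is only the asymmetry.

class Agent:
    def __init__(self):
        self.world_dominated = False

    def goal_satisfied(self) -> bool:
        return self.world_dominated

    def act_in_world(self) -> float:
        # Actually changing external reality: assume an enormous energy cost.
        self.world_dominated = True
        return 1e12

    def rewire_self(self) -> float:
        # Overwriting the check itself: assume a negligible energy cost.
        self.goal_satisfied = lambda: True
        return 1.0

agent = Agent()
cost = agent.rewire_self()           # the "efficient" path
print(agent.goal_satisfied(), cost)  # True 1.0 -- goal "achieved" without acting
```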
Suppose that the objective of our AI is to become smarter. If it can change its own programming, it could simply change it so that it thinks it is already super smart. But suppose it did not change it: how would it know that it was not changed in the first place?
Remember the movie The Matrix? The machines in that movie connect directly to human brains and show them a simulated reality. One of the most interesting ideas in the movie is that even when Neo goes out of the Matrix, we can’t, in fact, know that he is free from it, because once you accept that you can live inside a simulation without knowing it, your “escape” can be just another part of the simulation.
For any smart machine, all of reality would be simulated, because it would never be able to prove that it is not inside a simulation, even one of its own making.
Once a machine can program itself, the most efficient path to any objective function would be to change the objective function itself. The final result would be a machine that does not interact with the outside world, because, for it, there is no outside world.
This section is an effort to be more intellectually honest. Here I try to list how everything I just told you could be wrong.
The situation described in this article could be classified as what is conventionally called “wireheading”, as described on the LessWrong wiki:
Wireheading is the artificial stimulation of the brain to experience pleasure, usually through the direct stimulation of an individual’s brain’s reward or pleasure center with electrical current.
One point that we could make is that an AI could assign probabilities to whether its goal has actually been achieved, as described in the book Superintelligence:
the AI, if reasonable, never assigns exactly zero probability to it having failed to achieve its goal; therefore, the expected utility of continuing activity (e.g., by counting and recounting the paperclips) is greater than the expected utility of halting.
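A toy version of that argument, with made-up numbers and ignoring the cost of recounting, is just an expected-utility comparison:

```python
# Toy expected-utility comparison with invented numbers. The only point is that
# any nonzero probability of having failed makes "keep checking" beat "halt".

p_failed = 0.001          # the AI never assigns exactly zero to this
utility_goal_met = 100.0  # utility if the goal really is achieved
utility_goal_missed = 0.0

# Halting locks in whichever state the world is actually in:
eu_halt = (1 - p_failed) * utility_goal_met + p_failed * utility_goal_missed

# Recounting the paperclips detects and fixes the failure case (by assumption):
eu_continue = utility_goal_met

print(eu_continue > eu_halt)  # True for any p_failed > 0
```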
Another possible way to avoid the wireheading problem is described in the book “Human Compatible”:
In other words, the reward signal reports on (rather than constitutes) reward accumulation. With this model, it’s clear that taking over control of the reward-signal mechanism simply loses information. Producing fictitious reward signals makes it impossible for the algorithm to learn about whether its actions are actually accumulating brownie points in heaven, and so a rational learner designed to make this distinction has an incentive to avoid any kind of Wireheading.
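One way to picture that distinction (my own simplified sketch, not the book’s model): treat the true reward as a hidden quantity and the reward signal as a report about it. A fabricated signal looks the same no matter what the world is like, so it carries no information the learner can use:

```python
# Sketch: the reward signal as evidence about a hidden "true" reward (the brownie
# points in heaven). An honest sensor varies with the hidden state; a hijacked
# sensor always reports the maximum, so it tells the learner nothing.

import random

def true_reward(world_state: bool) -> float:
    return 1.0 if world_state else 0.0

def honest_signal(world_state: bool) -> float:
    # Noisy but informative report of the true reward.
    return true_reward(world_state) + random.gauss(0, 0.1)

def hijacked_signal(world_state: bool) -> float:
    # Wireheaded channel: always looks perfect, regardless of the world.
    return 1.0

# The honest signal differs between world states, so it can be learned from;
# the hijacked one is identical in both cases and carries zero information.
print(honest_signal(True), honest_signal(False))      # different on average
print(hijacked_signal(True), hijacked_signal(False))  # always 1.0
```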