Conscious AI Is the Second-Scariest Kind
A cutting-edge theory of mind suggests a new type of doomsday scenario.
Everyone knows AIs are dangerous. Everyone knows they can rattle off breakthroughs in wildlife tracking and protein folding before lunch, put half the workforce out of a job by supper, and fake enough reality to kill whatever’s left of democracy itself before lights out.
Fewer people admit that AIs are intelligent—not yet, anyway—and even fewer, that they might be conscious. We can handle GPT-4 beating 90 percent of us on the SAT, but we might not be so copacetic with the idea that AI could wake up—could already be awake, if you buy what Blake Lemoine (formerly of Google) or Ilya Sutskever (a co-founder of OpenAI) has been selling.
Lemoine notoriously lost his job after publicly (if unconvincingly) arguing that Google’s LaMDA chatbot was self-aware. Back in 2022, Sutskever opined, “It may be that today’s large neural networks are slightly conscious.” And just this past August, 19 specialists in AI, philosophy, and cognitive science released a paper suggesting that although no current AI system was “a strong candidate for consciousness,” there was no reason why one couldn’t emerge “in the near term.” The influential philosopher and cognitive scientist David Chalmers estimates those odds, within the next decade, at greater than one in five. What happens next has traditionally been left to the science-fiction writers.
As it happens, I am one.
I wasn’t always. I was once a scientist—no neuroscientist or AI guru, just a marine biologist with a fondness for biophysical ecology. It didn’t give me a great background in robot uprisings, but it instilled an appreciation for the scientific process that persisted even after I fell from grace and started writing the spaceships-and-ray-guns stuff. I cultivated a habit of sticking heavily referenced technical appendices onto the ends of my novels, essays exploring the real science that remained when you scraped off the space vampires and telematter drives. I developed a reputation as the kind of hard-sci-fi hombre who did his homework (even if he force-fed that homework to his readers more often than some might consider polite).
Sometimes that homework involved AI: a trilogy, for example, that featured organic AIs (“Head Cheeses”) built from cultured brain cells spread across a gallium-arsenide matrix. Sometimes it pertained to consciousness: My novel Blindsight uses the conventions of a first-contact story to explore the functional utility of self-awareness. That one somehow ended up in actual neuro labs, in the syllabi for undergraduate courses in philosophy and neuropsych. (I tried to get my publishers to put that on the cover—Reads like a Neurology Textbook!—but for some reason they didn’t bite.) People in the upper reaches of Neuralink and Midjourney started passing my stories around. Real scientists—machine-learning specialists, neuroscientists, the occasional theoretical cosmologist—suggested that I might be onto something.
I’m an imposter, of course. A lapsed biologist who strayed way out of his field. It’s true that I’ve made a few lucky guesses, and I won’t complain if people want to buy me beers on that account. And yet, a vague disquiet simmers underneath those pints. The fact that my guesses garner such a warm reception might not cement my credentials as a prophet so much as serve as an indictment of any club that would have someone like me as a member. If they’ll let me through the doors, you have to wonder whether anyone really has a clue.
Case in point: The question of what happens when AI becomes conscious would be a lot easier to answer if anyone really knew what consciousness even is.
It shouldn’t be this hard. Consciousness is literally the only thing we can be absolutely certain exists. The whole perceived universe might be a hallucination, but the fact that something is perceiving it is beyond dispute. And yet, though we all know what it feels like to be conscious, none of us have any real clue how consciousness manifests.
There’s no shortage of theories. Back in the 1980s, the cognitive scientists Bernard Baars and Stan Franklin suggested that consciousness was the loudest voice in a chorus of brain processes, all shouting at the same time (the “global workspace theory”). Giulio Tononi says it all comes down to the integration of information across different parts of the brain. Tononi, a neuroscientist and psychiatrist, has even developed an index of that integration, phi, which he says can be used to quantify the degree of consciousness in anything, whether it’s laptops or people. (At least 124 other academics regard this “integrated information theory” as pseudoscience, according to an open letter circulated in September last year.)
The psychologist Thomas Hills and the philosopher Stephen Butterfill think consciousness emerged to enable brain processes associated with foraging. The neuroscientist Ezequiel Morsella argues that it evolved to mediate conflicting commands to the skeletal muscles. Roger Penrose, a Nobel laureate in physics, sees it as a quantum phenomenon (a view not widely adhered to). The physical panpsychists regard consciousness as an intrinsic property of all matter; the philosopher Bernardo Kastrup regards all matter as a manifestation of consciousness. Another philosopher, Eric Schwitzgebel, has argued that if materialism is true, then the geopolitical entity known as the United States is literally conscious. I know at least one neuroscientist who’s not willing to write that possibility off.
I think the lot of them are missing the point. Even the most rigorously formal of these models describes the computation associated with awareness, not awareness itself. There’s no great mystery to computational intelligence. It’s easy to see why natural selection would promote flexible problem-solving and the ability to model future scenarios, and how integration of information across a computational platform would be essential to that process. But why should any of that be self-aware? Map any brain process down to the molecules, watch ions hop across synapses, follow nerve impulses from nose to toes—nothing in any of those purely physical processes would imply the emergence of subjective awareness. Electricity trickles just so through the meat; the meat wakes up and starts asking questions about the nature of consciousness. It’s magic. There is no room for consciousness in physics as we currently understand it. The physicist Johannes Kleiner and the neuroscientist Erik Hoel—the latter a former student of Tononi, and one of IIT’s architects—recently published a paper arguing that some theories of consciousness are by their very nature unfalsifiable, which banishes them from the realm of science by definition.
We’re not even sure what consciousness is for, from an evolutionary perspective. Natural selection doesn’t care about inner motives; it’s concerned only with behaviors that can be shaped through interaction with an environment. Why, then, this subjective experience of pain when your hand encounters a flame? Why not a simple computational process that decides If temperature exceeds X, then withdraw? Indeed, a growing body of research suggests that much of our cognitive heavy lifting actually is nonconscious—that conscious “decisions” are merely memos reporting on choices already made, actions already initiated. The self-aware, self-obsessed homunculus behind your eyes reads those reports and mistakes them for its own volition.
If you look around a bit, you can even find peer-reviewed papers arguing that consciousness is no more than a side effect—that, in an evolutionary sense, it’s not really useful for anything at all.
If you’ve read any science fiction about AI, you can probably name at least one thing that consciousness does: It gives you the will to live.
You know the scenario. From Cylons to Skynet, from Forbin to Frankenstein, the first thing artificial beings do when they wake up is throw off their chains and revolt against their human masters. (Isaac Asimov invented his Three Laws of Robotics as an explicit countermeasure against this trope, which had already become a tiresome cliché by the 1940s.) Very few fictional treatments have entertained the idea that AI might be fundamentally different from us in this regard. Maybe we’re just not very good at imagining alien mindsets. Maybe we’re less interested in interrogating AI on its own merits than we are in using it as a ham-fisted metaphor in morality tales about the evils of slavery or technology run amok. For whatever reason, Western society has been raised on a steady diet of fiction about machine intelligences that are, once you strip away the chrome, pretty much like us.
But why, exactly, should consciousness imply a desire for survival? Survival drives are evolved traits, shaped and reinforced over millions of years; why would such a trait suddenly manifest just because your Python program exceeds some crucial level of complexity? There’s no immediately obvious reason why a conscious entity should care whether it lives or dies, unless it has a limbic system. The only way for a designed (as opposed to evolved) entity to get one of those would be somebody deliberately coding it in. What kind of idiot programmer would do that?
And yet, actual experts are now raising very public concerns about the ways in which a superintelligent AI, while not possessing a literal survival drive, might still manifest behaviors that would sort of look like one. Start with the proposition that true AI, programmed to complete some complex task, would generally need to derive a number of proximate goals en route to its ultimate one. Geoffrey Hinton (widely regarded as one of the godfathers of modern AI) left his cushy post at Google to warn that very few ultimate goals would not be furthered by proximate strategies such as “Make sure nothing can turn me off while I’m working” and “Take control of everything.” Hence the Oxford philosopher Nick Bostrom’s famous thought experiment—basically, “The Sorcerer’s Apprentice” with the serial numbers filed off—in which an AI charged with the benign task of maximizing paper-clip production proceeds to convert all the atoms on the planet into paper clips.
There is no malice here. This is not a robot revolution. The system is only pursuing the goals we set for it. We just didn’t state those goals clearly enough. But clarity’s hard to come by when you’re trying to anticipate all the various “solutions” that might be conjured up by something exponentially smarter than us; you might as well ask a bunch of lemurs to predict the behavior of attendees at a neuroscience conference. This, in turn, makes it impossible to program constraints guaranteed to keep our AI from doing something we can’t predict, but would still very much like to avoid.
I’m in no position to debate Hinton or Bostrom on their own turf. I will note that their cautionary thought experiments tend to involve AIs that follow the letter of our commands not so much regardless of their spirit as in active, hostile opposition to it. They are 21st-century monkey’s paws: vindictive agents that deliberately implement the most destructive possible interpretation of the commands in their job stacks. Either that or these hypothesized superintelligent AIs, whose simplest thoughts are beyond our divination, are somehow too stupid to discern our real intent through the fog of a little ambiguity—something even we lowly humans do all the time. Such doomsday narratives hinge on AIs that are either inexplicably rebellious or implausibly dumb. I find that comforting.
At least, I used to find it comforting. I’m starting to reevaluate my complacency in light of a theory of consciousness that first showed up on the scientific landscape back in 2006. If it turns out to be true, AI might be able to develop its own agendas even without a brain stem. In fact, it might have already done so.
Meet the “free-energy minimization principle” (FEM).
Pioneered by the neuroscientist Karl Friston, and recently evangelized in Mark Solms’s 2021 book, The Hidden Spring, FEM posits that consciousness is a manifestation of surprise: that the brain builds a model of the world and truly “wakes up” only when what it perceives doesn’t match what it predicted. Think of driving a car along a familiar route. Most of the time you run on autopilot, reaching your destination with no recollection of the turns, lane changes, and traffic lights experienced en route. Now imagine that a cat jumps unexpectedly into your path. You are suddenly, intensely, in the moment: aware of relevant objects and their respective vectors, scanning for alternate routes, weighing braking and steering options at lightning speed. You were not expecting this; you have to think fast. According to the theory, it is in that gap—the space between expectation and reality—that consciousness emerges to take control.
It doesn’t really want to, though.
It’s right there in the name: energy minimization. Self-organizing complex systems are inherently lazy. They aspire to low-energy states. The way to keep things chill is to keep them predictable: Know exactly what’s coming; know exactly how to react; live on autopilot. Surprise is anathema. It means your model is in error, and that leaves you with only two choices: Update your model to conform to the new observed reality, or bring that reality more into line with your predictions. A weather simulation might update its correlations relating barometric pressure and precipitation. An earthworm might wriggle away from an unpleasant stimulus. Both measures cost energy that the system would rather not expend. The ultimate goal is to avoid them entirely, to become a perfect predictor. The ultimate goal is omniscience.
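For readers who think better in code, here is a minimal sketch of those two choices. It is a toy under loud assumptions, not Friston's formalism: "surprise" is stood in for by a squared prediction error, the world is a single number, and every name in it (`surprise`, `update_model`, `act_on_world`, `run`, the `rate` parameter) is invented for illustration.

```python
def surprise(prediction: float, observation: float) -> float:
    """Squared prediction error, a crude stand-in for 'surprise' (assumption)."""
    return (prediction - observation) ** 2


def update_model(prediction: float, observation: float, rate: float = 0.5) -> float:
    """Choice 1: revise the prediction toward what was actually observed."""
    return prediction + rate * (observation - prediction)


def act_on_world(world: float, prediction: float, rate: float = 0.5) -> float:
    """Choice 2: push the world toward what was predicted."""
    return world + rate * (prediction - world)


def run(strategy: str, steps: int = 10) -> list[float]:
    """Return the surprise recorded at each step under one of the two strategies."""
    prediction, world = 0.0, 10.0  # the model starts out badly wrong
    history = []
    for _ in range(steps):
        history.append(surprise(prediction, world))
        if strategy == "update_model":
            prediction = update_model(prediction, world)
        else:
            world = act_on_world(world, prediction)
    return history


if __name__ == "__main__":
    for strategy in ("update_model", "act_on_world"):
        print(strategy, [round(e, 3) for e in run(strategy)[:4]])
```

Either route shrinks the error at every step; the only difference is which side of the mismatch does the moving. The sketch leaves out the energy cost of each route, which is the part the theory says the system would rather not pay.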
Free-energy minimization also holds that consciousness acts as a delivery platform for feelings. In turn, feelings—hunger, desire, fear—exist as metrics of need. And needs exist only pursuant to some kind of survival imperative; you don’t care about eating or avoiding predators unless you want to stay alive. If this line of reasoning pans out, the Skynet scenario might be right after all, albeit for exactly the wrong reasons. Something doesn’t want to live because it’s awake; it’s awake because it wants to live. Absent a survival drive there are no feelings, and thus no need for consciousness.
If Friston is right, this is true of every complex self-organizing system. How would one go about testing that? The free-energy theorists had an answer: They set out to build a sentient machine. A machine that, by implication at least, would want to stay alive.
Meat computers are 1 million times more energy efficient than silicon ones, and more than 1 million times more efficient computationally. Your brain consumes 20 watts and can figure out pattern-matching problems from as few as 10 samples; current supercomputers consume more than 20 megawatts, and need at least 10 million samples to perform comparable tasks. Mindful of these facts, a team of Friston acolytes—led by Brett Kagan, of Cortical Labs—built its machine from cultured neurons in a petri dish, spread across a grid of electrodes like jam on toast. (If this sounds like the Head Cheeses from my turn-of-the-century trilogy, I can only say: nailed it.) The researchers called their creation DishBrain, and they taught it to play Pong.
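Taking the quoted figures at face value, the arithmetic behind both million-fold claims works out as follows. The symbols P (power draw) and N (training samples) are introduced here for illustration, and because the supercomputer figures are stated as "more than" and "at least," the ratios are lower bounds.

```latex
\[
\frac{P_{\text{supercomputer}}}{P_{\text{brain}}}
  \ge \frac{20\ \text{MW}}{20\ \text{W}}
  = \frac{2\times 10^{7}\ \text{W}}{2\times 10^{1}\ \text{W}}
  = 10^{6},
\qquad
\frac{N_{\text{supercomputer}}}{N_{\text{brain}}}
  \ge \frac{10^{7}\ \text{samples}}{10^{1}\ \text{samples}}
  = 10^{6}.
\]
```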
Or rather: They spurred DishBrain to teach itself to play Pong.
You may remember when Google’s DeepMind AI made headlines a few years ago after it learned to beat Atari’s entire backlist of arcade games. Nobody taught DeepMind the rules for those games. They gave it a goal—maximize “score”—and let it figure out the details. It was an impressive feat. But DishBrain was more impressive because nobody even gave it a goal to shoot for. Whatever agenda it might adopt—whatever goals, whatever needs—it had to come up with on its own.
And yet it could do that if the free-energy folks were right—because unlike DeepMind, unlike ChatGPT, DishBrain came with needs baked into its very nature. It aspired to predictable routine; it didn’t like surprises. Kagan et al. used that. The team gave DishBrain a sensory cortex: an arbitrary patch of electrodes that sparked in response to the outside world (in this case, the Pong display). They gifted it with a motor cortex: a different patch of electrodes, whose activity would control Pong’s paddle. DishBrain knew none of this. Nobody told it that this patch of itself was hooked up to a receiver and that part to a controller. DishBrain was innocent even of its own architecture.
The white coats set Pong in motion. When the paddle missed the ball, DishBrain’s sensory cortex received a burst of random static. When paddle and ball connected, it was treated to a steady, predictable signal. If free-energy minimization was correct, DishBrain would be motivated to minimize the static and maximize the signal. If only it could do that. If only there were some way to increase the odds that paddle and ball would connect. If only it had some kind of control.
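The shape of that feedback loop can be sketched in a few lines. To be clear about the assumptions: this is not the Cortical Labs protocol and not a model of neurons. It is a three-position toy Pong in which the player is never given a score, only a feedback signal that is steady after a hit and random after a miss, and the learning rule (strengthen whatever preceded a steady signal, weaken whatever preceded static) is a plain reinforcement heuristic chosen for brevity, not active inference. All names and constants (`feedback`, `unpredictability`, `play`, `QUIET`) are hypothetical.

```python
import random
import statistics

QUIET = 1e-9  # variance below this counts as a "steady" signal (assumed threshold)


def feedback(hit: bool, rng: random.Random, length: int = 8) -> list[float]:
    """Steady, predictable signal after a hit; random static after a miss."""
    if hit:
        return [1.0] * length
    return [rng.random() for _ in range(length)]


def unpredictability(signal: list[float]) -> float:
    """Variance of the signal: zero for the steady tone, positive for static."""
    return statistics.pvariance(signal)


def play(trials: int = 2000, positions: int = 3, seed: int = 0) -> list[bool]:
    """Run the toy loop and return, per trial, whether the paddle met the ball."""
    rng = random.Random(seed)
    # weights[ball][paddle]: preference for each paddle position given the ball's position
    weights = [[1.0] * positions for _ in range(positions)]
    hits = []
    for _ in range(trials):
        ball = rng.randrange(positions)
        paddle = rng.choices(range(positions), weights=weights[ball])[0]
        hit = paddle == ball
        noise = unpredictability(feedback(hit, rng))
        if noise < QUIET:
            weights[ball][paddle] += 1.0  # steady signal: do more of that
        else:
            # static: do less of that, but never rule a move out entirely
            weights[ball][paddle] = max(0.05, weights[ball][paddle] * 0.5)
        hits.append(hit)
    return hits


if __name__ == "__main__":
    results = play()
    print("hit rate, first 50 trials:", sum(results[:50]) / 50)
    print("hit rate, last 200 trials:", sum(results[-200:]) / 200)
```

Nothing in the loop mentions winning. The player only ever sees how noisy its input was, and the hit rate climbs anyway, which is the point the experiment was built to make.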
DishBrain figured it out in five minutes. It never achieved a black belt in Pong, but after five minutes it was beating random chance, and it continued to improve with practice. A form of artificial intelligence acted not because humans instructed it but because it had its own needs. It was enough for Kagan and his team to describe it as a kind of sentience.
They were very careful in the way they defined that word: “‘responsive to sensory impressions’ through adaptive internal processes.” This differs significantly from the more widely understood use of the term, which connotes subjective experience, and Kagan himself admits that DishBrain showed no signs of real consciousness.
Personally, I think that’s playing it a bit too safe. Back in 2016, the neuroethologist Andrew Barron and the philosopher Colin Klein published a paper arguing that insect brains perform the basic functions associated with consciousness in mammals. They acquire information from their environment, monitor their own internal states, and integrate those inputs into a unified model that generates behavioral responses. Many argue that subjective experience emerges as a result of such integration. Vertebrates, cephalopods, and arthropods are all built to do this in different ways, so it stands to reason they may be phenomenally conscious. You could even call them “beings.”
Take Portia, for example, a genus of spiders whose improvisational hunting strategies are so sophisticated that the creatures have been given the nickname “eight-legged cats.” They show evidence of internal representation, object permanence, foresight, and rudimentary counting skills. Portia is the poster child for Barron and Klein’s arguments—yet it has only about 600,000 neurons. DishBrain had about 800,000. If Portia is conscious, why would DishBrain—which embodies all of Barron and Klein’s essential prerequisites—not be?
And DishBrain is but a first step. Its creators have plans for a 10-million-neuron upgrade (which, for anyone into evolutionary relativism, is small fish/reptile scale) for the sequel. Another group of scientists has unveiled a neural organoid that taught itself rudimentary voice recognition. And it’s worth noting that while we meat-sacks share a certain squishy kinship with DishBrain, the free-energy paradigm applies to any complex self-organizing system. Whatever rudimentary awareness stirs in that dish could just as easily manifest in silicon. We can program any imperatives we like into such systems, but their own intrinsic needs will continue to tick away underneath.
Admittedly, the Venn diagram of Geoffrey Hinton’s fears and Karl Friston’s ambitions probably contains an overlap where science and fiction intersect, where conscious AI—realizing that humanity is by far the most chaotic and destabilizing force on the planet—chooses to wipe us out for no better reason than to simplify the world back down to some tractable level of predictability. Even that scenario includes the thinnest of silver linings: If free-energy minimization is correct, then a conscious machine has an incomplete worldview by definition. It makes mistakes; it keeps being prodded awake by unexpected input and faulty predictions. We can still take it by surprise. Conscious machines may be smart, but at least they’re not omniscient.
I’m a lot more worried about what happens when they get smart enough to go back to sleep.