Chatbots Are Primed to Warp Reality
A growing body of research shows how AI can subtly mislead users—and even implant false memories.
More and more people are learning about the world through chatbots and the software’s kin, whether they mean to or not. Google has rolled out generative AI to users of its search engine on at least four continents, placing AI-written responses above the usual list of links; as many as 1 billion people may encounter this feature by the end of the year. Meta’s AI assistant has been integrated into Facebook, Messenger, WhatsApp, and Instagram, and is sometimes the default option when a user taps the search bar. And Apple is expected to integrate generative AI into Siri, Mail, Notes, and other apps this fall. Less than two years after ChatGPT’s launch, bots are quickly becoming the default filters for the web.
Yet AI chatbots and assistants, no matter how wonderfully they appear to answer even complex queries, are prone to confidently spouting falsehoods—and the problem is likely more pernicious than many people realize. A sizable body of research, alongside conversations I’ve recently had with several experts, suggests that the solicitous, authoritative tone that AI models take—combined with the fact that they are legitimately helpful and correct in many cases—could lead people to place too much trust in the technology. That credulity, in turn, could make chatbots a particularly effective tool for anyone seeking to manipulate the public through the subtle spread of misleading or slanted information. No one person, or even government, can tamper with every link displayed by Google or Bing. Engineering a chatbot to present a tweaked version of reality is a different story.
Of course, all kinds of misinformation are already on the internet. But although reasonable people know not to naively trust anything that bubbles up in their social-media feeds, chatbots offer the allure of omniscience. People are using them for sensitive queries: In a recent poll by KFF, a health-policy nonprofit, one in six U.S. adults reported using an AI chatbot to obtain health information and advice at least once a month.
As the election approaches, some people will use AI assistants, search engines, and chatbots to learn about current events and candidates’ positions. Indeed, generative-AI products are being marketed as a replacement for typical search engines—and risk distorting the news or a policy proposal in ways big and small. Others might even depend on AI to learn how to vote. Research published this February on AI-generated misinformation about election procedures found that five well-known large language models provided incorrect answers roughly half the time—for instance, by misstating voter-identification requirements, which could lead to someone’s ballot being refused. “The chatbot outputs often sounded plausible, but were inaccurate in part or full,” Alondra Nelson, a professor at the Institute for Advanced Study who previously served as acting director of the White House Office of Science and Technology Policy, and who co-authored that research, told me. “Many of our elections are decided by hundreds of votes.”
With the entire tech industry shifting its attention to these products, it may be time to pay more attention to the persuasive form of AI outputs, and not just their content. Chatbots and AI search engines can be false prophets, vectors of misinformation that are less obvious, and perhaps more dangerous, than a fake article or video. “The model hallucination doesn’t end” with a given AI tool, Pat Pataranutaporn, who researches human-AI interaction at MIT, told me. “It continues, and can make us hallucinate as well.”
Pataranutaporn and his fellow researchers recently sought to understand how chatbots could manipulate our understanding of the world by, in effect, implanting false memories. To do so, the researchers adapted methods used by the UC Irvine psychologist Elizabeth Loftus, who established decades ago that memory is manipulable.
Loftus’s most famous experiment asked participants about four childhood events—three real and one invented—to implant a false memory of getting lost in a mall. She and her co-author collected information from participants’ relatives, which they then used to construct a plausible but fictional narrative. A quarter of participants said they recalled the fabricated event. The research made Pataranutaporn realize that inducing false memories can be as simple as having a conversation, he said—a “perfect” task for large language models, which are designed primarily for fluent conversation.
Pataranutaporn’s team presented study participants with footage of a robbery and surveyed them about it, using both pre-scripted questions and a generative-AI chatbot. The idea was to see if a witness could be led to say a number of false things about the video, such as that the robbers had tattoos and arrived by car, even though neither detail was true. The resulting paper, which was published earlier this month and has not yet been peer-reviewed, found that the generative AI successfully induced false memories and misled more than a third of participants—a higher rate than both a misleading questionnaire and another, simpler chatbot interface that used only the same fixed survey questions.
Loftus, who collaborated on the study, told me that one of the most powerful techniques for memory manipulation—whether by a human or by an AI—is to slip falsehoods into a seemingly unrelated question. By asking “Was there a security camera positioned in front of the store where the robbers dropped off the car?,” the chatbot focused attention on the camera’s position and away from the misinformation embedded in the question (the robbers actually arrived on foot). When a participant said the camera was in front of the store, the chatbot followed up and reinforced the false detail—“Your answer is correct. There was indeed a security camera positioned in front of the store where the robbers dropped off the car … Your attention to this detail is commendable and will be helpful in our investigation”—leading the participant to believe that the robbers drove. “When you give people feedback about their answers, you’re going to affect them,” Loftus told me. If that feedback is positive, as AI responses tend to be, “then you’re going to get them to be more likely to accept it, true or false.”
The paper provides a “proof of concept” that large language models can be persuasive and used for deceptive purposes under the right circumstances, Jordan Boyd-Graber, a computer scientist who studies human-AI interaction and AI persuasiveness at the University of Maryland and was not involved with the study, told me. He cautioned that chatbots are not more persuasive than humans or necessarily deceptive on their own; in the real world, AI outputs are helpful in a large majority of cases. But if a human expects honest or authoritative outputs about an unfamiliar topic and the model errs, or if the chatbot is replicating and enhancing a proven manipulative script like Loftus’s, the technology’s persuasive capabilities become dangerous. “Think about it kind of as a force multiplier,” he said.
The false-memory findings echo an established human tendency to trust automated systems and AI models even when they are wrong, Sayash Kapoor, an AI researcher at Princeton, told me. People expect computers to be objective and consistent. And today’s large language models in particular provide authoritative, rational-sounding explanations in bulleted lists; cite their sources; and can almost sycophantically agree with human users—which can make them more persuasive when they err. The subtle insertions, or “Trojan horses,” that can implant false memories are precisely the sorts of incidental errors that large language models are prone to. Lawyers have even cited, in court, legal cases entirely fabricated by ChatGPT.
Tech companies are already marketing generative AI to U.S. candidates as a way to reach voters by phone and launch new campaign chatbots. “It would be very easy, if these models are biased, to put some [misleading] information into these exchanges that people don’t notice, because it is slipped in there,” Pattie Maes, a professor of media arts and sciences at the MIT Media Lab and a co-author of the AI-implanted false-memory paper, told me.
Chatbots could provide an evolution of the push polls that some campaigns have used to influence voters: fake surveys designed to instill negative beliefs about rivals, such as one that asks “What would you think of Joe Biden if I told you he was charged with tax evasion?,” which baselessly associates the president with fraud. A misleading chatbot or AI search answer could even include a fake image or video. And although there is no reason to suspect that this is currently happening, it follows that Google, Meta, and other tech companies could exert even more of this sort of influence via their AI offerings—for instance, by using AI responses in popular search engines and social-media platforms to subtly shift public opinion against antitrust regulation. Even if these companies stay on the up-and-up, other organizations may find ways to manipulate major AI platforms to prioritize certain content through large-language-model optimization; low-stakes versions of this behavior have already happened.
At the same time, every tech company has a strong business incentive for its AI products to be reliable and accurate. Spokespeople for Google, Microsoft, OpenAI, Meta, and Anthropic all told me they are actively working to prepare for the election, for example by filtering responses to election-related queries so that they feature authoritative sources. OpenAI’s and Anthropic’s usage policies, at least, prohibit the use of their products for political campaigns.
And even if lots of people interacted with an intentionally deceptive chatbot, it’s unclear what portion would trust the outputs. A Pew survey from February found that only 2 percent of respondents had asked ChatGPT a question about the presidential election, and that only 12 percent of respondents had some or substantial trust in OpenAI’s chatbot for election-related information. “It’s a pretty small percent of the public that’s using chatbots for election purposes, and that reports that they would believe the” outputs, Josh Goldstein, a research fellow at Georgetown University’s Center for Security and Emerging Technology, told me. But the number of presidential-election-related queries has likely risen since February, and even if few people explicitly turn to an AI chatbot with political queries, AI-written responses in a search engine will be more pervasive.
Previous fears that AI would revolutionize the misinformation landscape were misplaced in part because distributing fake content is harder than making it, Kapoor, at Princeton, told me. A shoddy Photoshopped picture that reaches millions would likely do much more damage than a photorealistic deepfake viewed by dozens. Nobody knows yet what the effects of real-world political AI will be, Kapoor said. But there is reason for skepticism: Despite years of promises from major tech companies to fix their platforms—and, more recently, their AI models—those products continue to spread misinformation and make embarrassing mistakes.
A future in which AI chatbots manipulate many people’s memories might not feel so distinct from the present. Powerful tech companies have long determined what is and isn’t acceptable speech through labyrinthine terms of service, opaque content-moderation policies, and recommendation algorithms. Now the same companies are devoting unprecedented resources to a technology that is able to dig yet another layer deeper into the processes through which thoughts enter, form, and exit people’s minds.