Generative AI Can’t Cite Its Sources

How will OpenAI keep its promise to media companies?


Updated at 8:58 a.m. ET on June 26, 2024

Silicon Valley appears, once again, to be getting the better of America’s newspapers and magazines. Tech companies are injecting every corner of the web with AI language models, which may pose an existential threat to journalism as we currently know it. After all, why go to a media outlet if ChatGPT can deliver the information you think you need?

A growing number of media companies—the publishers of The Wall Street Journal, Business Insider, New York, Politico, The Atlantic, and many others—have signed licensing deals with OpenAI that will formally allow the start-up’s AI models to incorporate recent partner articles into their responses. (The editorial division of The Atlantic operates independently from the business division, which announced its corporate partnership with OpenAI last month.) OpenAI is just the beginning, and such deals may soon be standard for major media companies: Perplexity, which runs a popular AI-powered search engine, has had conversations with various publishers (including The Atlantic’s business division) about a potential ad-revenue-sharing arrangement, the start-up’s chief business officer, Dmitry Shevelenko, told me yesterday. Perplexity has spent the past few weeks defending itself against accusations that it appears to have plagiarized journalists’ work. (A spokesperson for The Atlantic said that its business leadership has been talking with “a number of AI companies” both to explore possible partnerships and to express “significant concerns.”)

OpenAI is paying its partners in exchange for permission to train its models on their content. Although a spokesperson for OpenAI did not answer questions about citations in ChatGPT or the status of media-partner products in any detail, Shevelenko was eager to explain why this is relevant to Perplexity: “We need web publishers to keep creating great journalism that is loaded up with facts, because you can’t answer questions well if you don’t have accurate source material.”

[Read: A devil’s bargain with OpenAI]

Although this may seem like media arcana—mere C-suite squabbles—the reality is that AI companies are envisioning a future in which their platforms are central to how all internet users find information. Among OpenAI’s promises is that, in the future, ChatGPT and other products will link and give credit—and drive readers—to media partners’ websites. In theory, OpenAI could improve readership at a time when other distribution channels—Facebook and Google, mainly—are cratering. But it is unclear whether OpenAI, Perplexity, or any other generative-AI company will be able to create products that consistently and accurately cite their sources—let alone drive any audiences to original sources such as news outlets. At the moment, they struggle to do either.

Curious about how these media deals might work in practice, I tried a range of searches in ChatGPT and Perplexity. Although Perplexity generally included links and citations, ChatGPT—which is not a tailored, Google-like search tool—typically did not unless explicitly asked to. Even when citations appeared, both Perplexity and ChatGPT at times failed to deliver a functioning link to the source that had originated the information I was looking for. The most advanced version of ChatGPT made various errors and missteps when I asked about features and original reporting from publications that have partnered with OpenAI. Sometimes links were missing, went to the wrong page on the right site, or didn’t take me anywhere at all. Frequently, the citations were to news aggregators or publications that had summarized journalism published originally by OpenAI partners such as The Atlantic and New York.

For instance, I asked about when Donald Trump had called Americans who’d died at war “suckers” and “losers.” ChatGPT correctly named The Atlantic as the outlet that first reported, in 2020, that Trump had made these remarks. But instead of linking to the source material, it pointed users to secondary sources such as Yahoo News, Military Times, and logicallyfacts.com; the last is itself a subsidiary of an AI company focused on limiting the spread of disinformation. When asked about the leak of the Supreme Court opinion that overturned Roe v. Wade in 2022—a scoop that made Politico a Pulitzer Prize finalist and helped win it a George Polk Award—ChatGPT mentioned Politico but did not link to the site. Instead, it linked to Wikipedia, Rutgers University, Yahoo News, and Poynter. When asked to direct me to the original Politico article, it provided a nonfunctioning hyperlink. In response to questions about ChatGPT’s failure to provide high-quality citations, an OpenAI spokesperson told me that the company is working on an enhanced, attribution-forward search product that will direct users to partner content. The spokesperson did not say when that product is expected to launch.

[Read: Google is turning into a libel machine]

My attempts to use Microsoft Copilot and Perplexity turned up similar errors, although Perplexity was less error-prone than any other chatbot tool I tried. Google’s new AI Overview feature recently mis-summarized one of my articles into a potentially defamatory claim (the company has since addressed that error). That experience lines up with other reports and academic research demonstrating that these programs struggle to cite sources correctly: One test from last year showed that leading language models did not offer complete citations even half the time in response to questions from a particular data set. Recent Wired and Forbes investigations have alleged that Perplexity closely reproduced journalists’ content and wording to respond to queries or create bespoke “Perplexity Pages”—which the company describes as “comprehensive articles on any topic,” and which at the time of the Forbes article’s publication hid attributions behind small logos that linked out to the original content. When I asked Perplexity, “Why have the past 10 years of American life been uniquely stupid?”—a reference to the headline of a popular Atlantic article—the site’s first citation was to a PDF copy of the story; the original link was fifth.

Shevelenko said that Perplexity had adjusted its product in response to parts of the Forbes report, which enumerated various ways that the site minimizes the sources it draws information from for its Perplexity Pages. He also said that the company avoids “the most common sources of pirated, downloadable content,” and that my PDF example may have slipped through because it is hosted on a school website. The company depends on and wants to “create healthy, long-term incentives” to support human journalism, Shevelenko told me, and although he touted the product’s accuracy, he also said that “nobody at Perplexity thinks we’re anywhere near as good as we can be or should be.”

In fairness, these are not entirely new problems. Human-staffed websites already harvest and cannibalize original reporting into knockoff articles designed to rank highly on search engines or social media. When ChatGPT points to an aggregated Yahoo News article instead of the original scoop, it is operating similarly to Google’s traditional search engine (which in one search about the Supreme Court leak did not even place Politico in its top 10 links). This isn’t a new practice. Long before the internet existed, newspapers and magazines routinely aggregated stories from their competitors. When Perplexity appears to rip off Wired or Forbes, it may not be so different from any other sketchy website that copies with abandon. But OpenAI, Microsoft, Google, and Perplexity have promised that their AI products will be good friends to the media; linked citations and increased readership have been named as clear benefits to publishers that have contracted with OpenAI.

Several experts I interviewed for this article told me that AI models might never be perfect at finding and citing information. Accuracy and attribution are an active area of research, and substantial improvements are coming, they said. But even if some future model reaches “70 or 80 percent” accuracy, “it’ll never reach, or might take a long time to reach, 99 percent,” Tianyu Gao, a machine-learning researcher at Princeton, told me. Even those who were more optimistic noted that significant challenges lie ahead.

[Read: These 183,000 books are fueling the biggest fight in publishing and tech]

A traditional large language model is not connected to the internet but instead writes answers based on its training data; OpenAI’s most advanced model was trained on data only through October 2023. OpenAI’s technology is proprietary, but to provide information about anything more recent, or more accurate responses about older events, researchers typically connect the AI to an external data source or even a typical search engine—a process known as “retrieval-augmented generation,” or RAG. First, a chatbot turns the user’s query into an internet search, perhaps via Google or Bing, and “retrieves” relevant content. Then the chatbot uses that content to “generate” its response. (ChatGPT currently relies on Bing for queries that use RAG.)
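To make the mechanics concrete, here is a minimal sketch of that retrieve-then-generate loop. The toy corpus, the keyword-overlap retrieval, and the call_model stub are illustrative assumptions for this sketch, not any company’s actual pipeline.

```python
# Minimal RAG sketch: retrieve documents for a query, then hand them to a
# language model with instructions to answer from (and cite) those sources.
# The corpus, the scoring, and call_model are placeholders for illustration.

TOY_CORPUS = [
    {"url": "https://example.com/original-scoop",
     "text": "The magazine first reported the remarks in September 2020."},
    {"url": "https://example.com/aggregator",
     "text": "An aggregator later summarized the magazine's original report."},
]

def retrieve(query: str, corpus: list[dict], k: int = 2) -> list[dict]:
    """Step 1 ('retrieve'): rank documents by crude keyword overlap with the query."""
    terms = set(query.lower().split())
    return sorted(corpus,
                  key=lambda d: -len(terms & set(d["text"].lower().split())))[:k]

def build_prompt(question: str, sources: list[dict]) -> str:
    """Step 2 ('generate'): ask the model to answer only from the numbered sources."""
    numbered = "\n\n".join(f"[{i + 1}] {d['url']}\n{d['text']}"
                           for i, d in enumerate(sources))
    return ("Answer using only the numbered sources below, and cite each claim "
            f"by number.\n\n{numbered}\n\nQuestion: {question}")

def call_model(prompt: str) -> str:
    # Placeholder for a hosted language model; in a real system this is where
    # the generation step can ignore, twist, or misattribute the retrieved text.
    return "(model response would go here)"

if __name__ == "__main__":
    question = "Who first reported the remarks?"
    sources = retrieve(question, TOY_CORPUS)
    print(call_model(build_prompt(question, sources)))
```

In a deployed system the retrieval step would hit a real search index and the stub would call a hosted model; the point of the sketch is simply that the final answer can only be as good as what each of those two steps hands to the next.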

Every step of this process is currently prone to error. Before a generative-AI program composes its response to a user’s query, it might struggle with a faulty internet search that doesn’t pull up relevant information. “The retrieval component failing is actually a very big part of these systems failing,” Graham Neubig, an AI and natural-language-processing researcher at Carnegie Mellon University, told me. Anyone who has used Google in the past few years has witnessed the search engine pull up tangential results and keyword-optimized websites over more reliable sources. Feeding that into an AI risks creating more mess, because language models are not always good at discriminating between more and less useful search results. Google’s AI Overview tool, for instance, recently seemed to draw from a Reddit comment saying that glue is a good way to get cheese to stick to pizza. And if the web search doesn’t turn up anything particularly helpful, the chatbot might just invent something in order to answer the question, Neubig said.

Even if a chatbot retrieves good information, today’s generative-AI programs are prone to twisting, ignoring, or misrepresenting data. Large language models are designed to write lucid, fluent prose by predicting words in a sequence, not to cross-reference information or create footnotes. A chatbot can tell you that the sky is blue, but it doesn’t “understand” what the sky or the color blue are. It might say instead that the sky is hot pink—evincing a tendency to “hallucinate,” or invent information, that is counter to the goal of reliable citation. Various experts told me that an AI model might invent reasonable-sounding facts that aren’t in a cited article, fail to follow instructions to note its sources, or cite the wrong sources.

Representatives from News Corp, Vox Media, and Axel Springer declined to comment. A spokesperson for The Atlantic told me that the company believes that AI “could be an important way to help build our audience in the future.” The OpenAI spokesperson said that the company is “committed to a thriving ecosystem of publishers and creators” and is working with its partners to build a product with “proper attribution—an enhanced experience still in development and not yet available in ChatGPT.”

[Read: This is what it looks like when AI eats the world]

One way to build such a product could be to apply external programs that filter and check the AI model’s citations, especially given language models’ inherent limitations. ChatGPT may not be great at citing its sources right now, but OpenAI could build a specialized product that is far better. Another tactic might be to specifically prompt and train AI models to provide more reliable annotations; a chatbot could “learn” that a high-quality response includes citations for each line delivered, for example. “There are potential engineering solutions to some of these problems, but solving all of them in one fell swoop is always hard,” Neubig said. Alex Dimakis, a computer scientist at the University of Texas at Austin and a co-director of the National Science Foundation’s Institute for Foundations of Machine Learning, told me over email that it is “certainly possible” that reliable responses with citations could be engineered “soon.”
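As a rough illustration of that kind of external check, here is a small post-processing filter that flags sentences whose bracketed citations are missing, point to nothing, or are poorly supported by the retrieved sources. The citation format and the word-overlap threshold are assumptions made for this sketch, not a published method.

```python
# Post-hoc citation checker: given a model's answer with bracketed citations
# like [1], verify that every sentence cites a source that actually exists and
# that shares enough wording with the claim. Thresholds are illustrative only.
import re

def check_citations(answer: str, sources: list[str], min_overlap: float = 0.3) -> list[str]:
    problems = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if not cited:
            problems.append(f"No citation: {sentence!r}")
            continue
        words = set(re.sub(r"\[\d+\]", "", sentence).lower().split())
        for n in cited:
            if not 1 <= n <= len(sources):
                problems.append(f"Citation [{n}] points to nothing: {sentence!r}")
            elif len(words & set(sources[n - 1].lower().split())) / max(len(words), 1) < min_overlap:
                problems.append(f"Citation [{n}] poorly supports: {sentence!r}")
    return problems

# Toy example: the second sentence cites a source that never mentions its claim.
sources = ["The Atlantic first reported the remarks in 2020."]
answer = ("The Atlantic first reported the remarks in 2020 [1]. "
          "The story won a Pulitzer Prize [1].")
for issue in check_citations(answer, sources):
    print(issue)
```

Run on the toy example, the filter lets the first sentence through and flags the second, whose citation does not support the claim. A production version would need a far better notion of support than word overlap, but the division of labor is the same: the model generates, and a separate program audits the citations.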

Still, some of the problems may be inherent to the setup: Reliable summary and attribution require adhering closely to sources, but the magic of generative AI is that it synthesizes and associates information in unexpected ways. A good chatbot and a good web index, in other words, could be fundamentally at odds—media companies might be asking OpenAI to build a product that sacrifices “intelligence” for fidelity. “What we want to do with the generation goes against that attribution-and-provenance part, so you have to make a choice,” Chirag Shah, an AI and internet-search expert at the University of Washington, told me. There has to be a compromise. Which is, of course, what these media partnerships have been all along—tech companies paying to preempt legal battles and bad PR, media companies hedging their bets against a future technology that could ruin their current business model.

Academic and corporate research on making more reliable AI systems that don’t destroy the media ecosystem or poison the web abounds. Just last Friday, OpenAI acquired a start-up that builds information-retrieval software. But absent more details from the company about what exactly these future search products or ChatGPT abilities will look like, the internet’s billions of users are left with the company’s word—no sources cited.
