OpenAI’s Big Reset

With its new model, the company wants you to think ChatGPT is human.

Sep 13, 2024

After weeks of speculation about a new and more powerful AI product in the works, OpenAI today announced its first “reasoning model.” The program, known as o1, may in many respects be OpenAI’s most powerful AI offering yet, with problem-solving capacities that resemble those of a human mind more than any software before. Or, at least, that’s how the company is selling it.

As with most OpenAI research and product announcements, o1 is, for now, something of a tease. The start-up claims that the model is far better at complex tasks but released very few details about the model’s training. And o1 is currently available only as a limited preview to paid ChatGPT users and select programmers. All that the general public has to go on is a grand pronouncement: OpenAI believes it has figured out how to build software so powerful that it will soon think “similarly to PhD students” in physics, chemistry, and biology tasks. The advance is supposedly so significant that the company says it is starting afresh from the current GPT-4 model, “resetting the counter back to 1” and even forgoing the familiar “GPT” branding that has so far defined its chatbot, if not the entire generative AI boom.

The research and blog posts that OpenAI published today are filled with genuinely impressive examples of the chatbot “reasoning” through difficult tasks: advanced math and coding problems; decryption of an involved cipher; complex questions about genetics, economics, and quantum physics from experts in those fields. Plenty of charts show that, during internal evaluations, o1 has leapfrogged the company’s most advanced language model, GPT-4o, on problems in coding, math, and various scientific fields.

The key to these advances is a lesson taught to most children: Think before you speak. OpenAI designed o1 to take a longer time “thinking through problems before they respond, much like a person would,” according to today’s announcement. The company has dubbed that internal deliberation a “chain of thought,” a long-standing term used by AI researchers to describe programs that break problems into intermediate steps. That chain of thought, in turn, allows the model to solve smaller tasks, correct itself, and refine its approach. When I asked the o1 preview questions today, it displayed the word “Thinking” after I sent various prompts, and then it displayed messages related to the steps in its reasoning—“Tracing historical shifts” or “Piecing together evidence,” for example. Then, it noted that it “Thought for 9 seconds,” or some similarly brief period, before providing a final answer.

The full “chain of thought” that o1 uses to arrive at any given answer is hidden from users, sacrificing transparency for a cleaner experience—you still won’t actually have detailed insight into how the model determines the answer it ultimately displays. This also serves to keep the model’s inner workings away from competitors. OpenAI has said almost nothing about how o1 was built, only telling The Verge that it was trained with a “completely new optimization algorithm and a new training dataset.” A spokesperson for OpenAI did not immediately respond to a request for comment this afternoon.

Despite OpenAI’s marketing, then, it is unclear that o1 will provide a massively new experience in ChatGPT so much as an incremental improvement over previous models. But based on the research presented by the company and my own limited testing, it does seem like the outputs are at least somewhat more thorough and reasoned than before, reflecting OpenAI’s bet on scale: that bigger AI programs, fed more data and built and run with more computing power, will be better. The more time the company used to train o1, and the more time o1 was given to respond to a question, the better it performed.

One result of this lengthy rumination is cost. OpenAI allows programmers to pay to use its technology in their tools, and every word the o1 preview outputs is roughly four times more expensive than for GPT-4o. The advanced computer chips, electricity, and cooling systems powering generative AI are incredibly expensive. The technology is on track to require trillions of dollars of investment from Big Tech, energy companies, and other industries, a spending boom that has some worried that AI might be a bubble akin to crypto or the dot-com era. Expressly designed to require more time, o1 necessarily consumes more resources—in turn raising the stakes of how soon generative AI can be profitable, if ever.

Perhaps the most important consequence of these longer processing times is not technical or financial costs so much as a matter of branding. “Reasoning” models with “chains of thought” that need “more time” do not sound like the stuff of computer-science labs, unlike the esoteric language of “transformers” and “diffusion” used for text and image models before. Instead, OpenAI is communicating, plainly and forcefully, a claim to have built software that more closely approximates our minds. Many rivals have taken this tack as well. The start-up Anthropic has described its leading model, Claude, as having “character” and a “mind”; Google touts its AI’s “reasoning” capabilities; the AI-search start-up Perplexity says its product “understands you.” According to OpenAI’s blogs, o1 solves problems “similar to how a human may think,” works “like a real software engineer,” and reasons “much like a person.” The start-up’s research lead told The Verge that “there are ways in which it feels more human than prior models,” but also insisted that OpenAI doesn’t believe in equating its products to our brains.

The language of humanity might be especially useful for an industry that can’t quite pinpoint what it is selling. Intelligence is capacious and notoriously ill-defined, and the value of a model of “language” is fuzzy at best. The name “GPT” doesn’t really communicate anything at all, and although Bob McGrew, the company’s chief research officer, told The Verge that o1 is a “first step of newer, more sane names that better convey what we’re doing,” the distinction between a capitalized acronym and a lowercase letter and number will be lost on many.

But to sell human reasoning—a tool that thinks like you, alongside you—is different, the stuff of literature instead of a lab. The language is not, of course, clearer than any other AI terminology, and if anything is less precise: Every brain and the mind it supports are entirely different, and broadly likening AI to a human may evince a misunderstanding of humanism. Maybe that indeterminacy is the allure: To say an AI model “thinks” like a person creates a gap that every one of us can fill in, an invitation to imagine a computer that operates like me. Perhaps the trick to selling generative AI is in letting potential customers conjure all the magic themselves.