Next-Gen Models Show Limited Gains

AI giants rethink model training strategy as scaling laws break down

Graph showing that test loss decreases with more training tokens and larger model sizes (10³ to 10⁹ parameters).

Builders of large AI models have relied on the idea that bigger neural networks trained on more data and given more processing power would show steady improvements. Recent developments are challenging that idea.

What’s new: Next-generation large language models from OpenAI, Google, and Anthropic are falling short of expectations, employees at those companies told multiple publications. All three companies are responding by shifting their focus from pretraining to enhancing performance through techniques like fine-tuning and multi-step inference.

Scaling law basics: A classic 2020 paper shows that, given a sufficient quantity of data, a transformer network’s performance rises predictably with increases in model size (demonstrated between 768 parameters and 1.5 billion parameters). Likewise, given sufficient model size, performance rises predictably with increases in dataset size (demonstrated between 22 million and 23 billion tokens). Performance also rises predictably when model and dataset sizes increase together. The 2022 Chinchilla paper shows that, to build a compute-optimal model, every 4x increase in compute calls for a 2x increase in both model size and dataset size (demonstrated for models between 70 million and 16 billion parameters, trained on between 5 billion and 500 billion tokens). Owing to limited experimentation and the lack of a theoretical basis for their findings, the authors couldn’t determine whether these relationships would continue to hold at larger scales.
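To make the 4x-compute, 2x-model-and-data relationship concrete, here is a minimal Python sketch. It assumes the common approximation that training compute is roughly 6 × parameters × tokens, and that the compute-optimal parameter and token counts each grow with the square root of compute; the baseline figures are hypothetical illustrations, not values from the Chinchilla paper.

```python
# Minimal sketch of Chinchilla-style compute-optimal scaling, assuming:
# (a) training compute C ≈ 6 * N * D FLOPs for N parameters and D tokens, and
# (b) the compute-optimal N and D each grow roughly with sqrt(C),
#     so a 4x compute budget doubles both.
import math

def compute_optimal(target_flops: float, base_flops: float,
                    base_params: float, base_tokens: float) -> tuple[float, float]:
    """Scale a known compute-optimal (params, tokens) pair to a new compute budget."""
    ratio = target_flops / base_flops
    return base_params * math.sqrt(ratio), base_tokens * math.sqrt(ratio)

# Hypothetical baseline: a 1B-parameter model trained on 20B tokens
# (roughly the ~20-tokens-per-parameter rule of thumb associated with Chinchilla).
base_params, base_tokens = 1e9, 20e9
base_flops = 6 * base_params * base_tokens

# Quadruple the compute: model size and dataset size each roughly double.
params, tokens = compute_optimal(4 * base_flops, base_flops, base_params, base_tokens)
print(f"{params:.1e} parameters, {tokens:.1e} tokens")  # ~2.0e9 parameters, ~4.0e10 tokens
```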

Diminishing returns: Major AI companies have been counting on scaling laws to keep their models growing more capable at a steady pace. However, the next generation of high-profile models has not shown the expected improvements despite larger architectures, more training data, and more processing power.

  • One-quarter of the way through its training, the performance of OpenAI’s next-generation model Orion was on par with GPT-4’s, anonymous staffers told reporters. But after training was finished, Orion’s improvement over GPT-4 was far smaller than the leap from GPT-3 to GPT-4. OpenAI’s o1 model, which is based on GPT-4o, delivers improved performance by using additional processing during inference. The company currently expects to introduce Orion early next year.
  • Google has faced similar challenges in developing the next version of Gemini. Employees who declined to be named said the effort had shown disappointing results and slower-than-expected improvement despite training on more data with more processing power. Like OpenAI, Google is exploring alternative ways to boost performance, the sources said. The company expects to introduce the model in December.
  • Anthropic’s schedule for introducing Claude 3.5 Opus, the largest member of its Claude 3.5 family, has slipped. The model hasn’t shown the expected performance given its size and cost, according to anonymous sources inside the company. Anthropic aims to close the gap by developing agentic capabilities and improving performance on specific applications.
  • One clear limit on realizing the performance gains predicted by scaling laws is the amount of data available for training. Current models learn from huge amounts of data scraped from the web. It’s getting harder to find high-quality material on the web that hasn’t already been tapped, and other large-scale data sources aren’t readily available. Some model builders are supplementing real-world data with synthetic data, but Google and OpenAI have been disappointed with the results. For instance, OpenAI found that pretraining Orion on synthetic data made it too much like earlier models, according to anonymous employees.

What they’re saying: AI leaders are divided on the future of scaling laws as they are currently understood. 

  • “We don’t see any evidence that things are leveling off. The reality of the world we live in is that it could stop at any time. Every time we train a new model, I look at it and I’m always wondering — I’m never sure in relief or concern — [if] at some point we’ll see, oh man, the model doesn’t get any better.” — Dario Amodei, CEO and co-founder, Anthropic
  • “There is no wall.” — Sam Altman, CEO and co-founder, OpenAI
  • “The 2010s were the age of scaling, now we're back in the age of wonder and discovery once again. . . . Scaling the right thing matters now more than ever.” — Ilya Sutskever, co-founder of OpenAI who now leads Safe Superintelligence, an independent research lab

Why it matters: AI’s phenomenal advance has drawn hundreds of millions of users and sparked a new era of progress and hope. Slower-than-expected improvements in future foundation models may blunt this progress. At the same time, the cost of training large AI models is rising dramatically. The latest models cost as much as $100 million to train, and this number could reach $100 billion within a few years, according to Anthropic’s Dario Amodei. Rising costs could lead companies to reallocate their gargantuan training budgets and researchers to focus on more cost-effective, application-specific approaches.

We’re thinking: AI’s power-law curves may be flattening, but we don’t see overall progress slowing. Many developers already have shifted to building smaller, more processing-efficient models, especially networks that can run on edge devices. Agentic workflows are taking off and bringing huge gains in performance. Training on synthetic data is another frontier that’s only beginning to be explored. AI technology holds many wonders to come!
