The cost of training top-performing machine learning models has grown beyond the reach of smaller companies. That may mean less innovation all around.
What’s new: Some companies that would like to build a business on state-of-the-art models are settling for less, Wired reported. They’re exploring paths toward higher performance at a lower price.
How it works: Models are getting larger, and so is the amount of computation needed to train them. The expense makes it hard for many companies to take advantage of the latest advances.
- Glean, which provides tools for searching workplace chat, sales, and other apps, doesn’t have the money to train large language models that would improve its products. Instead, it has turned to smaller, less capable models, software engineer Calvin Qi told Wired.
- Optum, a health benefits provider, spends upward of $50,000 per model for training in the cloud. It’s considering purchasing specialized hardware to speed up the process, according to Dan McCreary, a distinguished engineer at the company.
- Matroid, which offers a computer vision platform, uses its own GPUs supplemented by cloud computing to train transformers for “under $100,000 for the largest models,” founder Reza Zadeh told The Batch. At inference, it cuts compute costs via parameter pruning, quantization, low-rank factorizations, and knowledge distillation.
- MosaicML is a startup working on techniques to make training more efficient. Its executive team includes Michael Carbin and Jonathan Frankle, who formulated the “lottery ticket hypothesis.” It posits that a small subnetwork within a larger neural network is responsible for much of its performance and can be trained in isolation to match it.
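Some of the compression techniques mentioned above can be sketched in a few lines. The NumPy snippet below illustrates one-shot magnitude pruning — the operation at the heart of lottery-ticket experiments — by zeroing the smallest weights in a layer. It's an illustrative sketch, not any company's actual implementation; the layer shape and sparsity level are arbitrary choices.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (one-shot pruning).

    Returns the pruned weights and the binary mask that was applied.
    """
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask

# Hypothetical dense layer: 256 x 256 random weights.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

pruned, mask = magnitude_prune(w, sparsity=0.8)

# About 80 percent of the weights are now zero, so sparse storage
# or sparse kernels can cut inference cost accordingly.
print(f"fraction zeroed: {(pruned == 0).mean():.2f}")
```

In lottery-ticket research, the surviving weights are then rewound to their initial values and retrained; the sketch above covers only the pruning step.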
Behind the news: In 2020, researchers estimated the cost of training a 1.5 billion-parameter model (the size of OpenAI’s GPT-2) on the Wikipedia and Book corpora at $1.6 million. They gauged the cost to train Google’s Text-To-Text Transfer Transformer (T5), which encompasses 11 billion parameters, at $10 million. Since then, Google has proposed Switch Transformer, which scales the parameter count to 1.6 trillion — no word yet on the training cost.
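Such estimates can be approximated from first principles. The sketch below uses the common rule of thumb of roughly 6 FLOPs per parameter per training token; the token count, sustained throughput, and hourly cloud price are assumptions for illustration, not figures from the estimates cited above.

```python
def training_cost_usd(params, tokens, flops_per_sec=1e14, usd_per_hour=3.0):
    """Back-of-envelope single-run training cost.

    params        : number of model parameters
    tokens        : number of training tokens
    flops_per_sec : assumed sustained throughput of one accelerator
    usd_per_hour  : assumed cloud price for that accelerator
    """
    total_flops = 6 * params * tokens          # ~6 FLOPs per parameter per token
    hours = total_flops / flops_per_sec / 3600
    return hours * usd_per_hour

# Hypothetical run: a 1.5 billion-parameter model on 10 billion tokens.
print(f"${training_cost_usd(1.5e9, 10e9):,.0f}")
```

A single run priced this way lands far below published figures, which typically fold in many full runs, hyperparameter searches, and cloud markups — but the sketch makes the scaling visible: cost grows linearly with both parameter and token counts.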
Why it matters: The growing importance of AI, coupled with the rising cost of training large models, cuts into a powerful competitive advantage of smaller companies: their ability to innovate without being weighed down by bureaucratic overhead. This doesn’t just hurt their economic prospects; it also slows the emergence of ideas that improve people’s lives and deprives the AI community of research contributions from smaller players.
We’re thinking: A much bigger model can often perform much better on tasks in which the data has a long tail and the market supports only one winner. But in some applications — say, recognizing cats in photos — bigger models deliver diminishing returns, and even wealthy leaders won’t be able to stay far ahead of competitors.