Dear friends,
Will the future of large language models (LLMs) limit users to cutting-edge models from a handful of companies, or will users be able to choose among powerful models from a large number of developers? We’re still early in the development of LLMs, but I believe that users will have access to models from many companies. This will be good for innovation.
We've seen repeatedly that yesterday’s supercomputer is tomorrow’s pocket calculator. Even though training an LLM currently requires massive data and infrastructure, I see encouraging progress toward wider availability and access along three dimensions:
- Open models such as BigScience’s BLOOM, Tsinghua University’s GLM, and Meta’s OPT (released under a restrictive license that welcomes researchers but bars commercial use) are gaining traction and delivering solid performance. Today’s open models aren’t as good as some proprietary models, but they will continue to improve rapidly.
- Researchers are developing techniques to make training and inference more efficient. DeepMind published recommendations for how to train LLMs given a fixed computational budget, leading to significant gains in efficiency. And although it addresses smaller models, work on cramming shows how much performance can be achieved with one day of training a language model on a single GPU. Recent work on eight-bit and even four-bit computation is also pushing down the cost of inference. (Hedged sketches of the budget rule and eight-bit inference follow this list.)
- As more teams develop and publish LLMs, systematic comparisons will empower users to pick the right one based on cost, availability, and other criteria. For example, a team led by Percy Liang carried out an extensive study that compares LLMs, the Holistic Evaluation of Language Models (HELM). (Skip to the “Our Findings” section if you’re impatient to see their conclusions.)
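To make the training-budget point concrete, here is a back-of-the-envelope sketch of DeepMind’s compute-optimal rule. The roughly 20-tokens-per-parameter heuristic and the C ≈ 6ND estimate of training compute come from the Chinchilla paper (Hoffmann et al., 2022); the function itself is my own illustrative wrapper, not code from the paper.

```python
# Back-of-the-envelope compute-optimal sizing, per DeepMind's
# Chinchilla result: train on roughly 20 tokens per parameter,
# with total training compute of about 6 * N * D FLOPs for
# N parameters and D tokens.
def chinchilla_budget(n_params: float) -> tuple[float, float]:
    n_tokens = 20 * n_params          # compute-optimal token count
    flops = 6 * n_params * n_tokens   # approximate training FLOPs
    return n_tokens, flops

tokens, flops = chinchilla_budget(10e9)  # e.g., a 10B-parameter model
print(f"~{tokens:.0e} tokens, ~{flops:.0e} training FLOPs")
# ~2e+11 tokens, ~1e+22 training FLOPs
```

Under this rule, a fixed compute budget is best spent on a smaller model trained on more data than earlier practice suggested, which is where the efficiency gains come from.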
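Similarly, here is a minimal sketch of eight-bit inference. It assumes the Hugging Face transformers library with the bitsandbytes integration, which the letter doesn’t name, and the model below is just a placeholder.

```python
# Minimal eight-bit inference sketch, assuming Hugging Face
# transformers with the bitsandbytes package installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # placeholder; any causal LLM works

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",  # spread layers across available devices
    load_in_8bit=True,  # quantize weights to int8 at load time
)

prompt = "Yesterday's supercomputer is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Dropping weights from 16 bits to 8 roughly halves memory use, which is often the binding constraint when serving large models.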
At several points in my career, I worked with some of the world’s biggest systems dedicated to training deep learning models, but their edge didn’t last. I had access to massive parallel computing power at Google, and my teams built an early GPU server at Stanford and a high-performance computing system focused on speech recognition. Faster machines soon left those formerly cutting-edge systems in the dust. Even though training an LLM currently requires a daunting amount of computation, I see little reason to believe it won’t quickly become much easier, particularly given the widespread excitement and massive investment in these models.
What does this mean for businesses? Many companies have built valuable and defensible businesses using early innovations in deep learning, and I foresee that similarly valuable and defensible businesses will be built using recent innovations in LLMs and, more broadly, generative AI.
I will explore this topic more in future letters. Until then,
Keep learning!
Andrew