An upstart supplier of AI chips secured a major customer.
What’s new: Cerebras, which competes with Nvidia in hardware for training large models, signed a $100 million contract with Abu Dhabi tech conglomerate G42. The deal is the first part of a multi-stage plan to build a network of supercomputers.
How it works: The deal covers the first three of nine proposed systems. The first, Condor Galaxy 1 (CG-1), is already up and running in Santa Clara, California. CG-2 and CG-3 are slated to open in early 2024 in Austin, Texas and Asheville, North Carolina. Cerebras and G42 are in talks to build six more by the end of 2024. G42 plans to use the network to supply processing power primarily to healthcare and energy companies
- The systems are based on Cerebras’ flagship chip, which is designed to overcome communication bottlenecks between separate AI and memory chips by packing computing resources onto a single giant chip. Each chip fills an entire silicon wafer, which is typically divided into smaller chips. It holds 2.6 trillion transistors organized into 850,000 cores, compared to an Nvidia H100 GPU, which has 80 billion transistors and around 19,000 cores.
- CG-1 comprises 32 Cerebras chips (soon to be upgraded to 64), which process AI operations, as well as 82 terabytes of memory. Over 72,700 AMD EPYC cores handle input and output processing.
- Each supercomputer will run at 4 exaflops (4 quintillion floating point operations per second) peak performance. In comparison, Google’s Cloud TPU v4 Pods deliver 1.1 exaflops.
- The architecture enables processing to be distributed among all the chips with minimal loss of efficiency.
Behind the news: Nvidia accounts for 95 percent of the market for GPUs used in machine learning — a formidable competitor to Cerebras and other vendors of AI chips. Despite Nvidia’s position, though, there are signs that it’s not invincible.
- Nvidia has struggled to keep up with the surge in demand brought on by generative AI.
- Google and Amazon design their own AI chips, making them available to customers through their cloud platforms. Meta and Microsoft have announced plans to design their own as well.
Why it matters: The rapid adoption of generative AI is fueling demand for the huge amounts of processing power required to train and run state-of-the-art models. In practical terms, Nvidia is the only supplier of tried-and-true AI chips for large-scale systems. This creates a risk for customers who need access to processing power and an opportunity for competitors who can satisfy some of that demand.
We’re thinking: As great as Nvidia’s products are, a monopoly in AI chips is not in anyone’s best interest. Cerebras offers an alternative for training very large models. Now cloud-computing customers can put it to the test.