A new computing cluster delivers more bang per chip.
What’s new: Cerebras, one of several startups vying to supply the market for specialized AI chips, unveiled Andromeda, a supercomputer based on its processors. Unlike conventional clusters, whose gains taper off due to data bottlenecks as processors are added, the new system’s processing speed rises linearly with the number of processors.
How it works: Andromeda comprises 16 Cerebras CS-2 Wafer Scale Engine chips. Each chip holds 850,000 processing cores (more than 100 times the number found on an Nvidia A100) on a square of silicon that measures 21.5 centimeters on a side.
- The cluster can execute more than 1 exaflop (a quintillion floating-point operations per second), which is comparable to the world’s fastest supercomputer, Oak Ridge National Laboratory’s Frontier.
- A memory extension called MemoryX stores model weights off-system and streams them to the processors as needed (a rough sketch of the idea follows this list).
- Up to 16 users can access Andromeda simultaneously, and they can specify how many of the system’s 16 processors they wish to use.
- Several companies are using Andromeda for research, including rival chip designer AMD and natural language processing startup Jasper AI.
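The weight-streaming idea behind MemoryX can be sketched in a few lines of plain Python. This is a conceptual illustration only, not Cerebras’ actual API: the `ExternalWeightStore` class and its `stream` method are hypothetical stand-ins for an off-system store that feeds each layer’s parameters to the compute just in time.

```python
# Conceptual sketch of weight streaming (hypothetical names, not Cerebras' API):
# weights live in an external store and are fetched layer by layer during the
# forward pass, so on-chip memory never holds the whole model at once.
import numpy as np


class ExternalWeightStore:
    """Stand-in for a MemoryX-like service that holds all layer weights off-device."""

    def __init__(self, layer_shapes, seed=0):
        rng = np.random.default_rng(seed)
        self._weights = {
            name: rng.standard_normal(shape).astype(np.float32) * 0.02
            for name, shape in layer_shapes.items()
        }

    def stream(self, name):
        # In a real system this would be a network/DMA transfer to the accelerator.
        return self._weights[name]


def forward(x, store, layer_names):
    """Toy MLP forward pass that pulls each layer's weights on demand."""
    h = x
    for name in layer_names:
        w = store.stream(name)      # weights arrive just in time
        h = np.maximum(h @ w, 0.0)  # linear layer + ReLU
    return h


if __name__ == "__main__":
    shapes = {"layer0": (64, 128), "layer1": (128, 128), "layer2": (128, 16)}
    store = ExternalWeightStore(shapes)
    batch = np.random.default_rng(1).standard_normal((8, 64)).astype(np.float32)
    print(forward(batch, store, ["layer0", "layer1", "layer2"]).shape)  # (8, 16)
```

The design point is that on-device memory no longer has to hold the full set of parameters, which is what allows model sizes to grow beyond a single accelerator’s memory.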
Speed tests: Scientists at Argonne National Laboratory used the system to train GenSLM language models in several sizes. Increasing the number of chips from one to four boosted throughput nearly linearly while training models of 123 million parameters and 1.3 billion parameters. Going from one to four chips also cut the smaller model’s training time from 4.1 hours to 2.4 hours and the larger model’s training time from 15.6 hours to 10.4 hours.
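For readers who want to check the arithmetic, the snippet below (a hypothetical helper in plain Python, not from the Argonne study) turns the reported wall-clock times into speedup and parallel-efficiency figures. Note that end-to-end training time can include setup and other work that doesn’t parallelize across chips, so wall-clock speedup may trail the near-linear per-step throughput scaling.

```python
# Hypothetical helper: derive speedup and parallel efficiency from the reported
# training times (hours on 1 chip vs. 4 chips). An efficiency of 1.0 would mean
# a perfectly linear 4x speedup.
def scaling_stats(time_1_chip_hours, time_n_chips_hours, n_chips):
    speedup = time_1_chip_hours / time_n_chips_hours
    efficiency = speedup / n_chips
    return speedup, efficiency


for label, t1, t4 in [("123M-parameter model", 4.1, 2.4),
                      ("1.3B-parameter model", 15.6, 10.4)]:
    speedup, eff = scaling_stats(t1, t4, n_chips=4)
    print(f"{label}: {speedup:.2f}x speedup on 4 chips ({eff:.0%} efficiency)")
```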
Behind the news: As interest rates rise, AI chip startups are facing headwinds in raising enough capital to support their often huge expenses.
- Texas-based Mythic, which focused on analog chips for AI applications, ran out of money earlier this month.
- Graphcore, based in the UK, lost $1 billion in value in October after Microsoft canceled a lucrative deal.
- Also in October, Israeli chip designer Habana Labs, which Intel acquired in 2019, laid off 10 percent of its workforce.
Why it matters: Neural networks have breached the 1 trillion-parameter mark, and numbers one or two orders of magnitude greater may be close at hand. More efficient compute clusters could train those models more quickly and consume less energy doing it.
We’re thinking: For most current machine learning models, the usual GPUs should be fine. Cerebras specializes in models and compute loads too large for a handful of GPUs in a single server — an interesting business as model sizes balloon.