A neural network wrote the blueprint for upcoming computer chips that will accelerate deep learning itself.
What’s new: Google engineers used a reinforcement learning system to arrange the billions of minuscule transistors in an upcoming version of its Tensor Processing Unit (TPU), a chip optimized for neural network computation. The system generated the design in six hours rather than the usual span of weeks, as detailed in Nature.
Key insight: Designing a chip is like playing a board game. A silicon wafer’s area resembles a board, the macro blocks and standard cells to be placed resemble pieces, and evaluation metrics like wire length and congestion resemble victory conditions. Reinforcement learning (RL) excels at such challenges: Think of DeepMind’s AlphaGo, the RL model that in 2015 became the first computer program to beat a Go master on a full-size board without a handicap.
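To make the analogy concrete, here is a minimal, hypothetical sketch (not Google’s code): a toy environment in which the “board” is a coarse grid over the chip canvas, each “move” places one macro on an empty cell, and the final “score” is a crude negative-wirelength reward. The class name ToyPlacementEnv, the grid size, and the random netlist are all made up for illustration.

```python
import numpy as np

class ToyPlacementEnv:
    def __init__(self, grid=8, n_macros=6, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = grid
        self.n_macros = n_macros
        # Random "netlist": pairs of macros that are wired together.
        self.nets = [tuple(rng.choice(n_macros, size=2, replace=False))
                     for _ in range(n_macros * 2)]
        self.reset()

    def reset(self):
        self.occupied = np.zeros((self.grid, self.grid), dtype=bool)
        self.positions = {}  # macro index -> (row, col)
        self.next_macro = 0
        return self._state()

    def _state(self):
        # The "board position": which cells are filled and which macro is next.
        return self.occupied.copy(), self.next_macro

    def legal_actions(self):
        # Any empty cell is a legal move for the current macro.
        return np.flatnonzero(~self.occupied.ravel())

    def step(self, action):
        r, c = divmod(int(action), self.grid)
        self.occupied[r, c] = True
        self.positions[self.next_macro] = (r, c)
        self.next_macro += 1
        done = self.next_macro == self.n_macros
        # The "victory condition": reward arrives at the end as negative total
        # Manhattan wirelength, a crude stand-in for real placement metrics.
        reward = -self._wirelength() if done else 0.0
        return self._state(), reward, done

    def _wirelength(self):
        total = 0
        for a, b in self.nets:
            (r1, c1), (r2, c2) = self.positions[a], self.positions[b]
            total += abs(r1 - r2) + abs(c1 - c2)
        return total

# One random "game": place every macro, then read off the final score.
rng = np.random.default_rng(1)
env = ToyPlacementEnv()
state, done = env.reset(), False
while not done:
    state, reward, done = env.step(rng.choice(env.legal_actions()))
print("final reward (negative wirelength):", reward)
```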
How it works: Google introduced its approach in a paper published last year.
- The authors pretrained a graph neural network for 48 hours on a dataset of 10,000 chip designs, generating transferable representations of chips.
- Although the pretraining was supervised, its labels came from the RL reward: each input was the state associated with a given design, and its label was the reward earned for low wire length and congestion.
- They fine-tuned the system for six hours using reinforcement learning (a simplified sketch of the two-stage recipe follows this list).
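For readers who want to see the shape of that two-stage recipe, here is a hypothetical PyTorch sketch, not the authors’ code: a value network is pretrained with supervised regression whose labels are rewards, then a policy that shares the pretrained encoder is fine-tuned with a simple policy gradient (REINFORCE). The toy task (place one block on a 1-D strip, where the reward is closeness to a target cell implied by the features), the network sizes, and the step counts are illustrative stand-ins for the paper’s graph neural network over real netlists.

```python
import torch
import torch.nn as nn

N_CELLS, FEAT = 16, 8
torch.manual_seed(0)

def make_state():
    # A random feature vector standing in for a netlist embedding; the "best"
    # cell for the block is hidden in the features.
    feats = torch.randn(FEAT)
    target = int(torch.sigmoid(feats[:4].sum()).item() * (N_CELLS - 1))
    return feats, target

def reward_fn(target, cell):
    # Toy stand-in for negative wirelength: closer to the target is better.
    return -abs(target - cell) / (N_CELLS - 1)

encoder = nn.Sequential(nn.Linear(FEAT, 64), nn.ReLU())  # shared representation
value_head = nn.Linear(64 + N_CELLS, 1)   # (encoding, candidate cell) -> predicted reward
policy_head = nn.Linear(64, N_CELLS)      # encoding -> score for each cell

# Stage 1: supervised pretraining. Labels are the rewards of random placements.
opt = torch.optim.Adam(list(encoder.parameters()) + list(value_head.parameters()), lr=1e-3)
for step in range(2000):
    feats, target = make_state()
    cell = torch.randint(N_CELLS, ()).item()
    label = torch.tensor([reward_fn(target, cell)])
    cell_onehot = nn.functional.one_hot(torch.tensor(cell), N_CELLS).float()
    pred = value_head(torch.cat([encoder(feats), cell_onehot]))
    loss = nn.functional.mse_loss(pred, label)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: RL fine-tuning (REINFORCE) of a policy that reuses the encoder.
opt = torch.optim.Adam(list(encoder.parameters()) + list(policy_head.parameters()), lr=1e-3)
baseline = 0.0
for step in range(2000):
    feats, target = make_state()
    dist = torch.distributions.Categorical(logits=policy_head(encoder(feats)))
    cell = dist.sample()
    reward = reward_fn(target, cell.item())
    baseline = 0.99 * baseline + 0.01 * reward           # running-average baseline
    loss = -dist.log_prob(cell) * (reward - baseline)    # policy-gradient loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# Evaluate: the fine-tuned policy should place blocks close to the hidden target.
rewards = []
with torch.no_grad():
    for _ in range(200):
        feats, target = make_state()
        cell = policy_head(encoder(feats)).argmax().item()
        rewards.append(reward_fn(target, cell))
print("mean reward after fine-tuning:", sum(rewards) / len(rewards))
```

The point of the sketch is only the division of labor: supervised pretraining gives the encoder a reusable notion of placement quality, and reinforcement learning then optimizes placements directly.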
Results: The researchers compared their system’s output to that of a human team that had designed an existing TPU. Their approach completed the task in a fraction of the time and matched or outperformed the human team on chip area, wire length, and power consumption.
Behind the news: Google introduced the first TPU in 2015, and today the chips power Google services like search and translation and are available to developers via Google Cloud. Launched last month, the fourth-generation TPU can train a ResNet-50 on ImageNet in 1.82 minutes.
Why it matters: AI-powered chip design could cut the cost of bespoke chips, leading to an explosion of special-purpose processing for all kinds of uses.
We’re thinking: Reinforcement learning is hot, and we’ve seen companies announce “RL” results that would be described more accurately as supervised learning. But this appears to be a genuine use of RL ideas, and it’s great to see this much-hyped approach used in a valuable commercial application.