From server to smartphone, devices with less processing speed and memory require smaller networks. Instead of building and training separate models to run on a variety of hardware, a new approach trains a single network that can be adapted to any device.
What’s new: Han Cai and researchers at MIT developed Once-for-All (OFA). This method trains a single large model and derives subnetworks — subsets of the original model’s weights — that perform well on less powerful processors.
Key insight: Typical pruning methods downsize neural networks one at a time by reducing, say, the size and number of convolutional filters and then fine-tuning the smaller model. It’s more efficient to extract and fine-tune a fleet of progressively smaller models in a single process.
How it works: OFA extracts subnetworks by varying the parent network’s number of layers, number of filters per layer, filter sizes, and the input resolution. The researchers constrained each of these factors to a predetermined set of values that allow up to 10^19 possible subnetworks.
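To make that constrained search space concrete, here is a minimal Python sketch of drawing one subnetwork configuration at random. The specific value sets and the `sample_subnetwork` helper are illustrative assumptions, not the paper’s exact choices.

```python
import random

# Illustrative search space in the spirit of OFA's elastic dimensions.
# These value sets are assumptions for the sketch, not the paper's exact choices.
SEARCH_SPACE = {
    "depth": [2, 3, 4],                      # layers per stage
    "expand_ratio": [3, 4, 6],               # width (filters) per layer
    "kernel_size": [3, 5, 7],                # convolutional filter size
    "resolution": list(range(128, 225, 16)), # input image resolution
}

def sample_subnetwork(num_stages=5, max_layers=4, rng=random):
    """Randomly draw one subnetwork configuration from the constrained space."""
    return {
        "resolution": rng.choice(SEARCH_SPACE["resolution"]),
        "depths": [rng.choice(SEARCH_SPACE["depth"]) for _ in range(num_stages)],
        "kernels": [[rng.choice(SEARCH_SPACE["kernel_size"]) for _ in range(max_layers)]
                    for _ in range(num_stages)],
        "expands": [[rng.choice(SEARCH_SPACE["expand_ratio"]) for _ in range(max_layers)]
                    for _ in range(num_stages)],
    }

print(sample_subnetwork())
```

Exhaustively training or evaluating every combination in a space that large is infeasible, which is what motivates the procedure below.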
- OFA trains the original network, randomly samples a slightly smaller version, and fine-tunes both.
- It repeats this procedure with ever smaller subnetworks until it arrives at the smallest allowable version (a rough sketch of this shrinking procedure follows the list).
- OFA randomly samples and evaluates 10,000 subnetworks. The results constitute a dataset that represents model performance at a given size.
- Using the new dataset, OFA trains another network to predict the accuracy of any subnetwork, so it can select the best network of a given size.
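Here is a rough sketch of the shrinking loop described in the first two bullets, assuming placeholder `train_full`, `sample_smaller_config`, and `finetune` routines that stand in for real training code; it is not OFA’s actual implementation.

```python
# Placeholder routines; real training code would go here.
def train_full(supernet):
    ...  # train the largest network to convergence

def sample_smaller_config(stage):
    ...  # draw a subnetwork slightly smaller along this stage's dimension

def finetune(supernet, config):
    ...  # update the shared weights so both the parent and this subnetwork perform well

def progressive_shrinking(supernet, stages):
    """Train the full model, then fine-tune ever-smaller subnetworks."""
    train_full(supernet)
    for stage in stages:               # e.g. shrink filter size, then depth, then width
        config = sample_smaller_config(stage)
        finetune(supernet, config)     # parent and subnetworks share one set of weights
```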
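And a sketch of the last two bullets: the sampled subnetworks and their measured accuracies form a supervised dataset for an accuracy predictor, which then scores candidate architectures cheaply. The feature encoding, the scikit-learn regressor, and the random placeholder accuracies are assumptions for illustration, and the snippet reuses `sample_subnetwork` from the earlier sketch.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def encode(config):
    """Flatten a subnetwork config (from the earlier sketch) into a feature vector."""
    return np.array(
        [config["resolution"]]
        + config["depths"]
        + [k for layer in config["kernels"] for k in layer]
        + [e for layer in config["expands"] for e in layer],
        dtype=float,
    )

# Stand-in for the 10,000 (architecture, accuracy) pairs; OFA measures real accuracies.
configs = [sample_subnetwork() for _ in range(10_000)]
X = np.stack([encode(c) for c in configs])
y = np.random.rand(len(configs))  # placeholder accuracies for the sketch

predictor = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300).fit(X, y)

# To pick a model for a given device, score candidates with the predictor instead of
# evaluating each one on real data. A real search would first filter candidates by
# the device's size or latency budget.
candidates = [sample_subnetwork() for _ in range(1_000)]
scores = predictor.predict(np.stack([encode(c) for c in candidates]))
best_config = candidates[int(np.argmax(scores))]
```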
Results: The authors compared OFA with a variety of neural architecture search methods suited to finding models for mobile devices. The popular NASNet-A required 48,000 GPU hours to generate the smallest model, and it would require that time again to generate another one optimized for different constraints. OFA’s baseline approach required 1,200 GPU hours to find all of its models. The authors also compared OFA to MobileNetV3-Large, the state-of-the-art image recognition network for mobile devices: the OFA model that ran on similar hardware achieved 76.9 percent top-1 accuracy on ImageNet compared to MobileNetV3’s 75.2 percent. The most accurate neural architecture search method the researchers considered, FBNet-C, required roughly half as much time as OFA to generate a single, less accurate model, but much more time to generate a second one.
Why it matters: OFA produces well-performing models of many sizes in only slightly more time than it takes to train the original large model. In situations that require deploying a given network to heterogeneous devices, this efficiency can translate into big savings in development time and energy consumption.
We’re thinking: Smart speakers, watches, thermostats, pacemakers — it’s inevitable that neural networks will run on more and more heterogeneous hardware. This work is an early step toward tools to manage such diverse deployments.