TinyML shows promise for bringing deep learning to applications where electrical power is scarce, processing in the cloud is impractical, and/or data privacy is paramount. The trick is to get high-performance algorithms to run on hardware that offers limited computation, memory, and electrical power.
What's new: Michael Bechtel, QiTao Weng, and Heechul Yun at the University of Kansas built a neural network that steered DeepPicarMicro, a radio-controlled car outfitted for autonomous driving, around a simple track. The project extends the authors' earlier efforts to build neural networks for extremely limited hardware.
Key insight: A neural network that controls a model car needs to be small enough to fit on a microcontroller, fast enough to recognize the car’s surroundings while it’s in motion, and accurate enough to avoid crashing. One way to design a network that meets all three criteria is to (i) build a wide variety of architectures within the size and latency constraints and (ii) test their accuracy empirically.
How it works: The hardware included a New Bright 1:24-scale car with battery pack and motor driver, a Raspberry Pi Pico microcontroller, and an Arducam Mini 2MP Plus camera. The model was based on PilotNet, a convolutional neural network that Nvidia developed to steer self-driving cars. The authors built a dataset by manually driving the car around a wide, circular track to collect 10,000 images and associated steering inputs.
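For orientation, here is a minimal sketch of a PilotNet-style steering network in Keras. The layer dimensions follow Nvidia's published PilotNet, not the authors' scaled-down variants; at full size, this network far exceeds the Pico's compute budget, which is what motivated the search described below.

```python
# A minimal sketch of a PilotNet-style steering network in Keras.
# Layer widths and input size follow Nvidia's original PilotNet;
# the authors scaled such dimensions down to fit the Pico (assumption).
import tensorflow as tf

def build_pilotnet(input_shape=(66, 200, 3)):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(24, 5, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(36, 5, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(48, 5, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(50, activation="relu"),
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(1),  # predicted steering angle
    ])

model = build_pilotnet()
model.compile(optimizer="adam", loss="mse")
# model.fit(images, steering_angles, ...)  # the 10,000 collected pairs
```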
- The system’s theoretical processing speed was limited by the camera, which captured an image every 133 milliseconds. To match the neural network’s inference latency to that rate, the authors ran 50 neural networks of different sizes and measured their latency. Fitting a linear regression to each network’s latency and multiply-add count revealed that the number of multiply-add operations predicted execution time almost perfectly (a sketch of this calculation follows the list below). The magic number: 470,000.
- The authors conducted a grid search over roughly 350 PilotNet variations with different layer widths and depths, all within the allowed number of multiply-adds (see the second sketch below). They trained each network and tested its accuracy.
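The latency-modeling step amounts to a one-variable linear fit. The sketch below uses synthetic measurements chosen only to illustrate the near-linear relationship (the slope, intercept, and noise are assumptions, not the authors' data); it fits the line, then solves for the largest multiply-add count that stays within the camera's 133-millisecond frame period:

```python
# Sketch of the latency-modeling step: fit a line relating each network's
# multiply-add count to its measured on-device latency, then solve for the
# largest MAC budget within the camera's frame period. The measurements
# below are synthetic placeholders, not the authors' data.
import numpy as np

rng = np.random.default_rng(0)
macs = rng.uniform(5e4, 1.5e6, size=50)        # MACs per inference, 50 nets
latency_ms = 2.6e-4 * macs + 10 + rng.normal(0, 2, size=50)  # assumed scale

slope, intercept = np.polyfit(macs, latency_ms, deg=1)
r = np.corrcoef(macs, latency_ms)[0, 1]
print(f"latency ~= {slope:.2e} ms/MAC * MACs + {intercept:.1f} ms (r={r:.3f})")

# Largest network that keeps inference within the camera's frame period:
frame_period_ms = 133.0
mac_budget = (frame_period_ms - intercept) / slope
print(f"MAC budget ~= {mac_budget:,.0f}")  # the authors arrived at ~470,000
```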
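And here is a sketch of how such a constrained grid search might be set up, assuming a simple scheme that scales PilotNet's layer widths, depth, and input resolution; the authors' exact grid and scaling factors are not given here and are assumptions.

```python
# Sketch of the architecture grid search: enumerate scaled PilotNet
# variants, keep those under the multiply-add budget, then train and
# rank the survivors. The scaling scheme and ranges are assumptions.
import itertools
import tensorflow as tf

MAC_BUDGET = 470_000  # multiply-adds allowed per inference

def build_variant(width_scale, n_conv, input_shape=(50, 100, 3)):
    """PilotNet-style network with scaled conv widths and variable depth."""
    filters = [24, 36, 48, 64][:n_conv]
    layers = [tf.keras.layers.Input(shape=input_shape)]
    for i, f in enumerate(filters):
        k, s = (5, 2) if i < 3 else (3, 1)  # PilotNet's kernel/stride pattern
        layers.append(tf.keras.layers.Conv2D(
            max(1, int(f * width_scale)), k, strides=s, activation="relu"))
    layers += [
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(max(1, int(100 * width_scale)),
                              activation="relu"),
        tf.keras.layers.Dense(1),  # steering angle
    ]
    return tf.keras.Sequential(layers)

def count_macs(model):
    """Approximate multiply-adds contributed by Conv2D and Dense layers."""
    macs = 0
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Conv2D):
            kh, kw = layer.kernel_size
            _, oh, ow, oc = layer.output.shape
            macs += kh * kw * layer.input.shape[-1] * oh * ow * oc
        elif isinstance(layer, tf.keras.layers.Dense):
            macs += layer.input.shape[-1] * layer.units
    return macs

# Enumerate variants and keep those under the budget; each survivor would
# then be trained on the image/steering pairs and ranked by accuracy.
candidates = []
for ws, nc in itertools.product([0.125, 0.25, 0.5, 1.0], [2, 3, 4]):
    model = build_variant(ws, nc)
    if count_macs(model) <= MAC_BUDGET:
        candidates.append(model)
print(f"{len(candidates)} variants fit within {MAC_BUDGET:,} multiply-adds")
```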
Results: The authors selected 16 models with various losses and latencies and tested them on the track. The best model completed seven laps before crashing. (Seven models failed to complete a single lap.) The models that managed at least one lap tended to achieve greater than 80 percent accuracy on the test set and latency lower than 100 milliseconds.
Why it matters: This work shows that neural networks, properly designed, can achieve useful results on severely constrained hardware. For a rough comparison, the Nvidia Tegra X2 processor that drives a Skydio 2+ drone provides four cores that run at 2 gigahertz, while the Raspberry Pi Pico’s processor provides two cores running at 133 megahertz. Neural networks that run on extremely low-cost, low-power hardware could lead to effective devices that monitor environmental conditions, the health of agricultural crops, the operation of remote equipment like wind turbines, and much more.
We’re thinking: Training a small network to deliver good performance is more difficult than training a larger one. New methods will be necessary to narrow the gap.