Tracking the Elusive Stop Sign How Tesla trained its cars to recognize stop signs

Published

Apr 29, 2020

Reading time

2 min read

Recognizing stop signs, with their bold color scheme and distinctive shape, ought to be easy for computer vision — but it turns out to be a tricky problem. Tesla pulled back the curtain on what it takes to train its self-driving software to perform this task and others.

What’s new: Tesla AI chief Andrej Karpathy describes in a video presentation how the electric car maker is moving toward fully autonomous vehicles. Shot at February’s ScaledML Conference, the video was posted on YouTube last week.

Not just a big red hexagon: Stop signs take a surprising variety of forms and appearances, and that can make them hard to identify. Rather than an oversized icon on a pole, they’re often waved by construction workers, hanging off school buses, or paired with other signs. Karpathy describes how his team trained the company’s Autopilot system to detect a particularly vexing case: stop signs partially obscured by foliage.

Engineers understood that AutoPilot was having trouble recognizing occluded stop signs because, among other things, the bounding boxes around them flickered.
Using images from the existing dataset, they trained a model to detect occluded stop signs. They sent this model to the fleet with instructions to send back similar images. This gave them tens of thousands of new examples.
They used the new examples to improve the model’s accuracy. Then they deployed it to HydraNet, the software that fuses outputs from AutoPilot’s 48 neural networks into a unified, labeled field of view.

Behind the news: Tesla is the only major autonomous driving company that doesn’t use lidar as its primary sensor. Instead, it relies on computer vision with help from radar and ultrasonic sensors. Cameras are relatively cheap, so Tesla can afford to install its self-driving sensor package into every car that comes off the line, even though self-driving software is still in the works. It’s also easier to label pictures than point clouds. The downside: Cameras need a lot of training to sense the world in three dimensions, which lidar units do right out of the box.

Unstoppable: Responding to Karpathy’s presentation on Twitter, Google Brain researcher David Ha (@hardmaru) created stop sign doodles using sketch-rnn, an image generator he and colleague David Eck trained on crude hand-drawn sketches. Generate your own doodled dataset here.

Subscribe to The Batch