Robots typically rely on GPS and prior knowledge of the world to move around without bumping into things. Humans don’t communicate with positioning satellites, yet they’ve wandered confidently, if obliviously, for millennia. A new navigation technology mimics that ability to set a course with only visual input.
What’s new: Carnegie Mellon and Facebook AI teams joined to create Active Neural Mapping, a hybrid of classical search methods, which find an intended destination from a starting location, and neural networks. ANM predicts actions for navigating indoor spaces. And it makes cool videos!
Key insight: The classical search algorithm A* theoretically solved the path-finding problem, but it doesn’t generalize efficiently and requires highly structured data. Learning-based methods have proven useful as approximate planners when navigation requires completing subtasks like image recognition, but end-to-end learning has failed at long-term motion planning. These two approaches complement one another, though, and together they can achieve greater success than either one alone.
How it works: ANM has four essential modules. The mapper generates the environment map. The global policy predicts the desired final position. The planner finds a route to it. And the local policy chooses the actions that follow the planner’s route.
- The mapper is a CNN. Given an RGB image of the current view and the viewing direction and angle, it learns to produce a 2D bird’s-eye view of the world that shows obstacles and viewable areas. It also estimates the agent’s own position on the map.
- The global policy, also a CNN, predicts the final destination on the map based on the mapper’s world view, the estimated current position, previously explored areas, and a task. The task isn’t a specific destination. It may be something like “move x meters forward and y meters to the right” or “explore the maximum area in a fixed amount of time.”
- The planner uses classical search to find successive locations, each within 0.25 meters of the last, on the way to the global policy’s predicted goal. The researchers use the Fast Marching Method, but any classical search algorithm would do.
- The local policy, another CNN, predicts the next action given the current RGB view, the estimated map, and the immediate subgoal.
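The planner step is the easiest of the four to illustrate without a trained network. Below is a minimal sketch, not the researchers’ implementation: it swaps in breadth-first search for the Fast Marching Method (the article notes that any classical search would do), and it assumes a hypothetical occupancy grid with 5 cm cells. The function names `plan_path` and `next_subgoal`, the grid layout, and the cell size are all illustrative assumptions.

```python
from collections import deque


def plan_path(occupancy, start, goal):
    """Breadth-first search on a 4-connected grid (a stand-in for FMM).

    occupancy: list of rows, where 1 marks an obstacle and 0 marks free space.
    Returns the list of (row, col) cells from start to goal, or None if
    the goal cannot be reached.
    """
    rows, cols = len(occupancy), len(occupancy[0])
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and not occupancy[nr][nc] and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


def next_subgoal(path, cell_size_cm=5, max_step_cm=25):
    """Pick the farthest cell on the path that is at most 0.25 m along it.

    The 5 cm cell size is an assumption made for this sketch; the 0.25 m
    spacing comes from the description of the planner above.
    """
    max_cells = max_step_cm // cell_size_cm
    return path[min(max_cells, len(path) - 1)]


# Hypothetical map: a wall on the middle row with a gap at the right edge.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = plan_path(grid, start=(0, 0), goal=(2, 0))
print(path)                # the route detours through the gap at column 3
print(next_subgoal(path))  # (2, 3): five cells, or 0.25 m, along the route
```

In a full system along the lines described above, the global policy would supply the goal cell, the mapper would supply the grid, and the local policy would turn each subgoal into a movement action. This sketch covers only the search in between.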
Why it matters: ANM achieves unprecedented, near-optimal start-to-destination navigation. Navigation through purely visual input can be helpful where GPS is inaccurate or inaccessible, such as indoors. It could also help sightless people steer through unfamiliar buildings with relative ease.
We’re thinking: Neuroscience shows that rats, and presumably humans, hold grid-like visualizations of their environment as they move through it, and that their brain activity signals an expectation of the next location: a subgoal. ANM mirrors that biological path-planning process, though that wasn’t the researchers’ aim.