Ideally, real-time 3D applications such as virtual and augmented reality transition smoothly between different viewpoints of a scene — but generating a fresh perspective can take time. New research speeds the process.
What’s new: Stephan Garbin and colleagues at Microsoft developed FastNeRF, a system that accelerates the photorealistic 3D rendering method known as Neural Radiance Fields (NeRF) to visualize scenes from any angle at a brisk 200 frames per second.
Key insight: To visualize one frame of a 3D scene, you need to know the position of a virtual camera and the directions of a set of virtual light rays that extend from the camera through each pixel in the frame. (The objects behind the pixels have a basic color that may be modified by lights, shadows, occlusion, and transparency.) NeRF computes a pixel’s color by combining the color and transparency of all points that lie along the associated ray, which requires hundreds of neural network inferences per pixel — tough to pull off in real time. FastNeRF manages the computational burden through a two-part workaround. First, rather than calculating on the fly, it pre-computes and stores information about all possible rays and points along them. Second, to avoid having to store every possible combination of ray and point (1,024^3 * 1,024^2 values, assuming 1,024 samples per position and direction dimension), it stores each point’s basic color and transparency based on its position, and the shift in its color due to a ray’s direction (1,024^3 + 1,024^2 values).
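To see why the split matters, here’s a quick back-of-the-envelope check (a rough Python illustration, not the authors’ code) of how many cache entries each approach would need under the 1,024-samples-per-dimension assumption above.

```python
# Rough comparison of cache sizes, assuming 1,024 samples per position
# dimension (k) and per direction dimension (l), as in the text above.
k = 1024  # samples along each of the 3 position axes (x, y, z)
l = 1024  # samples along each of the 2 direction axes (e.g., azimuth, elevation)

combined = k**3 * l**2      # one entry per (position, direction) pair
factorized = k**3 + l**2    # separate position and direction caches

print(f"combined:   {combined:.2e} entries")    # ~1.1e+15
print(f"factorized: {factorized:.2e} entries")  # ~1.1e+09
print(f"reduction:  ~{combined / factorized:.0f}x")
```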
How it works: FastNeRF uses two vanilla neural networks to compute information based on a point’s position (the position network) and a ray’s direction (the direction network). The authors trained the system on Synthetic NeRF, which contains 360-degree views of objects such as a model ship and a LEGO construction, and on front-facing views of real-world scenes from Local Light Field Fusion.
- FastNeRF evenly samples points throughout the scene. The position network calculates each point’s transparency as well as a vector that represents its basic color. It stores the results.
- Similarly, FastNeRF evenly samples rays pointing in all directions. The direction network calculates a vector that represents how each ray’s direction would affect the color of all points along that ray. It stores that result as well.
- To compute a pixel’s value, FastNeRF combines the transparency, basic color, and the effect of the ray’s direction for every point along the ray.
- It weights each point’s basic color (from the position network) by the output of the direction network. Then it weights each point’s color by its transparency. Finally, it sums the twice-weighted colors of all points along the ray (see the sketch after this list).
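Putting the steps above together, here’s a minimal NumPy sketch of the per-pixel combination. The function name, array shapes, and simplified alpha compositing are illustrative assumptions rather than the paper’s implementation; they only show how cached position outputs, direction weights, and transparencies could be combined into one pixel color.

```python
import numpy as np

def render_pixel(base_colors, transparencies, direction_weights):
    """Combine cached per-point outputs into a single pixel color.

    base_colors:       (num_points, num_components, 3) color components per
                       sample point along the ray (from the position cache).
    transparencies:    (num_points,) density/opacity per sample point
                       (from the position cache).
    direction_weights: (num_components,) weights for the ray's viewing
                       direction (from the direction cache).
    """
    # Weight each point's color components by the direction network's output
    # (an inner product over components), giving one view-dependent RGB per point.
    point_colors = np.einsum("pcd,c->pd", base_colors, direction_weights)

    # Simplified alpha compositing: each point's contribution is scaled by its
    # own opacity and by the transmittance of everything in front of it.
    alphas = 1.0 - np.exp(-transparencies)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * transmittance

    # Sum the weighted colors of all points along the ray.
    return (weights[:, None] * point_colors).sum(axis=0)

# Toy usage with random cached values: 64 points per ray, 8 color components.
rng = np.random.default_rng(0)
pixel = render_pixel(
    base_colors=rng.random((64, 8, 3)),
    transparencies=rng.random(64),
    direction_weights=rng.random(8),
)
print(pixel)  # one RGB value
```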
Results: Running on a high-end consumer graphics board, FastNeRF performed over 3,000 times faster than NeRF. For example, it rendered a scene of a LEGO tractor in 0.0056 seconds versus NeRF’s 17.46 seconds. The speedup didn’t degrade quality: on Synthetic NeRF, FastNeRF achieved 29.97 dB peak signal-to-noise ratio, which gauges how well a generated image reproduces the original (higher is better), versus NeRF’s 29.54 dB.
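For readers who want the metric spelled out, peak signal-to-noise ratio is derived from the mean squared error between a rendered image and its reference. The snippet below is a generic illustration with made-up images, not the authors’ evaluation code.

```python
import numpy as np

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in decibels; higher means a closer match."""
    mse = np.mean((rendered - reference) ** 2)
    return 10.0 * np.log10(max_value**2 / mse)

# Toy example: a reference image plus mild noise; real evaluations compare a
# rendered frame against a held-out ground-truth view.
rng = np.random.default_rng(0)
reference = rng.random((256, 256, 3))
rendered = np.clip(reference + rng.normal(scale=0.03, size=reference.shape), 0, 1)
print(f"{psnr(rendered, reference):.2f} dB")  # roughly 30 dB for noise of ~0.03
```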
Why it matters: The authors reduced an unmanageable quantity of high-dimensional data to a practical size by dividing the information based on point position and ray direction between two models. A similar approach could be useful in applications that require optimization over many input parameters, such as drug discovery and weather modeling.
We’re thinking: Augmented and virtual reality promise to bring powerful new approaches in education, entertainment, and industry — if we can make them cheap, easy, and fast enough. Deep learning is helping us get there.