Ideally, real-time 3D applications such as virtual and augmented reality transition smoothly between different viewpoints of a scene — but generating a fresh perspective can take time. New research speeds the process.
What’s new: Stephan Garbin and colleagues at Microsoft developed FastNeRF, a system that accelerates the photorealistic 3D rendering method known as Neural Radiance Fields (NeRF) to visualize scenes from any angle at a brisk 200 frames per second.
Key insight: To visualize one frame of a 3D scene, you need to know the position of a virtual camera and the directions of a set of virtual light rays that extend from the camera through each pixel in the frame. (The objects behind the pixels have a basic color that may be modified by lights, shadows, occlusion, and transparency.) NeRF computes a pixel’s color by combining the color and transparency of all points that lie along the associated ray, which requires hundreds of neural network inferences per pixel — tough to pull off in real time. FastNeRF manages the computational burden through a two-part workaround. First, rather than calculating on the fly, it pre-computes and stores information about all possible rays and points along them. Second, to avoid having to store every possible combination of ray and point (1,024^3 * 1,024^2 values, assuming 1,024 samples per position and direction dimension), it stores each point’s basic color and transparency based on its position, and the shift in its color due to a ray’s direction (1,024^3 + 1,024^2 values).
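To see why the split matters, here’s a quick back-of-the-envelope check (a rough Python illustration, not the authors’ code) of how many cache entries each approach would need under the 1,024-samples-per-dimension assumption above.

```python
# Rough comparison of cache sizes, assuming 1,024 samples per position
# dimension (k) and per direction dimension (l), as in the text above.
k = 1024  # samples along each of the 3 position axes (x, y, z)
l = 1024  # samples along each of the 2 direction axes (e.g., azimuth, elevation)

combined = k**3 * l**2      # one entry per (position, direction) pair
factorized = k**3 + l**2    # separate position and direction caches

print(f"combined:   {combined:.2e} entries")    # ~1.1e+15
print(f"factorized: {factorized:.2e} entries")  # ~1.1e+09
print(f"reduction:  ~{combined / factorized:.0f}x")
```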
How it works: FastNeRF uses two vanilla neural networks to compute information based on a point’s position (the position network) and a ray’s direction (the direction network). The authors trained the system on Synthetic NeRF, which contains 360-degree views of objects such as a model ship and a LEGO construction, and on front-facing views of real-world scenes from Local Light Field Fusion.
- FastNeRF evenly samples points throughout the scene. The position network calculates each point’s transparency as well as a vector that represents its basic color. It stores the results.
- Similarly, FastNeRF evenly samples rays pointing in all directions. The direction network calculates a vector that represents how each ray’s direction would affect the color of all points along that ray. It stores that result as well.
- To compute a pixel’s value, FastNeRF combines the transparency, basic color, and the effect of the ray’s direction for every point along the ray.
- It weights each point’s basic color (from the position network) by the output of the direction network. Then it weights each point’s color by its transparency. Finally, it sums the twice-weighted colors of all points along the ray (see the sketch after this list).
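Putting the steps above together, here’s a minimal NumPy sketch of the per-pixel combination. The function name, array shapes, and simplified alpha compositing are illustrative assumptions rather than the paper’s implementation; they only show how cached position outputs, direction weights, and transparencies could be combined into one pixel color.

```python
import numpy as np

def render_pixel(base_colors, transparencies, direction_weights):
    """Combine cached per-point outputs into a single pixel color.

    base_colors:       (num_points, num_components, 3) color components per
                       sample point along the ray (from the position cache).
    transparencies:    (num_points,) density/opacity per sample point
                       (from the position cache).
    direction_weights: (num_components,) weights for the ray's viewing
                       direction (from the direction cache).
    """
    # Weight each point's color components by the direction network's output
    # (an inner product over components), giving one view-dependent RGB per point.
    point_colors = np.einsum("pcd,c->pd", base_colors, direction_weights)

    # Simplified alpha compositing: each point's contribution is scaled by its
    # own opacity and by the transmittance of everything in front of it.
    alphas = 1.0 - np.exp(-transparencies)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * transmittance

    # Sum the weighted colors of all points along the ray.
    return (weights[:, None] * point_colors).sum(axis=0)

# Toy usage with random cached values: 64 points per ray, 8 color components.
rng = np.random.default_rng(0)
pixel = render_pixel(
    base_colors=rng.random((64, 8, 3)),
    transparencies=rng.random(64),
    direction_weights=rng.random(8),
)
print(pixel)  # one RGB value
```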
Results: Running on a high-end consumer graphics board, FastNeRF performed over 3,000 times faster than NeRF. For example, it rendered a scene of a LEGO tractor in 0.0056 seconds versus NeRF’s 17.46 seconds. The speedup didn’t degrade quality: on Synthetic NeRF, FastNeRF achieved 29.97 dB peak signal-to-noise ratio, which gauges how well a generated image reproduces the original (higher is better), versus NeRF’s 29.54 dB.
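For readers who want the metric spelled out, peak signal-to-noise ratio is derived from the mean squared error between a rendered image and its reference. The snippet below is a generic illustration with made-up images, not the authors’ evaluation code.

```python
import numpy as np

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in decibels; higher means a closer match."""
    mse = np.mean((rendered - reference) ** 2)
    return 10.0 * np.log10(max_value**2 / mse)

# Toy example: a reference image plus mild noise; real evaluations compare a
# rendered frame against a held-out ground-truth view.
rng = np.random.default_rng(0)
reference = rng.random((256, 256, 3))
rendered = np.clip(reference + rng.normal(scale=0.03, size=reference.shape), 0, 1)
print(f"{psnr(rendered, reference):.2f} dB")  # roughly 30 dB for noise of ~0.03
```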
Why it matters: The authors reduced an unmanageable quantity of high-dimensional data to a practical size by dividing the information based on point position and ray direction between two models. A similar approach could be useful in applications that require optimization over many input parameters, such as drug discovery and weather modeling.
We’re thinking: Augmented and virtual reality promise to bring powerful new approaches in education, entertainment, and industry — if we can make them cheap, easy, and fast enough. Deep learning is helping us get there.