NVIDIA’s New Tech Can Turn A Set of Photos into 3D Scenes in Seconds

Mar 25, 2022

Jaron Schneider


NVIDIA’s Instant NeRF is a neural rendering model that can produce a 3D scene from 2D data inputs in seconds and can render images of that scene in milliseconds.

The process is known as inverse rendering: AI approximates how light behaves in the real world, which lets it turn a collection of still images into a digital 3D scene. NVIDIA’s research team has developed an approach that accomplishes the task almost instantly, making it one of the first models of its kind to combine ultra-fast neural network training with rapid rendering.

What is a NeRF?

The name “NeRF” stands for neural radiance fields, a method for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views.

NVIDIA simplifies this explanation and says that NeRFs use neural networks to represent and render 3D scenes based on an input collection of 2D images. The neural network requires a few dozen images taken from multiple positions around the scene, as well as the camera position for each of those shots.

“In a scene that includes people or other moving elements, the quicker these shots are captured, the better. If there’s too much motion during the 2D image capture process, the AI-generated 3D scene will be blurry,” NVIDIA says.

With that information, the NeRF essentially fills in the blanks to generate the full scene by predicting the color of light radiating in any direction from any point in the 3D space. NVIDIA’s version works so fast it’s nearly instantaneous, hence its name, and is the fastest NeRF technique to date: once provided with the correct inputs, it can render the resulting 3D scene in a matter of milliseconds.
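For readers who want to see what "predicting the color of light from any point" looks like in practice, here is a minimal Python sketch of how a NeRF-style renderer turns those predictions into a single pixel. It is not NVIDIA's code. The function `toy_field` is a hypothetical stand-in for the trained neural network (it simply describes a solid red sphere), and the sample count and ray bounds are arbitrary choices. The blending step follows the standard volume rendering rule used in NeRF research: each sample along a camera ray contributes its color in proportion to its density and to how much light has not yet been blocked.

```python
import math


def toy_field(point, direction):
    """Hypothetical stand-in for a trained NeRF network.

    Returns (density, rgb) for a 3D point viewed from a direction.
    Here: a solid red sphere of radius 1 at the origin, empty space elsewhere.
    """
    inside = sum(c * c for c in point) < 1.0
    density = 50.0 if inside else 0.0
    return density, (1.0, 0.0, 0.0)


def composite_ray(densities, colors, step):
    """Blend per-sample predictions along one ray into a single pixel color."""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light that has not been blocked so far
    for sigma, rgb in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this short segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return pixel


def render_pixel(origin, direction, near=0.0, far=4.0, n_samples=64):
    """Query the field at evenly spaced points along a ray and composite them."""
    step = (far - near) / n_samples
    densities, colors = [], []
    for i in range(n_samples):
        t = near + (i + 0.5) * step  # midpoint of the i-th segment
        point = [o + t * d for o, d in zip(origin, direction)]
        sigma, rgb = toy_field(point, direction)
        densities.append(sigma)
        colors.append(rgb)
    return composite_ray(densities, colors, step)


hit = render_pixel([0.0, 0.0, -2.0], [0.0, 0.0, 1.0])   # ray through the sphere
miss = render_pixel([3.0, 0.0, -2.0], [0.0, 0.0, 1.0])  # ray that passes beside it
```

In a real NeRF, the network replaces `toy_field`, and training adjusts it until pixels rendered this way match the input photos.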

“If traditional 3D representations like polygonal meshes are akin to vector images, NeRFs are like bitmap images: they densely capture the way light radiates from an object or within a scene,” David Luebke, vice president for graphics research at NVIDIA, says. “In that sense, Instant NeRF could be as important to 3D as digital cameras and JPEG compression have been to 2D photography — vastly increasing the speed, ease and reach of 3D capture and sharing.”

NVIDIA says that Instant NeRF can be used to create avatars or even full scenes for virtual worlds. To pay tribute to the early days of Polaroid images, the NVIDIA Research team recreated an iconic photo of Andy Warhol taking an instant photo and turned it into a 3D scene using Instant NeRF.

1,000 Times Faster

Prior to NeRF, creating a 3D scene took hours depending on the complexity and resolution. Adding AI into the equation certainly sped things up, but early NeRF models could still take hours to train properly.

Instant NeRF cuts down render times by a factor of 1,000 by using a technique developed by NVIDIA called multi-resolution hash grid encoding. The model was developed using the NVIDIA CUDA Toolkit and the Tiny CUDA Neural Networks library. NVIDIA explains that since it’s a lightweight neural network, it can be trained and run on a single NVIDIA GPU, running fastest on cards with NVIDIA Tensor Cores.
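To give a rough sense of the multi-resolution hash grid idea named above, the following Python sketch encodes a 3D point by looking up small feature vectors at several grid resolutions and blending them. It is a simplified illustration, not NVIDIA's implementation: the function names, table sizes, level count and growth factor are assumptions chosen for readability, the tables hold random values rather than trained ones, and every level is hashed here (coarse levels that fit in the table without hashing are not treated specially). The hash constants are the ones reported in NVIDIA's research paper on the technique.

```python
import itertools
import math
import random

# Per-axis constants for the spatial hash, as reported in NVIDIA's paper.
PRIMES = (1, 2654435761, 805459861)


def hash_corner(corner, table_size):
    """Map integer grid coordinates to a slot in a fixed-size table."""
    h = 0
    for coord, prime in zip(corner, PRIMES):
        h ^= coord * prime
    return h % table_size


def make_tables(n_levels=4, table_size=2**10, n_features=2, seed=0):
    """One small feature table per resolution level (random here, learned in practice)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1e-4, 1e-4) for _ in range(n_features)]
         for _ in range(table_size)]
        for _ in range(n_levels)
    ]


def encode(point, tables, base_res=4, growth=2.0):
    """Encode a point in [0, 1]^3 as concatenated per-level features."""
    features = []
    for level, table in enumerate(tables):
        res = math.floor(base_res * growth ** level)  # grid gets finer per level
        scaled = [p * res for p in point]
        lower = [math.floor(s) for s in scaled]
        frac = [s - l for s, l in zip(scaled, lower)]
        blended = [0.0] * len(table[0])
        # Trilinear interpolation over the 8 corners of the enclosing grid cell.
        for offset in itertools.product((0, 1), repeat=3):
            weight = 1.0
            for o, f in zip(offset, frac):
                weight *= f if o else 1.0 - f
            corner = [l + o for l, o in zip(lower, offset)]
            entry = table[hash_corner(corner, len(table))]
            for k, value in enumerate(entry):
                blended[k] += weight * value
        features.extend(blended)
    return features


tables = make_tables()
encoded = encode([0.3, 0.6, 0.9], tables)  # 4 levels x 2 features = 8 numbers
```

The appeal of this design is that most of the scene detail lives in the lookup tables, so the neural network that consumes the encoded features can stay small and fast.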

“The technology could be used to train robots and self-driving cars to understand the size and shape of real-world objects by capturing 2D images or video footage of them,” NVIDIA explains. “It could also be used in architecture and entertainment to rapidly generate digital representations of real environments that creators can modify and build on.”