NVIDIA’s New Tool Can Create Editable 3D Objects from Photos

NVIDIA researchers have created a new tool that can easily turn photos into 3D objects. The tool, called NVIDIA 3D MoMa, can allow architects, designers, concept artists, and game developers to quickly import objects into a graphics engine for digital manipulation, the company says.

The technology works through what is known as inverse rendering, a technique that reconstructs a 3D model of an object or a scene from a series of still photos.
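
To make the idea concrete, here is a minimal toy sketch of inverse rendering as gradient-based optimization, assuming PyTorch is available. It is not NVIDIA's code: the render function, the single unknown albedo, and all values are hypothetical stand-ins for the far richer mesh, material, and lighting parameters a system like 3D MoMa optimizes.

```python
import torch

def render(albedo, light_intensity):
    # Hypothetical forward "renderer": flat Lambertian shading of a surface.
    return albedo * light_intensity

# "Photos" of the object: renders made with the true (unknown) albedo
# under three different lighting conditions.
true_albedo = torch.tensor([0.8, 0.2, 0.1])
lights = torch.tensor([0.5, 1.0, 1.5]).unsqueeze(1)
photos = render(true_albedo, lights)

# Start from a guess and optimize it to reproduce the photos.
albedo = torch.full((3,), 0.5, requires_grad=True)
optimizer = torch.optim.Adam([albedo], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    loss = ((render(albedo, lights) - photos) ** 2).mean()
    loss.backward()   # gradients flow backward through the renderer
    optimizer.step()

print(albedo.detach())  # converges toward the true albedo [0.8, 0.2, 0.1]
```

Because the forward render is differentiable end to end, the same loop recovers scene properties from images; scaling that idea up to full meshes, textures, and lighting on GPUs is what the quote below describes.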

David Luebke, vice president of graphics research at NVIDIA, describes this idea as a “holy grail” of sorts for computer vision and computer graphics.

“By formulating every piece of the inverse rendering problem as a GPU-accelerated differentiable component, the NVIDIA 3D MoMa rendering pipeline uses the machinery of modern AI and the raw computational horsepower of NVIDIA GPUs to quickly produce 3D objects that creators can import, edit and extend without limitation in existing tools,” he says.

3D MoMa takes a series of photos and uses them to create a 3D render of the object, formed as a triangle mesh with textured materials, which NVIDIA describes as a common language used by 3D tools across various industries.
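
For a sense of what that common language looks like in practice, the sketch below (not NVIDIA's code) writes a single textured triangle to a Wavefront OBJ file, one widely supported mesh format; the file and material names are hypothetical.

```python
# A triangle mesh with texture coordinates, the kind of asset nearly
# any 3D tool can open. Filenames and the material name are made up.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # x, y, z
uvs      = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]                 # texture coords
faces    = [((1, 1), (2, 2), (3, 3))]  # (vertex index, uv index), 1-based

with open("object.obj", "w") as f:
    f.write("mtllib object.mtl\nusemtl reconstructed_material\n")
    for x, y, z in vertices:
        f.write(f"v {x} {y} {z}\n")
    for u, v in uvs:
        f.write(f"vt {u} {v}\n")
    for face in faces:
        indices = " ".join(f"{vi}/{ti}" for vi, ti in face)
        f.write(f"f {indices}\n")
```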

3D MoMa Works with Industry-Standard Tools

Game studios, for example, typically create 3D objects like this with complex photogrammetry techniques that NVIDIA says take significant time and manual effort. Earlier this year, NVIDIA showed off a way to turn a set of photos into a 3D scene in a matter of seconds, and while that method is powerful, it does not produce the triangle mesh that would make those captures easy to edit.

NVIDIA’s 3D MoMa changes that and generates triangle mesh models within an hour on a single NVIDIA Tensor Core GPU. The company says the output is directly compatible with the 3D graphics engines and modeling tools that creators across a range of industries already use.

Triangle meshes are the underlying frames used to define shapes in 3D graphics and modeling.

“The pipeline’s reconstruction includes three features: a 3D mesh model, materials, and lighting. The mesh is like a papier-mâché model of a 3D shape built from triangles,” NVIDIA says. “With it, developers can modify an object to fit their creative vision. Materials are 2D textures overlaid on the 3D meshes like a skin. And NVIDIA 3D MoMa’s estimate of how the scene is lit allows creators to later modify the lighting on the objects.”
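
As a rough illustration of how those three outputs stay independently editable, here is a schematic Python sketch that assumes nothing about 3D MoMa's actual file formats; the field names and file paths are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ReconstructedAsset:
    mesh: str          # path to the triangle-mesh file, e.g. "trumpet.obj"
    materials: dict    # 2D texture maps overlaid on the mesh, by channel
    lighting: str      # estimated environment light, e.g. an HDR probe

trumpet = ReconstructedAsset(
    mesh="trumpet.obj",  # hypothetical paths throughout
    materials={"albedo": "albedo.png", "roughness": "rough.png"},
    lighting="studio_estimate.hdr",
)

# Because the lighting estimate is a separate component, creators can
# relight the object without touching its shape or its "skin."
relit = replace(trumpet, lighting="sunset.hdr")
```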

The researchers showcased the tool with a set of jazz instruments in the video above. NVIDIA’s team first captured about 100 images of each of five jazz band instruments from different angles and then fed that information into the 3D MoMa system. From that input, it generated 3D models that could be imported into a 3D editor.

NVIDIA says that these inverse-rendered objects can be used as building blocks for complex animated scenes and can be created in a fraction of the time.

NVIDIA has published a paper on the 3D MoMa system, which is one of 38 papers NVIDIA authors will present at the Conference on Computer Vision and Pattern Recognition (CVPR) this week.
