Advances in artificial intelligence and machine learning have allowed researchers to build detailed 3D models of real-world locations using thousands of tourists’ photos as reference data. The finished models cleanly remove unwanted objects and even normalize lighting conditions.
The project and associated paper, titled Neural Radiance Fields for Unconstrained Photo Collections, were originally published in August 2020. The project was recently updated with even more examples of its application, a deep-dive video explanation of how the program works, and published findings that take the idea of converting 2D to 3D a step further.
To recap, the researchers used a photo tourism data set of thousands of images to produce highly detailed models of iconic locations.
“You can see that we are able to produce high-quality renderings of novel views of these scenes using only unstructured image collections as input,” the researchers say.
“Getting good results from uncontrolled internet photos can be a challenging task because these images have likely been taken at different times,” the researchers explain. “So the weather might change or the sun might move. They can have different types of post-processing applied to them. Also, people generally don’t take photos of landmarks in isolation: there might be people posing for the camera, or pedestrians or cars moving through the scene.”
The project is a learning-based method for synthesizing views of complex scenes using only in-the-wild photographs. The researchers — Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, and Daniel Duckworth — built on Neural Radiance Fields (NeRF), which uses images from multiple perspectives to model the density and color of a scene as a function of 3D coordinates.
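To give a rough sense of how NeRF turns a learned scene function into pixels, here is a minimal, illustrative sketch in Python. The tiny network below is a hypothetical stand-in with random weights (the real model is a much larger trained MLP), but the structure — query a function for color and density at points along a camera ray, then composite the samples — mirrors the technique described in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny MLP standing in for NeRF's learned scene function:
# it maps a 3D position and viewing direction to a color and a density.
W1, b1 = rng.normal(size=(6, 32)), rng.normal(size=32)
W2, b2 = rng.normal(size=(32, 4)), rng.normal(size=4)

def scene_fn(position, direction):
    """Toy stand-in for F(x, d) -> (RGB color, density sigma)."""
    h = np.tanh(np.concatenate([position, direction]) @ W1 + b1)
    out = h @ W2 + b2
    rgb = 1 / (1 + np.exp(-out[:3]))   # colors squashed into [0, 1]
    sigma = np.log1p(np.exp(out[3]))   # density kept non-negative
    return rgb, sigma

def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64):
    """Volume-render one camera ray by compositing samples along it."""
    ts = np.linspace(near, far, n_samples)
    delta = ts[1] - ts[0]
    color = np.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed
    for t in ts:
        rgb, sigma = scene_fn(origin + t * direction, direction)
        alpha = 1.0 - np.exp(-sigma * delta)
        color += transmittance * alpha * rgb
        transmittance *= 1.0 - alpha
    return color

pixel = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]))
print(pixel)  # one rendered pixel color, each channel in [0, 1]
```

Rendering a full novel view amounts to repeating this for one ray per pixel from the desired virtual camera.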
The original NeRF program could not handle uncontrolled image sets from real-world situations, as it was designed for controlled capture settings. These researchers tackled that particular weakness, enabling accurate reconstructions from completely unconstrained image collections taken from the internet.
“We apply our system, dubbed NeRF-W, to internet photo collections of famous landmarks, and demonstrate temporally consistent novel view renderings that are significantly closer to photorealism than the prior state of the art,” the team writes.
“NeRF-W captures lighting and photometric post-processing in a low-dimensional latent embedding space. Interpolating between two embeddings smoothly captures variation in appearance without affecting 3D geometry,” the team explains. “NeRF-W disentangles lighting from the underlying 3D scene geometry. The latter remains consistent even as the former changes.”
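The disentanglement the team describes can be sketched in a few lines. In this illustrative Python example (names, sizes, and weights are all hypothetical, not the authors' actual architecture), density depends only on position, while the color head is additionally conditioned on a low-dimensional per-image appearance embedding — so interpolating between two embeddings changes appearance but never geometry:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical weights for a NeRF-W-style split network.
W_geo, b_geo = rng.normal(size=(3, 16)), rng.normal(size=16)
w_sigma = rng.normal(size=16)
W_rgb, b_rgb = rng.normal(size=(16 + 8, 3)), rng.normal(size=3)

def density(position):
    """Geometry branch: sigma is independent of appearance."""
    h = np.tanh(position @ W_geo + b_geo)
    return np.log1p(np.exp(h @ w_sigma)), h

def color(features, appearance):
    """Appearance branch: conditioned on the latent embedding."""
    out = np.concatenate([features, appearance]) @ W_rgb + b_rgb
    return 1 / (1 + np.exp(-out))

# Two per-image embeddings, e.g. learned from a sunny and a night photo.
emb_a, emb_b = rng.normal(size=8), rng.normal(size=8)
point = np.array([0.1, 0.2, 0.3])

sigma, feats = density(point)
for alpha in np.linspace(0.0, 1.0, 3):
    emb = (1 - alpha) * emb_a + alpha * emb_b  # interpolate appearance
    print(color(feats, emb))  # color shifts smoothly with alpha...
print(sigma)                  # ...while the density stays fixed
```

Because the embedding never feeds into the density branch, every interpolated rendering shares the exact same underlying 3D structure.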
These advancements produce cleaner, far less noisy 3D models than last year’s Neural Rerendering in the Wild. Below are a couple of still-capture examples, but the benefits of this latest work are clearest when seen in motion via the video above.
The video explanation of how this program works is fascinating to anyone working in the advancement of artificial intelligence. What these researchers have done is extremely impressive, and it will be interesting to see what possible applications of this technology come in the future. You can learn more about the project and technology here.