AI Creates Amazing Virtual San Francisco Using Millions of Photos

Apr 04, 2022

Jaron Schneider

Recent developments in artificial intelligence (AI) allow 3D scenes to be generated from just a handful of photos, but Waymo researchers pushed the idea to the extreme and recreated a 3D model of San Francisco from 2.8 million input photos.

Neural Radiance Fields

Neural radiance fields (NeRF) have been used to synthesize views of complex scenes by taking a few input images and using an AI to fill in the gaps. Multiple companies have experimented with the technology, most notably Google researchers, who reconstructed locations using a batch of photos from tourists, and NVIDIA, which recently debuted a neural rendering model that can produce 3D scenes from a small number of photos.

But researchers from Waymo, formerly Google’s self-driving car division, decided to take the concept to the next level. The team says that while it found the current applications of NeRF interesting, those applications were not particularly useful because each recreated only a single location.

“Recent advancements in neural rendering such as Neural Radiance Fields have enabled photo-realistic reconstruction and novel view synthesis given a set of posed camera images,” the team says in its research paper. “Earlier works tended to focus on small-scale and object-centric reconstruction.”

Block-NeRF

The team created what it calls Block-NeRF, a variant of Neural Radiance Fields that can represent large environments, to reconstruct large-scale scenes by using multiple NeRFs and combining their renderings. The researchers were able to draw on a huge amount of visual data captured by Waymo’s self-driving cars, which have been tested throughout San Francisco.

“We built a grid of Block-NeRFs from 2.8 million images to create the largest neural scene representation to date, capable of rendering an entire neighborhood of San Francisco,” the team says.

The researchers took the Alamo Square neighborhood, which is approximately half a square kilometer in size, and rendered it in 3D using 35 Block-NeRFs, one for each block. To create the render, Waymo researchers used 2.8 million photos that were captured over a three-month period. On the research website, the team shows more than just Alamo Square: it also has 3D recreations of Lombard Street, the Bay Bridge and the Embarcadero area, the Moscone Center, and the downtown area. Each of these renders took its own set of millions of images to create.

Building a 3D environment this large is not without its challenges. The presence of moving objects like cars and people, limitations on the capacity of individual NeRF models, and memory and computing constraints make such a digital recreation difficult. The researchers tackled these issues by breaking up the large environments into individually trained “blocks” (hence the name Block-NeRF), which are then dynamically combined. According to the researchers, this method provides enough flexibility and scalability to enable the recreation of large environments.
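To make the idea of "dynamically combined" blocks more concrete, here is a minimal sketch in Python. It is not Waymo's code. It assumes each block has a center point, that only blocks within some radius of the virtual camera contribute, and that their outputs are blended with inverse-distance weights. The function names, the radius, the exponent, and the use of a single RGB color as a stand-in for a full rendered image are all illustrative assumptions.

```python
import math

def blend_weights(camera_xy, block_centers, radius, power=4.0):
    """Normalized inverse-distance weights for blocks near the camera.

    All parameters are hypothetical: `block_centers` maps a block name
    to its (x, y) center, and `radius` limits which blocks contribute.
    """
    weights = {}
    for name, center in block_centers.items():
        d = math.dist(camera_xy, center)
        if d <= radius:
            # Closer blocks get larger weights; the clamp avoids dividing by zero.
            weights[name] = 1.0 / max(d, 1e-6) ** power
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()} if total else {}

def composite(camera_xy, block_centers, block_colors, radius, power=4.0):
    """Blend per-block outputs (here a single RGB triple per block)."""
    weights = blend_weights(camera_xy, block_centers, radius, power)
    if not weights:
        raise ValueError("no block covers this camera position")
    return tuple(
        sum(w * block_colors[name][channel] for name, w in weights.items())
        for channel in range(3)
    )

# Toy layout: three blocks along a street, camera halfway between A and B.
centers = {"A": (0.0, 0.0), "B": (100.0, 0.0), "C": (500.0, 0.0)}
colors = {"A": (1.0, 0.0, 0.0), "B": (0.0, 0.0, 1.0), "C": (0.0, 1.0, 0.0)}
weights = blend_weights((50.0, 0.0), centers, radius=150.0)
pixel = composite((50.0, 0.0), centers, colors, radius=150.0)
```

In this toy layout the far-away block C is ignored and blocks A and B contribute equally, so the blended color sits halfway between theirs. A real system would render full images from trained networks instead of returning fixed colors.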

The Waymo researchers say that reconstructing large-scale environments enables several important use cases, specifically for autonomous driving and aerial surveying.

An advantage of the Block-NeRF approach is that once the 3D environment has been created, the virtual space is not confined to the path that the Waymo self-driving cars traversed and can be fully explored from any angle.

“One example is mapping, where a high-fidelity map of the entire operating domain is created to act as a powerful prior for a variety of problems, including robot localization, navigation, and collision avoidance,” the researchers explain.

The full research paper and several examples of Block-NeRF’s capabilities can be found on the Waymo research website.