Meta’s New Text-to-3D Generator Creates 3D Models in Under a Minute

A collection of detailed figurines displayed on a surface. Items include a cactus, bear, T-Rex, dragon emerging from an egg, lion, raccoon with a pizza, dogs, skull, hippo, duck, croissant, pigeon, turkey, turtle, toy car, crab, elephant, and a bird.

While Meta deals with artificial intelligence in the form of its constantly changing content tagging system, the company’s research wing is hard at work on novel generative AI technology, including a new Meta 3D Gen platform that delivers text-to-3D asset generation with high-quality geometry and texture.

“This system can generate 3D assets with high-resolution textures & material maps end-to-end with results that are superior in quality to previous state-of-the-art solutions — at 3-10x the speed of previous work,” Meta AI explains on Threads.


Meta 3D Gen (3DGen) can create 3D assets and textures from a simple text prompt in under a minute, per Meta’s research paper. This is functionally similar to text-to-image generators like Midjourney and Adobe Firefly, but 3DGen builds fully 3D models with underlying mesh structures that support physically-based rendering (PBR). This means that the 3D models generated by Meta 3DGen can be used in real-world modeling and rendering applications.

A colorful display of various 3D-printed animal figurines and objects. The collection includes lions, raccoons, a pig, birds, a cactus, and several anthropomorphic animals in costumes. Among them are also food items, beetles, and a skull, all arranged on a green surface.
This is a visual comparison of text-to-3D generations following Meta 3D Gen’s stage I (left) and stage II (right). Per Meta, stage II generations were preferred nearly 70% of the time.

“Meta 3D Gen is a two-stage method that combines two components, one for text-to-3D generation and one for text-to-texture generation, respectively,” Meta explains, adding that this approach results in “higher-quality 3D generation for immersive content creation.”

Diagram showing stages of text-to-3D object generation. Input: "a t-rex wearing a green wool sweater" is processed into a 3D model in Stage I. In Stage II, texture is refined for the initial prompt or generated for a new prompt ("a t-rex looking like a panda"). Output: final 3D models.

3DGen combines two of Meta’s foundational generative models, AssetGen and TextureGen, focusing on the relative strengths of each. Meta says that based on feedback from professional 3D artists, its new 3DGen technology is preferred over competing text-to-3D models “a majority of the time” while being three to 60 times faster.

A grid of 16 varied images: a beagle in a detective's outfit, a bear dressed as a lumberjack, a ceramic lion, a chihuahua in a tutu, a dachshund wearing a hat, a delicious croissant, a gold goose, a Frazer Nash Super Sport car, sourdough bread, a hippo in a sweater, a pug in a bee costume, a cow puppy, a stack of pancakes in maple syrup, a bear in medieval armor, a Mandarin duck swimming.
A curated selection of results from 3DGen.

By separating mesh models from texture maps, 3DGen promises significant control over the final output and allows for the kind of iterative refinement common with text-to-image generators. Users can change the texture prompt to restyle an asset without altering the underlying model.
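
To make that separation concrete, here is a minimal Python sketch of the idea. It is not Meta's code or API: the `Mesh`, `Texture`, and `Asset` classes and the `generate_asset` and `retexture` functions are hypothetical stand-ins that only record which prompt produced which part, using the two prompts from Meta's diagram.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mesh:
    prompt: str  # prompt that produced the geometry (hypothetical stand-in)


@dataclass(frozen=True)
class Texture:
    prompt: str  # prompt that produced the texture (hypothetical stand-in)


@dataclass(frozen=True)
class Asset:
    mesh: Mesh
    texture: Texture


def generate_asset(prompt: str) -> Asset:
    """Stage I stand-in: geometry plus an initial texture from one prompt."""
    return Asset(mesh=Mesh(prompt), texture=Texture(prompt))


def retexture(asset: Asset, texture_prompt: str) -> Asset:
    """Stage II stand-in: replace the texture and reuse the same mesh."""
    return replace(asset, texture=Texture(texture_prompt))


base = generate_asset("a t-rex wearing a green wool sweater")
panda = retexture(base, "a t-rex looking like a panda")
```

In this toy model, `panda` shares the exact mesh object held by `base`, and only the texture differs, which is the kind of geometry-preserving restyling the article describes.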

A compilation image shows various rendered objects including a stylish metal llama statue, a sushi tray with pugs, and an orc forging a hammer on an anvil. Different rendering tools and times are labeled below each object: CSM Cube 2.0, Tripo3D, Meshy v3 (refined), Rodin Gen-1, and ours.
A comparison of 3DGen results (far right column) versus competing text-to-3D models across three different prompts.

Meta’s complete technical paper about 3DGen goes into significantly more detail and includes evaluation results comparing it against other text-to-3D models.

Image credits: Meta AI