Zero-Shot Text-Guided Object Generation with Dream Fields

A Jain, B Mildenhall, JT Barron… - Proceedings of the …, 2022 - openaccess.thecvf.com
Abstract
We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects solely from natural language descriptions. Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision. Due to the scarcity of diverse, captioned 3D data, prior methods only generate objects from a handful of categories, such as ShapeNet. Instead, we guide generation with image-text models pre-trained on large datasets of captioned images from the web. Our method optimizes a Neural Radiance Field from many camera views so that rendered images score highly with a target caption according to a pre-trained CLIP model. To improve fidelity and visual quality, we introduce simple geometric priors, including sparsity-inducing transmittance regularization, scene bounds, and new MLP architectures. In experiments, Dream Fields produce realistic, multi-view consistent object geometry and color from a variety of natural language captions.
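
Concretely, each training step renders the radiance field from a randomly sampled camera pose, embeds the render with CLIP's image encoder, and descends on the negative cosine similarity to the caption's text embedding, plus a transmittance term that biases the scene toward empty space. The sketch below illustrates this loop under stated assumptions and is not the authors' implementation: `render_nerf`, `sample_pose`, and the `nerf` module are hypothetical placeholders for a differentiable volume renderer, the transmittance target `tau` and weight `lam` are illustrative values, and CLIP's pixel normalization and the paper's view augmentations are omitted.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.float()  # keep everything in fp32 for simplicity
for p in model.parameters():
    p.requires_grad_(False)  # CLIP stays frozen; only the NeRF is optimized

with torch.no_grad():
    tokens = clip.tokenize(["a bouquet of flowers in a glass vase"]).to(device)
    text_emb = model.encode_text(tokens)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

def dream_field_step(nerf, optimizer, sample_pose, render_nerf,
                     lam=0.5, tau=0.88):
    """One optimization step: maximize CLIP image-text similarity while
    rewarding transmittance (a sparsity prior). `nerf`, `sample_pose`,
    and `render_nerf` are hypothetical stand-ins for a differentiable
    NeRF and its volume renderer."""
    pose = sample_pose()                          # random camera on a sphere
    rgb, transmittance = render_nerf(nerf, pose)  # (3, H, W) image, per-pixel T
    image = F.interpolate(rgb.unsqueeze(0), size=224, mode="bilinear",
                          align_corners=False)    # CLIP's input resolution
    img_emb = model.encode_image(image)           # CLIP pixel normalization omitted
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    clip_loss = -(img_emb * text_emb).sum()       # negative cosine similarity
    # Sparsity prior: reward mean transmittance, but only up to a target tau,
    # so the scene stays mostly empty without vanishing entirely.
    sparsity_loss = -torch.clamp(transmittance.mean(), max=tau)
    loss = clip_loss + lam * sparsity_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The clamp mirrors the "reward transparency only up to a target" idea in its simplest form; in the paper the transmittance target is annealed over training and renders are augmented before CLIP scoring, which this sketch leaves out.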