We introduce DatasetNeRF, an efficient 3D-aware data factory based on generative radiance fields that requires only minimal 2D human annotations. Our 3D-aware data factory excels at creating extensive datasets, delivering high-quality, 3D-consistent, fine-grained semantic segmentations as well as 3D point cloud part segmentations.
This is accomplished by training a semantic branch on a pretrained 3D GAN, such as EG3D, leveraging the semantic features in the generator's backbone to enhance the feature tri-plane for semantic volumetric rendering. To improve the 3D consistency of our segmentations, we incorporate a density prior from the pretrained EG3D model into the semantic volumetric rendering process. We further exploit the depth prior from the pretrained model, efficiently back-projecting the semantic output to obtain 3D point cloud part segmentations. Because our approach makes viewpoints easy to manipulate, we can render semantically consistent masks across multiple views, and by merging the back-projected point cloud part segmentations from different perspectives we obtain a comprehensive part segmentation of the entire 3D representation. Remarkably, generating this vast array of 3D-aware data requires only a limited set of 2D data for training.
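To make the back-projection step concrete, the following is a minimal sketch of lifting a rendered semantic mask into a labeled 3D point cloud using a depth map and camera parameters, and of merging the per-view clouds. It assumes a pinhole camera with z-depth; all names here (backproject_semantics, merge_views, K, cam2world) are illustrative and not the paper's actual interface.

import numpy as np

def backproject_semantics(depth, labels, K, cam2world):
    """Back-project a per-pixel semantic map into a labeled 3D point cloud.

    Sketch assumptions: `depth` (H, W) holds z-depth along the optical axis,
    `labels` (H, W) holds integer part labels, `K` is the 3x3 intrinsic
    matrix, and `cam2world` is a 4x4 camera-to-world transform.
    """
    H, W = depth.shape
    # Pixel grid in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Unproject to camera space: x_cam = depth * K^{-1} [u, v, 1]^T.
    rays = pix @ np.linalg.inv(K).T
    pts_cam = rays * depth.reshape(-1, 1)

    # Transform to world space with the camera-to-world matrix.
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    pts_world = (pts_h @ cam2world.T)[:, :3]

    return pts_world, labels.reshape(-1)

def merge_views(views):
    """Concatenate labeled point clouds back-projected from multiple viewpoints."""
    pts = np.concatenate([p for p, _ in views], axis=0)
    lab = np.concatenate([l for _, l in views], axis=0)
    return pts, lab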
We evaluate our approach on the AFHQ-Cat, FFHQ, AIST++, and NeRSemble datasets. We generate detailed annotations for these datasets and show that our method outperforms existing baselines, enhancing 3D consistency across video sequences and improving segmentation accuracy for single images. We further demonstrate that our method is seamlessly compatible with articulated generative radiance fields on the AIST++ dataset, and we qualitatively show that models trained with our generated data generalize well to real-world scans, such as those in the NeRSemble dataset. We also augment the point cloud semantic part segmentation benchmark using our method, focusing on the ShapeNet-Car dataset. Finally, we analyze potential applications such as 3D-aware semantic editing and 3D inversion, demonstrating that the ability to generate effectively unlimited 3D-aware data from a limited set of labeled 2D annotations paves the way for numerous 2D and 3D downstream applications.
The DatasetNeRF architecture unifies a pretrained EG3D model with a semantic segmentation branch comprising an enhanced semantic tri-plane, a semantic feature decoder, and a semantic super-resolution module. The semantic feature tri-plane is constructed by reshaping the concatenated outputs of all synthesis blocks of the EG3D generator. The semantic feature decoder interprets features aggregated from the semantic tri-plane into a 32-channel semantic feature for every queried point, and the semantic feature map is produced by semantic volumetric rendering, into which we incorporate a density prior from the pretrained RGB decoder to enhance 3D consistency. The semantic super-resolution module then upscales and refines the rendered semantic feature map into the final semantic output. Combining the semantic mask output with the upsampled depth map from the pretrained EG3D model enables efficient back-projection of the semantic mask and thereby accurate generation of point cloud part segmentations.
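As a rough illustration of the semantic branch described above, the sketch below shows a small PyTorch decoder that maps aggregated tri-plane features to a 32-channel semantic feature per point, followed by standard volume-rendering compositing that uses densities supplied by the pretrained model as the density prior. Layer sizes, module names, and tensor conventions are assumptions for illustration, not the released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticDecoder(nn.Module):
    """Maps aggregated tri-plane features to a 32-channel semantic feature per point.

    Hedged sketch: `feat_dim`, `sem_dim`, and the hidden width are assumed values.
    """
    def __init__(self, feat_dim=96, sem_dim=32, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Softplus(),
            nn.Linear(hidden, sem_dim),
        )

    def forward(self, triplane_feats):            # (N_rays, N_samples, feat_dim)
        return self.mlp(triplane_feats)           # (N_rays, N_samples, sem_dim)

def render_semantic_features(sem_feats, sigmas, deltas):
    """Volume-render per-sample semantic features into a per-ray feature.

    `sigmas` are densities taken from the pretrained EG3D decoder (the density
    prior); `deltas` are distances between consecutive samples along each ray.
    """
    alphas = 1.0 - torch.exp(-F.relu(sigmas) * deltas)            # (N_rays, N_samples)
    # Transmittance: cumulative product of (1 - alpha) over earlier samples.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alphas[:, :1]), 1.0 - alphas + 1e-10], dim=1),
        dim=1,
    )[:, :-1]
    weights = alphas * trans                                      # (N_rays, N_samples)
    return (weights.unsqueeze(-1) * sem_feats).sum(dim=1)         # (N_rays, sem_dim)

The rendered per-ray features are then passed to the semantic super-resolution module, which upscales them into the final semantic mask.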
@misc{chi2023datasetnerfefficient3dawaredata,
      title={DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields},
      author={Yu Chi and Fangneng Zhan and Sibo Wu and Christian Theobalt and Adam Kortylewski},
      year={2023},
      eprint={2311.12063},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2311.12063},
}