
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation

Recent methods for 3D content generation from text or a single image struggle with the scarcity of high-quality 3D datasets and with inconsistency in 2D multi-view generation. We introduce DiffSplat, a novel 3D generative framework that natively generates 3D Gaussian splats by taming large-scale text-to-image diffusion models. It differs from previous 3D generative models by effectively utilizing web-scale 2D priors while maintaining 3D consistency in a unified model. To bootstrap training, a lightweight reconstruction model is proposed to instantly produce multi-view Gaussian splat grids for scalable dataset curation. In conjunction with the regular diffusion loss on these grids, a 3D rendering loss is introduced to facilitate 3D coherence across arbitrary views. Compatibility with image diffusion models enables seamless adaptation of numerous image-generation techniques to the 3D realm. Extensive experiments demonstrate the superiority of DiffSplat in text- and image-conditioned generation tasks and downstream applications, and thorough ablation studies validate the efficacy of each critical design choice and provide insight into the underlying mechanism.
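The training objective described above combines a standard diffusion loss on the multi-view Gaussian splat grids with a 3D rendering loss that enforces coherence across views. The sketch below illustrates how such a combined objective could be wired up; it is not the authors' code, and every name in it (`denoiser`, `scheduler`, `renderer`, the loss weight `lambda_render`) is a hypothetical placeholder.

```python
# Minimal sketch of a combined diffusion + rendering objective, as the
# abstract describes. All components are hypothetical stand-ins, not
# DiffSplat's actual API.
import torch
import torch.nn.functional as F

def training_step(denoiser, scheduler, renderer,
                  splat_grid, cond, cameras, target_views):
    """One step: diffusion loss on the splat grid + 3D rendering loss."""
    # Standard diffusion training: noise the splat grid at a random timestep
    # and have the repurposed image-diffusion backbone predict the noise.
    noise = torch.randn_like(splat_grid)
    t = torch.randint(0, scheduler.num_steps,
                      (splat_grid.shape[0],), device=splat_grid.device)
    noisy = scheduler.add_noise(splat_grid, noise, t)
    pred_noise = denoiser(noisy, t, cond)
    loss_diffusion = F.mse_loss(pred_noise, noise)

    # 3D rendering loss: recover an estimate of the clean splat grid,
    # render it from several camera poses with a differentiable Gaussian
    # rasterizer, and compare against ground-truth views.
    splats_hat = scheduler.predict_x0(noisy, pred_noise, t)
    rendered = renderer(splats_hat, cameras)
    loss_render = F.mse_loss(rendered, target_views)

    lambda_render = 1.0  # assumed weighting; the paper may use another value
    return loss_diffusion + lambda_render * loss_render
```

The property this setup preserves is that the denoiser keeps the interface of an image diffusion model (a grid of "pixels" plus a conditioning signal), which is what allows techniques developed for 2D image generation to transfer to the 3D setting.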
