Poster

SCube: Instant Large-Scale Scene Reconstruction using VoxSplats

Xuanchi Ren · Yifan Lu · Hanxue Liang · Jay Zhangjie Wu · Huan Ling · Mike Chen · Sanja Fidler · Francis Williams · Jiahui Huang


Abstract: We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images. Our method encodes reconstructed scenes using a novel representation, which we call VoxSplats. VoxSplats are a set of Gaussian splats supported on a high-resolution sparse-voxel scaffold. To reconstruct a VoxSplat from images, we employ a hierarchical voxel latent diffusion model conditioned on the input images followed by a feedforward appearance prediction model. The diffusion model generates progressively higher resolution grids in a coarse-to-fine manner, and the appearance network predicts a set of Gaussians within each voxel. From as few as *3 non-overlapping input images*, SCube can generate millions of Gaussians within a $1024^3$ voxel grid spanning *hundreds of meters* in *20 seconds*. Past works tackling scene reconstruction from images either rely on per-scene optimization and fail to reconstruct the scene away from input views (thus requiring dense view coverage as input) or leverage geometric priors based on low-resolution models, which produce blurry results. In contrast, SCube leverages high-resolution sparse networks and produces sharp outputs from few views. We show the superiority of SCube compared to prior art using the Waymo self-driving dataset on 3D reconstruction and demonstrate its applications, such as LiDAR simulation and text-to-scene generation.
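As a rough illustration of the coarse-to-fine sparse-voxel idea described in the abstract, the following toy sketch refines a sparse set of occupied voxels by repeated subdivision and pruning, then attaches a fixed number of placeholder Gaussian centers to each surviving voxel. It is not the authors' implementation: the function names (`subdivide`, `refine`, `place_gaussians`), the hand-written occupancy rule `ground_only` (standing in for the learned diffusion model), and the diagonal placement of Gaussian centers (standing in for the appearance network) are all assumptions made for illustration.

```python
from itertools import product

def subdivide(voxels):
    """Split each occupied voxel (i, j, k) into its 8 children at 2x resolution."""
    return {
        (2 * i + di, 2 * j + dj, 2 * k + dk)
        for (i, j, k) in voxels
        for di, dj, dk in product((0, 1), repeat=3)
    }

def refine(voxels, keep):
    """One coarse-to-fine step: subdivide, then keep only voxels accepted by `keep`."""
    return {v for v in subdivide(voxels) if keep(v)}

def place_gaussians(voxels, voxel_size, per_voxel=2):
    """Attach `per_voxel` placeholder Gaussian centers to each occupied voxel."""
    centers = []
    for (i, j, k) in sorted(voxels):
        for n in range(per_voxel):
            # Spread centers along the voxel diagonal (illustrative only;
            # a learned model would predict positions and appearance).
            t = (n + 1) / (per_voxel + 1)
            centers.append(((i + t) * voxel_size,
                            (j + t) * voxel_size,
                            (k + t) * voxel_size))
    return centers

def ground_only(voxel):
    """Hypothetical occupancy rule: keep only the ground layer (k == 0)."""
    return voxel[2] == 0

grid = {(0, 0, 0)}            # one occupied voxel at resolution 1
for _ in range(3):            # refine 1 -> 2 -> 4 -> 8 voxels per side
    grid = refine(grid, ground_only)

splats = place_gaussians(grid, voxel_size=1.0 / 8)
print(len(grid), "occupied voxels out of", 8 ** 3, "dense cells")
print(len(splats), "Gaussian centers")
```

Because empty space is pruned at every level, only occupied voxels are stored and refined, which is what makes very high nominal resolutions such as $1024^3$ tractable in a sparse representation.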
