One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization

https://t.me/reading_ai, @AfeliaN

🗂️ Project Page

📄 Paper

📎 GitHub (coming soon)

Main idea

Fig 1. Main idea

Motivation: existing methods suffer from several main problems:

  • time-consuming (a NeRF has to be optimized for each scene)
  • memory-intensive (they mostly work only with low-resolution images)
  • 3D-inconsistent results
  • poor geometry

Solution: One-2-3-45, a model that takes a single image of any object as input and generates a full 360-degree 3D textured mesh in a single feed-forward pass. The authors use the Zero123 model to generate multi-view predictions from the single input image, so that multi-view 3D reconstruction techniques can be leveraged to obtain a 3D mesh. To improve the geometry, they rely on an SDF-based neural surface representation.
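For orientation, here is a minimal sketch of the overall feed-forward flow. All helper names are hypothetical placeholders standing in for the two pretrained components, not the authors' API:

```python
import numpy as np

def predict_posed_views(input_rgb: np.ndarray, n_views: int = 8):
    """Stand-in for the Zero123 stage: multi-view RGB predictions + camera poses."""
    h, w, _ = input_rgb.shape
    return [(np.zeros((h, w, 3)), np.eye(4)) for _ in range(n_views)]

def reconstruct_textured_mesh(posed_views):
    """Stand-in for the SparseNeuS-style stage: cost volume -> SDF + color -> mesh."""
    return {"vertices": np.zeros((0, 3)), "faces": np.zeros((0, 3), dtype=int)}

def image_to_mesh(input_rgb: np.ndarray):
    # Single feed-forward pass: no per-shape NeRF optimization.
    posed_views = predict_posed_views(input_rgb)      # 2D diffusion predictions
    return reconstruct_textured_mesh(posed_views)     # generalizable 3D reconstruction

mesh = image_to_mesh(np.zeros((256, 256, 3), dtype=np.float32))
print(len(mesh["vertices"]), "vertices")
```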

Pipeline

Fig 2. Pipeline

This work builds on two main techniques:

  • SparseNeuS - a neural-rendering-based method for surface reconstruction from sparse posed images
  • Zero123 - given a single RGB image of an object and a relative camera transformation, Zero123 controls a diffusion model to synthesize a new image under the transformed camera view (see the sketch below).
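The relative camera transformation that conditions Zero123 is usually expressed as spherical deltas between the input and target viewpoints. A minimal sketch of that parameterization (illustrative; the exact embedding in the released Zero123 code may differ):

```python
import numpy as np

def relative_camera_condition(src, tgt):
    """Relative viewpoint as (d_elevation, sin d_azimuth, cos d_azimuth, d_radius).

    `src` and `tgt` are (elevation_rad, azimuth_rad, radius) tuples describing
    cameras on a sphere around the object; this mirrors Zero123-style view
    conditioning in spirit, but the exact feature layout is an assumption.
    """
    d_theta = tgt[0] - src[0]          # elevation difference
    d_phi = tgt[1] - src[1]            # azimuth difference (wrapped via sin/cos)
    d_r = tgt[2] - src[2]              # camera-distance difference
    return np.array([d_theta, np.sin(d_phi), np.cos(d_phi), d_r])

# Example: ask for a view rotated 30 deg in azimuth and 10 deg up in elevation.
cond = relative_camera_condition(
    src=(np.deg2rad(0.0), np.deg2rad(0.0), 1.5),
    tgt=(np.deg2rad(10.0), np.deg2rad(30.0), 1.5),
)
print(cond)
```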

Zero123 is a very promising way to turn a single image into a multi-view dataset for 3D shape reconstruction. However, the authors show that naively reconstructing from its outputs is not satisfactory, primarily because Zero123's predictions are not multi-view consistent.

Fig 3. Zero123 prediction

To deal with this problem, instead of the usual optimization-based approaches the authors base the reconstruction module on SparseNeuS, a generalizable SDF reconstruction method. The reconstruction module takes m posed source images as input, builds a cost volume, and learns the SDF and color fields.
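Because the module predicts an SDF rather than a raw density, colors along a ray are composited with weights derived from the SDF. A simplified NeuS-style sketch of that step, assuming the logistic-CDF formulation (SparseNeuS additionally conditions everything on cost-volume features):

```python
import numpy as np

def sdf_to_alpha(sdf_samples: np.ndarray, inv_s: float = 64.0) -> np.ndarray:
    """Convert SDF values sampled along a ray into alpha values, NeuS-style.

    Uses the logistic CDF Phi_s(x) = sigmoid(inv_s * x); the alpha between two
    consecutive samples is the normalized drop of Phi_s across the SDF, which
    peaks near the zero level set (the surface).
    """
    cdf = 1.0 / (1.0 + np.exp(-inv_s * sdf_samples))
    return np.clip((cdf[:-1] - cdf[1:]) / (cdf[:-1] + 1e-6), 0.0, 1.0)

def render_ray(rgb_samples: np.ndarray, sdf_samples: np.ndarray) -> np.ndarray:
    """Alpha-composite per-sample colors into a single pixel color."""
    alpha = sdf_to_alpha(sdf_samples)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha]))[:-1]
    weights = transmittance * alpha
    return (weights[:, None] * rgb_samples[:-1]).sum(axis=0)

# Toy ray: the SDF goes from positive (outside) to negative (inside) mid-ray.
sdf = np.linspace(0.5, -0.5, 64)
rgb = np.ones((64, 3)) * 0.8
print(render_ray(rgb, sdf))
```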

In more detail:

  1. Render n ground-truth RGB (and depth) images of the training shape
  2. For each of the n views, use Zero123 to predict 4 nearby views
  3. During training, feed all 4×n predictions with ground-truth poses into the reconstruction module and randomly choose one of the n ground-truth RGB images as the target view
  4. Supervise training with both depth and RGB losses (a toy sketch of this step follows the list)
  5. Additionally, the elevation of the input view is estimated so that the camera poses of all views can be recovered
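A toy numpy sketch of the supervision step described above, with hypothetical tensor layouts and an externally supplied renderer standing in for the reconstruction module:

```python
import numpy as np

rng = np.random.default_rng(0)

def supervision_loss(render_target_view, source_views, gt_views, lambda_depth=1.0):
    """RGB + depth supervision for one iteration (illustrative, not the authors' code).

    source_views: the 4*n Zero123 predictions with their ground-truth poses.
    gt_views:     the n ground-truth renderings, each a dict with 'rgb', 'depth', 'pose'.
    render_target_view: callable (source_views, target_pose) -> {'rgb', 'depth'},
                        standing in for the reconstruction module's renderer.
    """
    # Randomly choose one of the n ground-truth views as the target view.
    target = gt_views[rng.integers(len(gt_views))]
    rendered = render_target_view(source_views, target["pose"])

    rgb_loss = np.abs(rendered["rgb"] - target["rgb"]).mean()
    depth_loss = np.abs(rendered["depth"] - target["depth"]).mean()
    return float(rgb_loss + lambda_depth * depth_loss)

# Toy check with a dummy renderer and random data (n = 2, so 4*n = 8 sources).
dummy = lambda srcs, pose: {"rgb": np.zeros((16, 16, 3)), "depth": np.zeros((16, 16))}
mk = lambda: {"rgb": rng.random((16, 16, 3)), "depth": rng.random((16, 16)), "pose": np.eye(4)}
print(supervision_loss(dummy, [mk() for _ in range(8)], [mk() for _ in range(2)]))
```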

Implementation details

Dataset: Objaverse-LVIS

Models:

  • Zero123 → generate multi-view images
  • SparseNeuS → SDF prediction

Compared with:

Metrics:

  • 3D reconstruction: Chamfer Distance (CD) and volumetric IoU (a small sketch of both follows)
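For reference, a minimal numpy sketch of both metrics on point-cloud / occupancy-grid proxies (the paper's exact evaluation protocol, e.g. alignment and sampling density, may differ):

```python
import numpy as np

def chamfer_distance(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between two point clouds (N x 3 and M x 3)."""
    # Pairwise squared distances, then nearest-neighbor terms in both directions.
    d2 = ((points_a[:, None, :] - points_b[None, :, :]) ** 2).sum(-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

def volume_iou(occ_a: np.ndarray, occ_b: np.ndarray) -> float:
    """Volumetric IoU between two boolean occupancy grids of the same shape."""
    inter = np.logical_and(occ_a, occ_b).sum()
    union = np.logical_or(occ_a, occ_b).sum()
    return float(inter / max(union, 1))

# Toy examples: two noisy samplings of a unit sphere, and two shifted boxes.
rng = np.random.default_rng(0)
pts = rng.normal(size=(1024, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
print(chamfer_distance(pts, pts + 0.01 * rng.normal(size=pts.shape)))

occ = np.zeros((32, 32, 32), dtype=bool); occ[8:24, 8:24, 8:24] = True
print(volume_iou(occ, np.roll(occ, 2, axis=0)))
```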

Pros and cons

Results

Images Comparison

Fig 4. Comparison with different models
Fig 5. More results
Fig 6. Comparison with Shap-E
Fig 7. Text-to-3D comparison

Metrics

Fig 8. Metrics and time cost

Some results

Fig 9. Some results





