Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
https://t.me/reading_ai, @AfeliaN
Project Page
Paper
GitHub
Date: 30 Jun 2023
Main idea
- Motivation: models of this type have actually been presented before :)
- Solution: Magic123, a two-stage coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed in-the-wild image using both 2D and 3D priors. The first stage optimizes a NeRF model to produce coarse geometry. The second stage performs mesh optimization (see Magic3D). Additionally, the authors use 3D priors to improve consistency.
Pipeline

The framework consists of two parts.
1. Coarse stage optimizes a NeRF (Instant-NGP) with several losses.
- First, a reconstruction loss is applied at the given (reference) view. The authors use a standard MSE loss between the masked RGB prediction and the reference image, plus an MSE loss on the mask.

I^r -> reference image
M -> foreground mask
v^r -> reference viewpoint
G -> NeRF rendering
- Then the authors add depth supervision, estimating the depth of the reference image with MiDaS.

d^r -> reference disparity
- Additionally, a normal smoothness loss is applied.

tau -> stopgrad operator
g -> Gaussian blur
- Finally, an SDS loss based on 2D and 3D priors is applied (more details below).
2. Fine stage uses DMTet to optimize an SDF-based mesh in the same way as in Magic3D.
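The reference-view reconstruction term of the coarse stage can be sketched in a few lines. This is only an illustration of "masked RGB MSE plus mask MSE" on flat pixel lists, not the authors' implementation: the function names and the weights `lambda_rgb` and `lambda_mask` are hypothetical, and a real pipeline would compute this on tensors with autograd.

```python
def mse(a, b):
    """Mean squared error between two equally sized flat sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def reconstruction_loss(ref_rgb, ref_mask, pred_rgb, pred_mask,
                        lambda_rgb=1.0, lambda_mask=1.0):
    """Masked RGB MSE plus mask MSE at the reference viewpoint v^r.

    ref_rgb, pred_rgb: per-pixel (r, g, b) tuples, playing the roles of
        I^r and the NeRF rendering G at v^r.
    ref_mask, pred_mask: per-pixel foreground values in [0, 1], playing
        the roles of M and the mask rendered by the NeRF.
    lambda_rgb, lambda_mask: hypothetical weights (assumption, not the
        values used in the paper).
    """
    # Apply the foreground mask M to both images before comparing colours.
    masked_ref = [m * c for m, px in zip(ref_mask, ref_rgb) for c in px]
    masked_pred = [m * c for m, px in zip(ref_mask, pred_rgb) for c in px]
    rgb_term = mse(masked_ref, masked_pred)
    # Compare the rendered mask with the reference foreground mask.
    mask_term = mse(ref_mask, pred_mask)
    return lambda_rgb * rgb_term + lambda_mask * mask_term
```

Background pixels contribute nothing to the colour term because both images are multiplied by M, so only the mask term penalizes geometry that leaks outside the object silhouette.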
Joint 2D and 3D priors for image-to-3D generation
- 2D prior: SDS loss

- 3D prior
As a strong 3D prior, the authors use the Zero-1-to-3 model.
Unlike the 2D prior, the 3D prior is guided by the reference view I^r together with the novel-view camera pose.
Using camera poses encourages 3D consistency and lets the 3D prior exploit more 3D information than the 2D prior.
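As a rough illustration of how two SDS signals could be mixed, the sketch below uses the standard score-distillation form w(t) * (predicted noise - injected noise) for each prior and adds them with trade-off weights. The weighted sum, the weights `lambda_2d` and `lambda_3d`, the function names, and the use of one shared noise sample and one shared w(t) for both priors are all assumptions made for this example. The two noise predictions stand in for the outputs of Stable Diffusion and of Zero-1-to-3 (conditioned on I^r and the novel-view camera pose); no real model is called.

```python
def sds_grad(noise_pred, noise, weight):
    """SDS gradient w.r.t. the rendered image: w(t) * (eps_hat - eps)."""
    if len(noise_pred) != len(noise):
        raise ValueError("noise_pred and noise must have the same length")
    return [weight * (p - n) for p, n in zip(noise_pred, noise)]


def joint_sds_grad(noise_pred_2d, noise_pred_3d, noise, weight,
                   lambda_2d=1.0, lambda_3d=1.0):
    """Weighted sum of the 2D-prior and 3D-prior SDS gradients.

    noise_pred_2d: noise predicted by the 2D prior (placeholder values).
    noise_pred_3d: noise predicted by the 3D prior (placeholder values).
    lambda_2d, lambda_3d: hypothetical trade-off weights (assumption).
    """
    g2d = sds_grad(noise_pred_2d, noise, weight)
    g3d = sds_grad(noise_pred_3d, noise, weight)
    return [lambda_2d * a + lambda_3d * b for a, b in zip(g2d, g3d)]
```

In a real pipeline this per-pixel (or per-latent) gradient would be backpropagated through the renderer into the NeRF or mesh parameters. Raising `lambda_3d` relative to `lambda_2d` gives more influence to the pose-conditioned prior.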

Implementation details
Datasets: NeRF4 (4 scenes, chair, drums, ficus, and microphone, taken from the 8 test scenes of the synthetic NeRF dataset), RealFusion15 (15 natural images)
Models: Instant-NGP (coarse stage), DMTet (fine stage), Stable Diffusion (2D prior SDS), Zero-1-to-3 (3D prior SDS).
Compared with: Point-E, Shap-E, Zero123, 3DFuse, RealFusion, NeuralLift-360


Metrics: CLIP-Similarity, PSNR, LPIPS
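For reference, two of these metrics reduce to short formulas. The sketch below gives the textbook PSNR definition on flat pixel lists, plus a cosine similarity of the kind CLIP-Similarity is built on. The embedding vectors are placeholders: obtaining real CLIP embeddings needs a CLIP model, which is outside this example, and LPIPS is omitted because it requires a pretrained network. Function names are hypothetical.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for pixel values in [0, max_value]."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (placeholders here)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
```

Higher is better for both PSNR and CLIP-Similarity, while lower is better for LPIPS.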

Results

