Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Prior

https://t.me/reading_ai, @AfeliaN

🗂️ Project Page

📄 Paper

📎 GitHub

🗓 Date: 30 Jun 2023

Main idea

Motivation: similar single-image-to-3D models have already been presented :)
Solution: Magic123, a two-stage coarse-to-fine approach for generating high-quality textured 3D meshes from a single unposed in-the-wild image using both 2D and 3D priors. The first stage optimizes a NeRF model to produce coarse geometry. The second stage performs mesh optimization (see Magic3D). Additionally, the authors use a 3D prior to improve consistency across views.

Pipeline

The framework consists of two stages.

1. Coarse stage: optimize a NeRF (Instant-NGP) with several losses.

First, a reconstruction loss is applied at the reference view. The authors use a plain MSE loss between the masked RGB prediction and the reference image, plus an MSE loss on the mask.

where I^r -> reference image

M -> foreground mask

v^r -> reference viewpoint

G -> NeRF rendering
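The reconstruction term described above can be sketched in plain Python over flattened pixel lists. This is only an illustration, not the authors' code: the function names and the weights `lambda_rgb` and `lambda_mask` are assumptions made for the example.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def reconstruction_loss(rendered_rgb, rendered_mask, ref_rgb, ref_mask,
                        lambda_rgb=1.0, lambda_mask=1.0):
    """Masked RGB MSE plus mask MSE at the reference view (illustrative).

    rendered_rgb / ref_rgb: per-pixel (r, g, b) tuples of a flattened image.
    rendered_mask / ref_mask: per-pixel foreground values in [0, 1].
    lambda_rgb / lambda_mask: hypothetical weights, not taken from the paper.
    """
    # Multiply both images by the reference mask M so background pixels
    # do not contribute to the RGB term.
    masked_pred = [m * c for m, px in zip(ref_mask, rendered_rgb) for c in px]
    masked_ref = [m * c for m, px in zip(ref_mask, ref_rgb) for c in px]
    return (lambda_rgb * mse(masked_pred, masked_ref)
            + lambda_mask * mse(rendered_mask, ref_mask))
```

With this form, a rendering that differs from the reference only in background pixels incurs no RGB penalty, while the mask term still penalizes a wrong silhouette.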

Then the authors add depth supervision, estimating the depth of the reference image with MiDaS.

d^r -> reference disparity
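The exact form of the depth term is not given here. Because monocular estimates such as MiDaS are only defined up to scale and shift, one common choice, assumed in this sketch, is a negative Pearson correlation between the rendered map and the reference map over foreground pixels. The function names, the `threshold` parameter, and the requirement that both maps use the same convention (both depth or both disparity) are assumptions of the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_x * var_y)


def depth_loss(rendered, reference, mask, threshold=0.5):
    """Scale- and shift-invariant depth term: 0.5 * (1 - Pearson r).

    rendered / reference: per-pixel values in the same convention.
    mask: per-pixel foreground values; pixels above `threshold` are used.
    Returns a value in [0, 1]; 0 means perfectly correlated maps.
    """
    fg = [i for i, m in enumerate(mask) if m > threshold]
    r = pearson([rendered[i] for i in fg], [reference[i] for i in fg])
    return 0.5 * (1.0 - r)
```

A correlation-based term ignores any affine mismatch between the two maps, which is why it suits pseudo-depth from a monocular estimator.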

Additionally, the normal smoothness loss is applied.

tau -> stopgrad operator

g -> Gaussian blur
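The formula these symbols belong to is missing here. A form consistent with the definitions above, given as a reconstruction rather than a quotation, with n denoting the rendered normal map (a symbol assumed for this sketch), would be:

```latex
\mathcal{L}_{n} = \left\lVert \mathbf{n} - \tau\big(g(\mathbf{n})\big) \right\rVert
```

Because of the stop-gradient, the blurred normals act as a fixed target, so the loss pulls each normal toward its smoothed neighborhood without pushing gradients through the blur.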

Finally, an SDS loss based on 2D and 3D priors is applied (more details below).

2. Fine stage: DMTet is used to optimize an SDF-based mesh, the same way as in Magic3D.

Joint 2D and 3D priors for image-to-3D generation

2D priors → SDS loss

3D prior

As a strong 3D prior, the authors use the Zero-1-to-3 model.

Unlike the 2D prior, the 3D prior is conditioned on the reference view I^r together with the novel-view camera pose.

The 3D prior utilizes camera poses to encourage 3D consistency and enable the usage of more 3D information compared to the 2D prior.
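One way to read the trade-off between the two priors (cf. Fig 10) is as a weighted sum of the two SDS gradients before the NeRF parameters are updated. The sketch below is an assumption-laden illustration: `grad_2d` and `grad_3d` stand in for gradients already computed from Stable Diffusion and Zero-1-to-3, and `lambda_2d` and `lambda_3d` are hypothetical weight names.

```python
def joint_sds_gradient(grad_2d, grad_3d, lambda_2d, lambda_3d):
    """Combine per-parameter SDS gradients from the 2D and 3D priors.

    grad_2d: gradient from the 2D prior (flattened list of floats).
    grad_3d: gradient from the 3D prior (same length).
    lambda_2d / lambda_3d: hypothetical trade-off weights.
    """
    if len(grad_2d) != len(grad_3d):
        raise ValueError("gradients must have the same length")
    return [lambda_2d * a + lambda_3d * b for a, b in zip(grad_2d, grad_3d)]


# Example: a larger lambda_3d gives the pose-conditioned prior more influence.
combined = joint_sds_gradient([1.0, 2.0], [0.5, -1.0], lambda_2d=1.0, lambda_3d=2.0)
```

Setting one weight to zero recovers a single-prior setup, which makes the two ends of the trade-off easy to compare.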

Implementation details

Datasets: NeRF4 (4 scenes, namely chair, drums, ficus, and microphone, taken from the 8 test scenes of the synthetic NeRF dataset), RealFusion15 (15 natural images)

Models: Instant-NGP (coarse stage), DMTet (fine stage), Stable Diffusion (2D prior SDS), Zero-1-to-3 (3D prior SDS).

Compares with: Point-E, Shap-E, Zero-1-to-3, 3DFuse, RealFusion, NeuralLift-360

Metrics: CLIP-Similarity, PSNR, LPIPS
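Of these metrics, PSNR is simple enough to compute by hand. The sketch below uses the standard definition over flattened pixel values; the function name and the `max_value` default (images scaled to [0, 1]) are assumptions of the example, not details from the paper.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images.

    pred / target: equal-length sequences of pixel values.
    max_value: largest possible pixel value (1.0 for [0, 1] images).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the rendering is closer to the reference. CLIP-Similarity and LPIPS need pretrained networks and are not sketched here.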

Results

Fig 10. Trade-off between 2D and 3D priors

Fig 11. Trade-off between 2D and 3D priors. Metrics
