Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Prior

https://t.me/reading_ai, @AfeliaN

πŸ—‚οΈ Project Page

πŸ“„ Paper

πŸ“Ž GitHub

πŸ—“ Date: 30 Jun 2023

Main idea

  • Motivation: models of this type have already been presented :) — wait, no em-dash: models of this type have already been presented :)
  • Solution: Magic123, a two-stage coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed in-the-wild image using both 2D and 3D priors. The first stage optimizes a NeRF to produce coarse geometry; the second stage performs mesh optimization (see Magic3D). Additionally, the authors use a 3D prior to improve 3D consistency.

Pipeline

Fig 1. Pipeline

The framework consists of two parts.

  1. Optimize NeRF (Instant-NGP) with several losses.
  • First of all, the reconstruction loss for the given view is applied. The authors use a plain MSE loss between the masked RGB rendering and the reference image, plus an MSE loss on the mask.
Fig 2. Reconstruction loss

where I^r -> reference image

M -> foreground mask

v^r -> reference viewpoint

G -> NeRF rendering
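Putting the notation above together, the reconstruction term is a masked MSE on the RGB rendering plus an MSE on the mask. Schematically (the lambda weight names are illustrative, not the paper's exact symbols):

$$
\mathcal{L}_{rec} = \lambda_{rgb}\,\big\|M \odot \big(I^r - G(\theta; v^r)\big)\big\|_2^2 \;+\; \lambda_{mask}\,\big\|M - \hat{M}(\theta; v^r)\big\|_2^2
$$

where $\hat{M}(\theta; v^r)$ is the opacity mask rendered by the NeRF from the reference viewpoint.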

  • Then the authors add depth supervision, estimating the depth of the given image with MiDaS.
Fig 3. Depth loss

d^r -> reference disparity
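A common way to write such a depth regularizer in single-image-to-3D works is a negative Pearson correlation between the rendered disparity and the MiDaS disparity inside the foreground mask. Treat the following as a sketch of that idea rather than the paper's exact equation:

$$
\mathcal{L}_{depth} = -\,\frac{\operatorname{cov}\big(M \odot d^r,\; M \odot d(\theta, v^r)\big)}{\sigma\big(M \odot d^r\big)\,\sigma\big(M \odot d(\theta, v^r)\big)}
$$

Correlation (rather than raw MSE) is used because monocular depth from MiDaS is defined only up to an unknown scale and shift.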

  • Additionally, a normal smoothness loss is applied.
Fig 4. Normal loss

tau -> stopgrad operator

g -> Gaussian blur
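With the operators above, the smoothness term penalizes the difference between the rendered normal map $n$ and its Gaussian-blurred, gradient-detached copy; roughly:

$$
\mathcal{L}_{normal} = \big\|\, n - \tau\big(g(n)\big) \,\big\|
$$

Stopping the gradient through the blurred copy keeps the normals from collapsing while still pulling them toward their smoothed version.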

  • And finally, an SDS loss based on the 2D and 3D priors is applied (more details below).

2. The fine stage uses DMTet to optimize an SDF-based mesh representation in the same way as Magic3D.

Joint 2D and 3D priors for image-to-3D generation

  • 2D priors β†’ SDS loss
Fig 5. SDS loss
  • 3D prior

As a strong 3D prior, the authors use the Zero-1-to-3 model.

Unlike the 2D prior, the 3D prior takes the reference view I^r together with the novel-view camera pose as guidance.

The 3D prior thus uses camera poses to encourage 3D consistency and exploits more 3D information than the 2D prior can.

Fig 6. 2D and 3D priors
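As a sketch of how the two priors combine, the joint guidance can be viewed as a weighted sum of two SDS gradients, one from each prior. Below is a minimal NumPy illustration; `denoiser`, the toy noise schedule, and the lambda names are hypothetical stand-ins for Stable Diffusion / Zero-1-to-3, not the actual implementation:

```python
import numpy as np

def sds_grad(x, denoiser, t, weight, rng):
    """Score Distillation Sampling gradient for one rendered image x.

    denoiser(x_t, t) predicts the noise that was added to x_t
    (a stand-in for Stable Diffusion or Zero-1-to-3; hypothetical API).
    """
    eps = rng.standard_normal(x.shape)                    # sample Gaussian noise
    alpha = 1.0 - t                                       # toy noise schedule
    x_t = np.sqrt(alpha) * x + np.sqrt(1.0 - alpha) * eps # noise the render
    eps_hat = denoiser(x_t, t)                            # predicted noise
    return weight * (eps_hat - eps)                       # w(t) * (eps_hat - eps)

def joint_sds_grad(x, denoiser_2d, denoiser_3d, t, lam_2d, lam_3d, rng):
    """Magic123-style joint prior: weighted sum of 2D and 3D SDS gradients."""
    g2d = sds_grad(x, denoiser_2d, t, 1.0, rng)  # imagination from the 2D prior
    g3d = sds_grad(x, denoiser_3d, t, 1.0, rng)  # 3D consistency from the 3D prior
    return lam_2d * g2d + lam_3d * g3d
```

The ratio between `lam_2d` and `lam_3d` is exactly the trade-off studied in the results section: more 2D weight yields more imaginative but less consistent geometry, more 3D weight the opposite.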

Implementation details

Datasets: NeRF4 (4 scenes from the synthetic NeRF dataset: chair, drums, ficus, and microphone, out of its 8 test examples) and RealFusion15 (15 natural images).

Models: Instant-NGP (coarse stage), DMTet (fine stage), Stable Diffusion (2D-prior SDS), Zero-1-to-3 (3D-prior SDS).

Compared with: Point-E, Shap-E, Zero-1-to-3, 3DFuse, RealFusion, NeuralLift.

Fig 7. Qualitative comparisons
Fig 8. Qualitative comparisons, part 2.

Metrics: CLIP-Similarity, PSNR, LPIPS

Fig 9. Metrics results

Results

Fig 10. Trade-off between 2D and 3D priors
Fig 11. Trade-off between 2D and 3D priors. Metrics

