Zero-1-to-3: Zero-shot One Image to 3D Object

https://t.me/reading_ai, @AfeliaN

πŸ—‚οΈ Project Page

πŸ“„ Paper

πŸ“Ž GitHub

πŸ—“ Date: 20 Mar 2023

Main idea

Fig 1. Main idea
  • Motivation: if you can generate view-consistent images of an object from different viewpoints given a single input image, you can feed these images into NeRF for novel view synthesis.
  • Solution: a framework that changes the camera viewpoint of an object given just a single RGB image, built on a diffusion model conditioned on the relative viewpoint.

Pipeline

Given a single RGB image of an object, the goal is to synthesize an image of the same object from a desired viewpoint, specified as a relative camera rotation and translation.
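Since the cameras sit on a sphere around the object, the relative viewpoint reduces to deltas in polar angle, azimuth, and radius. A minimal sketch of this parameterization (the function name and the exact wrapping convention are ours, not the paper's):

```python
import numpy as np

def relative_pose(src, tgt):
    """Relative viewpoint between two cameras looking at the object.

    src, tgt: (theta, phi, r) = (polar angle, azimuth, radius).
    Returns (d_theta, d_phi, d_r); the azimuth delta is wrapped
    to [-pi, pi) so nearby views get small conditioning values.
    """
    d_theta = tgt[0] - src[0]
    d_phi = (tgt[1] - src[1] + np.pi) % (2 * np.pi) - np.pi
    d_r = tgt[2] - src[2]
    return d_theta, d_phi, d_r
```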

Fig 2. View-conditioned diffusion

View-Conditioned Diffusion

The main approach of this paper is to take a pre-trained diffusion model and fine-tune it to learn control over the camera parameters without destroying the rest of its representation.

The authors use a hybrid conditioning mechanism:

  • a CLIP embedding of the input image is concatenated with the relative camera rotation and translation to form a "posed CLIP" embedding, which conditions the model via cross-attention
  • the input image is channel-concatenated with the image being denoised
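The two conditioning paths can be sketched as tensor operations (numpy stand-ins; in the real model the posed embedding is projected back to the CLIP dimension before cross-attention, and the concatenation happens in latent space — the exact pose encoding below is an assumption):

```python
import numpy as np

def posed_clip_embedding(clip_emb, d_theta, d_phi, d_r):
    # "Posed CLIP": the image embedding concatenated with the relative
    # viewpoint; azimuth is fed as sin/cos so it has no wrap-around seam.
    pose = np.array([d_theta, np.sin(d_phi), np.cos(d_phi), d_r])
    return np.concatenate([clip_emb, pose])  # (768 + 4,)

def concat_input_channels(noisy_latent, input_latent):
    # The input image's latent is stacked channel-wise with the latent
    # being denoised, doubling the UNet's input channel count.
    return np.concatenate([noisy_latent, input_latent], axis=0)
```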

3D Reconstruction

As the next step, the authors take the open-source Score Jacobian Chaining (SJC) framework to optimize a 3D representation with priors from text-to-image diffusion models.

Fig 3. 3D Reconstruction

The main pipeline here (very similar to DreamFusion) is the following:

  • sample viewpoints and perform volumetric rendering
  • perturb the resulting images with Gaussian noise and denoise them, conditioned on the input image and its CLIP embedding
  • additionally, optimize the input view with an MSE loss
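The loop above can be sketched with toy numpy stand-ins. Here the "NeRF" is just a parameter grid rendered through `tanh`, and the "score function" pulls the noisy render toward the input image; a real run replaces both with SJC's volumetric renderer and the fine-tuned view-conditioned UNet:

```python
import numpy as np

rng = np.random.default_rng(0)

def render(params, pose):
    # stub volumetric render: squash parameters into image range
    return np.tanh(params)

def score(noisy, sigma, cond_img, cond_pose):
    # stub denoiser score: points from the noisy render toward the
    # conditioning image, as the fine-tuned model does implicitly
    return (cond_img - noisy) / sigma**2

params = rng.normal(size=(8, 8, 3)) * 0.1     # toy "NeRF" parameters
input_img = rng.random((8, 8, 3)) * 0.8       # toy input view
sigma, lr = 0.7, 0.02

mse0 = np.mean((render(params, None) - input_img) ** 2)
for step in range(500):
    pose = rng.uniform(0, 2 * np.pi, 3)                # sample a viewpoint
    img = render(params, pose)                         # volumetric rendering
    noisy = img + sigma * rng.normal(size=img.shape)   # perturb with noise
    g = -score(noisy, sigma, input_img, pose)          # SDS/SJC-style gradient
    params -= lr * g * (1 - img**2)                    # chain rule through tanh
    img = render(params, pose)                         # MSE loss on input view
    params -= lr * 2 * (img - input_img) * (1 - img**2)
mse1 = np.mean((render(params, None) - input_img) ** 2)
```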

Regularization

  • depth smoothness loss
  • near-view consistency loss
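Both regularizers have a simple form; minimal numpy sketches (our own simplified versions, not the exact losses from the code release):

```python
import numpy as np

def depth_smoothness_loss(depth):
    # Penalize large first-order differences in the rendered depth map,
    # discouraging spiky geometry.
    dx = np.abs(np.diff(depth, axis=1))
    dy = np.abs(np.diff(depth, axis=0))
    return dx.mean() + dy.mean()

def near_view_consistency_loss(img, img_nearby):
    # Renderings from two nearby viewpoints should look similar.
    return np.mean((img - img_nearby) ** 2)
```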

Implementation details

Dataset: Objaverse

Compared with: DietNeRF, Image Variations (IV) and SJC-I for novel view synthesis; MCC, Point-E and SJC-I for 3D reconstruction.

Metrics:

  • novel view synthesis: PSNR, SSIM, LPIPS, FID
  • 3D reconstruction: Chamfer Distance (CD), volumetric IoU
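These metrics are standard; two of them are simple enough to sketch in numpy (SSIM, LPIPS and FID need reference implementations):

```python
import numpy as np

def psnr(a, b, max_val=1.0):
    # Peak signal-to-noise ratio between two images in [0, max_val].
    mse = np.mean((a - b) ** 2)
    return 10 * np.log10(max_val**2 / mse)

def chamfer_distance(P, Q):
    # Symmetric Chamfer Distance between point sets P (N,3) and Q (M,3):
    # mean squared distance to the nearest neighbor, in both directions.
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```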

Results


Novel-view synthesis comparison

Fig 4. Novel view synthesis comparison
Fig 5. Novel view synthesis comparison. GSO


3D reconstruction comparison

Fig 6. 3D reconstruction comparison.


Metrics: novel view synthesis

Fig 7. Metrics for novel view synthesis. RTMV
Fig 8. Metrics for novel view synthesis. GSO


Metrics: 3D reconstruction

Fig 9. Metrics for 3D reconstruction. GSO
Fig 10. Metrics for 3D reconstruction. RTMV

Additional results

Fig 11. Novel view synthesis on in-the-wild data


Report Page