Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond.

https://t.me/reading_ai, @AfeliaN

🗂️ Project Page

📄 Paper

📎 GitHub

🗓 Date: 26 Apr 2023

Main idea

  • Motivation: Despite the significant success of diffusion models, negative prompting has its own limitations, particularly when the main and negative prompts semantically overlap. In addition, current methods for generating 3D assets from a text prompt or an image often suffer from the Janus (multi-head) problem.
  • Solution: Perp-Neg, a new algorithm that leverages the geometrical properties of the score space to address the shortcomings of the current negative-prompt algorithm.

Pipeline

The problem of semantic overlap

In practice, the positive and negative prompts used to condition diffusion models usually overlap. Ideally we would have two independent text prompts, but this is barely achievable. The overlap can lead to undesired results, as shown in the image below: in the second row, the key concepts requested in the main text prompt (respectively “armchair”, “sunglasses”, “crown”, and “horse”) are removed when those concepts also appear in the negative prompts.

Fig 1. Illustration of Negative Prompting problem

To address this problem, the authors suggest using a perpendicular gradient; let us discuss what that means.

Recall that when c1 and c2 are independent, each of them contributes its own denoising score component:

Fig 2. Independent case

But in the overlapping case discussed above, the score component needs to be modified. Considering the geometrical interpretation of $e_{\Theta}^{1}$ and $e_{\Theta}^{2}$, a natural solution is to take the component of $e_{\Theta}^{2}$ perpendicular to $e_{\Theta}^{1}$ as the independent part of $e_{\Theta}^{2}$:

Fig 3. Perpendicular gradient

The most important property of the suggested perpendicular gradient is that the $e_{\Theta}^{1}$ component is not affected by the additional prompt.
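This projection can be sketched numerically as follows (a minimal illustration in my own notation, not the paper's code): the negative score is replaced by its component orthogonal to the main score, so subtracting it can no longer cancel the main direction.

```python
import numpy as np

def perpendicular_component(e_neg, e_main):
    """Project e_neg onto the subspace orthogonal to e_main.

    The returned vector has zero dot product with e_main, so using it
    as the negative score cannot cancel the main-prompt direction.
    """
    e_main_flat = e_main.ravel()
    e_neg_flat = e_neg.ravel()
    # Scalar projection coefficient of e_neg onto e_main.
    coef = e_neg_flat @ e_main_flat / (e_main_flat @ e_main_flat)
    return e_neg - coef * e_main

# Toy check: the result is orthogonal to the main score.
e_main = np.array([1.0, 0.0])
e_neg = np.array([1.0, 1.0])
e_perp = perpendicular_component(e_neg, e_main)
print(e_perp)  # → [0. 1.]
```

By construction, `e_perp @ e_main == 0`, which is exactly the property stated above.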

The image below illustrates the main sampling pipeline:

Fig 4. Different sampling
Fig 5. Pseudocode of a PerpNeg algorithm
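A sketch of one Perp-Neg guided denoising step, following the idea of the pseudocode above (variable names are mine; `eps_*` are the model's noise predictions under the empty, main, and negative prompts):

```python
import numpy as np

def perp(e, ref):
    """Component of e orthogonal to ref (flattened for the dot products)."""
    r = ref.ravel()
    return e - (e.ravel() @ r) / (r @ r) * ref

def perp_neg_guidance(eps_uncond, eps_pos, eps_negs, weights, guidance_scale=7.5):
    """One Perp-Neg guided score, sketched from the paper's idea.

    eps_negs / weights: noise predictions for the negative prompts and
    their (positive) weights. Each negative residual is first projected
    orthogonally to the main residual, so it cannot erase semantics
    shared with the main prompt.
    """
    delta_pos = eps_pos - eps_uncond
    delta = delta_pos.copy()
    for eps_neg, w in zip(eps_negs, weights):
        delta -= w * perp(eps_neg - eps_uncond, delta_pos)
    return eps_uncond + guidance_scale * delta
```

Note that when a negative residual is parallel to the main residual (full semantic overlap), its perpendicular component vanishes and the step reduces to plain classifier-free guidance.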

3D

And how can this be used in 3D? Several works build 3D assets by applying 2D diffusion priors, such as DreamFusion, Magic3D, and others. All of these works suffer from the so-called Janus (multi-face) problem: for instance, when the model is asked to generate a 3D person or animal, the generated object has multiple faces instead of a proper back view.

Fig 6. Janus problem

Some works, such as 3DFuse, try to mitigate this problem by providing additional depth information, but the results are still imperfect.

Another proposed solution is view-dependent prompting (e.g. appending “back view”, “side view”, or “overhead view” according to the camera position), but it also does not fully solve the problem.

So the authors propose the following method. First, they define txt_{back}, txt_{side}, and txt_{front} as the main text prompt appended with the back, side, and front views, respectively.

Then simple view-dependent prompts are replaced with new sets of positive and negative prompts:

Fig 7. Improved view-dependent prompting

The authors also observed that increasing the weight of a negative prompt makes the algorithm focus more on avoiding that view, so the weight acts as a pose factor.
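A sketch of how such view-dependent positive/negative prompt sets might be assembled (the prompt templates and weights here are illustrative choices of mine, not the paper's values):

```python
def view_prompts(main_prompt, view):
    """Build (positive, [(negative, weight), ...]) for a camera view.

    For each rendered view, the other views' prompts become weighted
    negatives; a larger weight pushes the sample further away from
    that unwanted view, acting as a pose factor.
    """
    txt = {v: f"{main_prompt}, {v} view" for v in ("front", "side", "back")}
    # Illustrative weights: penalize the adjacent view more strongly.
    negatives = {
        "front": [(txt["side"], 1.0)],
        "side": [(txt["front"], 1.0), (txt["back"], 0.5)],
        "back": [(txt["side"], 1.0), (txt["front"], 0.5)],
    }
    return txt[view], negatives[view]

pos, negs = view_prompts("a photo of a corgi", "back")
print(pos)  # → a photo of a corgi, back view
```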

Additionally, to interpolate between the side and back views, the authors use the following embedding as the positive prompt:

Fig 8. Interpolation between the side and back views

and for negative:

Fig 9. Negative prompts interpolation

To interpolate between the front and side view:

Fig 10. Interpolation between the side and front views

and the corresponding negative prompts:

Fig 11. Negative prompts interpolation
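The interpolation idea can be sketched as a linear blend of the two views' text embeddings, keyed by an interpolation ratio r ∈ [0, 1] (names and the toy embeddings are mine, for illustration only):

```python
import numpy as np

def interpolate_view_embedding(e_a, e_b, r):
    """Linear interpolation between two view-conditioned text embeddings.

    r = 0 gives pure view A, r = 1 gives pure view B; intermediate
    camera angles get a proportional mix of the two prompts.
    """
    return (1.0 - r) * e_a + r * e_b

# Toy 2-D "embeddings" standing in for real text-encoder outputs.
e_side = np.array([1.0, 0.0])
e_back = np.array([0.0, 1.0])
print(interpolate_view_embedding(e_side, e_back, 0.25))  # → [0.75 0.25]
```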

Additionally, the authors improve the SDS loss:

Fig 12. SDS loss
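For reference, the standard SDS gradient from DreamFusion that Perp-Neg builds on has the following form (schematically; Perp-Neg substitutes its combined, projection-corrected noise prediction for the single conditional prediction $\hat{\epsilon}_\phi$):

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]
```

Here $w(t)$ is a timestep weighting, $x_t$ the noised rendering, $y$ the text condition, and $\theta$ the 3D representation's parameters.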

Implementation details

Experiment: semantically aligned 2D generation

Compared with: Stable Diffusion, Compositional Energy-Based Model (CEBM)

Results

Fig 13. Comparison of generation of the back view
Fig 14. Comparison of successful generation rate
Fig 15. Comparison of view generation
Fig 16. Comparison of successful generations

Pros: this use of negative prompts improves results in both 2D and 3D. It does not require a complicated prompt-engineering process and alleviates the multi-head problem in 3D generation.

Results

2D images

Fig 17. 2D view generation

3D assets

Fig 18. 3D assets generation


Report Page