New method generates better prompts from images

A new method learns prompts from an image; these prompts can then be used to reproduce similar concepts in Stable Diffusion.

Whether DALL-E 2, Midjourney, or Stable Diffusion, all current generative image models are controlled by text inputs, so-called prompts. Since the output of generative AI models depends heavily on how these prompts are worded, “prompt engineering” has become a discipline in its own right in the AI community. The goal of prompt engineering is to find prompts that produce repeatable results, that can be mixed with other prompts, and that ideally work for other models as well.

Besides such text prompts, these models can also be controlled by so-called “soft prompts”. These are text embeddings automatically derived from the network, i.e. numerical values that do not directly correspond to human-readable terms. Because soft prompts are derived directly from the network, they produce very precise results for certain synthesis tasks, but cannot be applied to other models.
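
The difference between a text prompt and a soft prompt can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the four-word vocabulary, the random three-dimensional embedding table, and the function names are hypothetical, and the simple squared-distance objective merely stands in for whatever training signal a real model would provide. It is not the method described in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and embedding table (4 tokens, 3 dimensions).
vocab = ["cat", "dog", "tree", "car"]
embedding_table = rng.normal(size=(len(vocab), 3))


def embed_text_prompt(tokens):
    """Text prompt: every row is looked up from the embedding table."""
    ids = [vocab.index(t) for t in tokens]
    return embedding_table[ids]


def learn_soft_prompt(target, steps=200, lr=0.1):
    """Soft prompt: a free vector tuned by gradient descent toward a target."""
    soft = np.zeros_like(target)
    for _ in range(steps):
        grad = 2.0 * (soft - target)  # gradient of ||soft - target||^2
        soft -= lr * grad
    return soft


def nearest_token(vector):
    """Return the closest vocabulary token and its distance to the vector."""
    distances = np.linalg.norm(embedding_table - vector, axis=1)
    idx = int(np.argmin(distances))
    return vocab[idx], float(distances[idx])


# Assumed stand-in objective: a point halfway between "cat" and "dog".
target = 0.5 * (embedding_table[0] + embedding_table[1])
soft_prompt = learn_soft_prompt(target)
token, distance = nearest_token(soft_prompt)

print("text prompt rows come from the table:", embed_text_prompt(["cat"]))
print("soft prompt:", soft_prompt)
print("closest token:", token, "at distance", distance)
```

The learned vector ends up between vocabulary entries rather than on one of them, which is why a soft prompt has no direct human-readable equivalent. It is also tied to this particular embedding table, so it would carry no meaning for a model with a different one.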
