1. What
Stable Diffusion is a state-of-the-art text-to-image model that generates images from text prompts.
Online demo hosted by Hugging Face
E.g., the prompt: a samoyed running on the moon
Open questions:
- How is text turned into an image?
- What kind of ML/DL techniques are used?
2. How
Ref: Vox
5 steps:
- Prepare training data
- Deep learning
- Latent space
- Generation (diffusion)
- Output
The key steps are the latent space and diffusion.
During training, the deep learning network learns a latent space from the dataset, in which different words map to different learned features. For example, one direction might roughly capture how yellow the image is, another how round the object is, etc.
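A toy sketch of the latent-space idea, assuming a hypothetical 2-D space whose axes are "yellowness" and "roundness". The words and coordinates are hand-picked for illustration only; a real model learns many more dimensions, and they are generally not this interpretable.

```python
import numpy as np

# Hypothetical latent coordinates: [yellowness, roundness].
# Hand-picked for illustration, not learned from any dataset.
latent = {
    "banana": np.array([0.9, 0.1]),
    "lemon": np.array([0.9, 0.8]),
    "basketball": np.array([0.3, 1.0]),
}

def cosine(a, b):
    """Cosine similarity between two latent vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A lemon shares "yellow" with a banana and "round" with a basketball,
# so it sits closer to both than they sit to each other.
sim_lemon_banana = cosine(latent["lemon"], latent["banana"])
sim_lemon_ball = cosine(latent["lemon"], latent["basketball"])
sim_banana_ball = cosine(latent["banana"], latent["basketball"])
```

In this picture, words with similar visual features end up near each other, which is what lets the model relate a prompt to image features.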
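Diffusion starts from noise and inverts the noising process step by step, estimating the true data from the noisy data (Ref).
The forward process adds Gaussian noise; the reverse process inverts it using Bayes' rule. The text label acts as a guide by conditioning the distribution that is sampled from.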
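A minimal NumPy sketch of the noising and inversion idea, assuming a DDPM-style linear noise schedule (the schedule values, array size, and function names are illustrative assumptions, not taken from these notes). It uses the true noise as an "oracle" in place of a trained network, and it leaves out the text conditioning, only to show that a good noise estimate is enough to recover the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed linear noise schedule (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # share of the original signal left at each step

def add_noise(x0, t, eps):
    """Forward process: mix clean data x0 with Gaussian noise eps at step t."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def recover_x0(xt, t, eps_pred):
    """Invert the forward step, given an estimate of the noise."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])

x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a tiny "image"
eps = rng.standard_normal(x0.shape)       # Gaussian noise
t = 500

xt = add_noise(x0, t, eps)
# Oracle: a trained network would predict eps from (xt, t, text prompt).
x0_hat = recover_x0(xt, t, eps)
```

In a real model the noise estimate is imperfect, so generation removes the noise gradually over many steps rather than in one jump.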