“Learning to reverse noise and recover data.”
What Are Diffusion Models? #
Diffusion Models are a type of generative model that work by:
- Gradually adding noise to data until it becomes pure noise (the forward process).
- Learning to reverse this noise to reconstruct the original data (the reverse process).
This allows the model to generate new data from noise, like creating a painting from a blank canvas of static.
Intuition #
Imagine starting with a high-quality image:
- You corrupt it step-by-step with increasing noise until it becomes unrecognizable.
- Then you train a model to reverse this process, predicting how to "de-noise" it, step by step.
Eventually, the model can start from pure noise and work backward to generate entirely new data that looks realistic.
This process is probabilistic, iterative, and powerful.
How It Works (Step-by-Step) #
1. Forward Process (Diffusion) #
We slowly add Gaussian noise to an image over $T$ steps:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$$

Eventually:
- $x_T \sim \mathcal{N}(0, I)$, i.e., almost pure noise (see the sketch below).
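The per-step formula composes into a closed form for $q(x_t \mid x_0)$, which is what implementations typically sample from directly. A minimal PyTorch sketch, assuming a linear $\beta_t$ schedule (the schedule values and tensor shapes here are illustrative, not tuned):

```python
import torch

# Linear beta schedule over T diffusion steps (values are illustrative).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)    # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    if noise is None:
        noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)  # broadcast over (B, C, H, W)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

# Example: noise a batch of "images" at random time steps.
x0 = torch.randn(8, 3, 32, 32)
t = torch.randint(0, T, (8,))
xt = q_sample(x0, t)
```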
2. Reverse Process (Denoising) #
We train a neural network $\epsilon_\theta(x_t, t)$ to predict the noise added at each step.

This gives us:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$

By sampling this in reverse (from $x_T$ down to $x_0$), we can generate new data, as in the sketch below.
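A minimal sketch of one DDPM-style reverse step, reusing the `betas`/`alphas`/`alpha_bars` schedule from the forward-process sketch above and assuming a hypothetical `model(x_t, t)` that returns the predicted noise. Fixing the variance to $\Sigma_\theta = \beta_t I$ is one simple, common choice:

```python
import torch

@torch.no_grad()
def p_sample(model, xt, t):
    """One reverse step x_t -> x_{t-1}, driven by the noise prediction eps_theta."""
    beta_t = betas[t]
    alpha_t = alphas[t]
    a_bar_t = alpha_bars[t]
    eps = model(xt, t)                      # predicted noise eps_theta(x_t, t)
    mean = (xt - beta_t / (1.0 - a_bar_t).sqrt() * eps) / alpha_t.sqrt()
    if t == 0:
        return mean                         # no noise is added at the final step
    z = torch.randn_like(xt)
    return mean + beta_t.sqrt() * z         # fixed variance sigma_t^2 = beta_t

# Full generation: start from pure noise x_T and walk back to x_0.
# x = torch.randn(1, 3, 32, 32)
# for step in reversed(range(T)):
#     x = p_sample(model, x, step)
```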
Architecture #
Typically, a U-Net architecture is used (especially for images), which:
- Processes the noisy image and time step
- Predicts the noise at each stage
- Uses skip connections to retain spatial detail
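As one concrete option, the Hugging Face Diffusers library ships such a noise-prediction U-Net. A minimal sketch (the configuration values are illustrative, not tuned):

```python
import torch
from diffusers import UNet2DModel

# A small unconditional U-Net that takes a noisy image plus a time step
# and predicts a noise tensor with the same shape as the input.
model = UNet2DModel(sample_size=32, in_channels=3, out_channels=3)

xt = torch.randn(4, 3, 32, 32)     # batch of noisy images
t = torch.randint(0, 1000, (4,))   # their diffusion time steps
pred_noise = model(xt, t).sample   # skip connections help retain spatial detail
```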
Visual Summary #

```text
Forward Process (add noise):
  Image → slightly noisy → more noise → ... → pure noise

Reverse Process (remove noise):
  Pure noise → slightly clearer → more details → final image
```
Loss Function #
The training goal is to predict the noise added during the forward process.
So, we minimize the difference between the actual noise and the predicted noise:

$$\mathcal{L}_{\text{simple}} = \mathbb{E}_{x, t, \epsilon}\left[\,\left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2\,\right]$$

This simplified objective is closely related to denoising score matching.
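Putting the pieces together, one training step might look like the sketch below. It assumes the `q_sample` helper and schedule from the forward-process sketch, plus a Diffusers-style U-Net whose output exposes a `.sample` attribute:

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, x0, T=1000):
    """One optimization step of the simplified noise-prediction objective."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)  # random time steps
    noise = torch.randn_like(x0)                               # eps ~ N(0, I)
    xt = q_sample(x0, t, noise)                                # forward-process sample
    pred = model(xt, t).sample                                 # eps_theta(x_t, t)
    loss = F.mse_loss(pred, noise)                             # || eps - eps_theta ||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```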
Example Libraries #
- Hugging Face Diffusers
- OpenAI GLIDE, DALL·E 2
- Stability AI's Stable Diffusion
- Google's Imagen
Popular Diffusion-Based Models #

| Model | Description |
|---|---|
| DDPM | Denoising Diffusion Probabilistic Models |
| Stable Diffusion | Text-to-image diffusion in a latent space |
| GLIDE | Guided Language-to-Image Diffusion |
| Imagen | High-fidelity image generation by Google |
| Latent Diffusion Models (LDMs) | Run diffusion in a compressed (latent) space |
Real-World Use Cases #

1. Image Generation #
- Generate high-quality images from text (e.g., "A dragon flying over Tokyo").
- E.g., Stable Diffusion, DALL·E 2, Midjourney
2. Inpainting & Editing #
- Fill missing parts in images (e.g., removing objects or painting new ones).
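Diffusers exposes this as a dedicated inpainting pipeline. A rough sketch, where the checkpoint name, prompt, and file paths are placeholders rather than recommendations:

```python
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Placeholder checkpoint; substitute an inpainting-capable model you have access to.
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe = pipe.to("cuda")

init_image = Image.open("photo.png").convert("RGB")  # image to edit
mask_image = Image.open("mask.png").convert("RGB")   # white = region to repaint

result = pipe(
    prompt="a small wooden bench in the garden",
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("inpainted.png")
```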
3. Video Generation (in progress) #
- Research into temporal diffusion models for video is advancing rapidly.
4. Super-Resolution #
- Enhance low-res images with photorealistic details.
5. Scientific Applications #
- Molecule generation, protein folding, etc.
Advantages #

| Feature | Why it matters |
|---|---|
| High sample quality | Rivals or exceeds GANs in realism |
| Stable training | No adversarial loss, so fewer training issues |
| Diverse outputs | Different samples from the same prompt |
| Interpretability | Each generation step is explicit and guided |
Limitations #

| Challenge | Description |
|---|---|
| Slow sampling | Multiple steps (50–1000) per image |
| Computational cost | Large models and long training times |
| Complexity | Requires careful tuning and understanding |
Solutions like Latent Diffusion address the speed problem by operating in a lower-dimensional latent space, as sketched below.
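Roughly: an image is encoded into a much smaller latent tensor with a VAE, diffusion runs on that latent, and the result is decoded back to pixels. A sketch using the VAE class from Diffusers (the checkpoint name is only an example):

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")  # example VAE checkpoint

image = torch.randn(1, 3, 512, 512)  # stand-in for a preprocessed image in [-1, 1]
with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()  # (1, 4, 64, 64): ~48x fewer elements
    # ... run the forward/reverse diffusion steps on `latents` instead of pixels ...
    decoded = vae.decode(latents).sample              # back to (1, 3, 512, 512)
```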
Diffusion vs. Other Generative Models #

| Feature | Diffusion Models | GANs | VAEs | Autoregressive |
|---|---|---|---|---|
| Training stability | Very stable | Often unstable | Stable | Stable |
| Output quality | ★★★★★ | ★★★★ | ★★ | ★★★ |
| Sampling speed | Slow | Fast | Fast | Slow |
| Interpretability | High | Low | Medium | High |
Summary #

| Concept | Description |
|---|---|
| Forward process | Add noise to the image step-by-step |
| Reverse process | Learn to remove noise and recover the image |
| Training goal | Predict the noise added at each step |
| Output | High-quality, diverse data (especially images) |
| Best for | Text-to-image generation, super-resolution, inpainting |
Bonus: Code Example (Using Hugging Face Diffusers) #
```bash
pip install torch diffusers transformers
```

```python
from diffusers import StableDiffusionPipeline
import torch

# Load the pretrained text-to-image pipeline and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")

# Generate an image from a text prompt and display it.
prompt = "a fantasy castle floating in the sky"
image = pipe(prompt).images[0]
image.show()
```
Final Thoughts #
Diffusion models are currently the gold standard in many generative AI applications, particularly text-to-image generation. Their ability to create photorealistic, diverse, and controllable outputs is reshaping creative industries, gaming, scientific research, and beyond.