DiffRGD: An Inference-Time Diffusion Guidance
Through Riemannian Gradient Descent

Jia-Wei Liao¹,⁴,  Li-Xuan Peng²,  Mei-Heng Yueh³,  Min Sun²,  Cheng-Fu Chou¹,  Jun-Cheng Chen⁴
¹National Taiwan University  ·  ²National Tsing Hua University  ·  ³National Taiwan Normal University  ·  ⁴Academia Sinica (CITI)
ECCV 2026

Abstract

Recently, diffusion models have been widely adopted in generative modeling and serve as foundation models for many image generation tasks. To control generation without costly retraining or fine-tuning, many works propose inference-time guidance methods that steer the latent with a differentiable objective. However, these methods introduce distributional drift: the guided latent no longer follows the Gaussian distribution prescribed by the sampler, which degrades sample quality. To address this gap, we propose DiffRGD, a distribution-aware guidance framework that explicitly preserves the latent Gaussian structure. DiffRGD formulates each sampling step as a constrained optimization problem on a spherical manifold induced by the latent Gaussian distribution and solves it efficiently via Riemannian Gradient Descent (RGD). DiffRGD is a plug-and-play method that can be integrated into any pre-trained diffusion model without retraining. Extensive experiments show that DiffRGD outperforms previous methods on most image restoration and conditional generation tasks.

Overview

Most existing inference-time guidance methods inject external gradients directly into the sampling process, formulating the per-step update as:

$$\hat{\bm{x}}_{t-1} \sim p_\theta(\bm{x}_{t-1} | \bm{x}_{t}), \qquad \bm{x}_{t-1} = \argmin_{\bm{x} \in \mathcal{N}(\hat{\bm{x}}_{t-1})} \mathcal{L}(\hat{\bm{x}}_0(\bm{x}, t-1), \bm{y}).$$

Because the neighborhood $\mathcal{N}(\hat{\bm{x}}_{t-1})$ is defined in the ambient Euclidean space, these methods cannot preserve the stepwise Gaussian latent distribution, causing distributional drift that ultimately degrades sample quality. DiffRGD addresses this by performing gradient updates directly on a distribution-induced spherical manifold.

Figure 1. Previous works apply guidance without constraints that account for the latent distribution, which can lead to off-manifold behavior that hurts sample quality. DiffRGD introduces a latent-distribution-aware geometric constraint tailored to the Gaussian structure of the diffusion latent.
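A small numerical illustration of this drift (our own sketch, not taken from the paper): starting from a point exactly on the shell $\|\bm{x} - \bm{\mu}_t\|_2 = \sigma_t r$, one plain Euclidean gradient step on a hypothetical quadratic loss changes the distance to $\bm{\mu}_t$, so the updated latent no longer has the radius that was drawn from $\chi(n)$. The dimension, noise level, target, and step size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 4096, 0.3                         # illustrative dimension and noise level (assumptions)
mu = rng.standard_normal(n)                  # stand-in for the DDIM mean mu_t
r = np.sqrt(rng.chisquare(n))                # radius r ~ chi(n)
z = rng.standard_normal(n)
x = mu + sigma * r * z / np.linalg.norm(z)   # a sample lying exactly on ||x - mu|| = sigma * r

target = rng.standard_normal(n)              # hypothetical guidance target
grad = x - target                            # gradient of the toy loss 0.5 * ||x - target||^2
eta = 0.05                                   # illustrative step size
x_euclid = x - eta * grad                    # unconstrained Euclidean guidance step

radius_before = np.linalg.norm(x - mu)
radius_after = np.linalg.norm(x_euclid - mu)
print(f"before: {radius_before:.2f}  after: {radius_after:.2f}  sampled: {sigma * r:.2f}")
```

The step leaves the shell even though it decreases the toy loss; repeated over many timesteps, such deviations accumulate into the drift described above.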

Method

Starting from a noisy latent $\bm{x}_t$ at timestep $t$, DDIM draws the next-step latent $\bm{x}_{t-1}$ from a Gaussian distribution:

$$\bm{x}_{t-1} \sim \mathcal{N}(\bm{\mu}_t, \sigma_t^2 \bm{I}_n),$$

where the mean is defined by

$$\bm{\mu}_t = \sqrt{\bar{\alpha}_{t-1}}\,\hat{\bm{x}}_0(\bm{x}_t, t) + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2}\,\bm{\epsilon}_\theta(\bm{x}_t, t),$$

and the predicted clean sample is estimated by Tweedie's formula:

$$\hat{\bm{x}}_0(\bm{x}_t, t) = \frac{1}{\sqrt{\bar{\alpha}_t}} \left( \bm{x}_t - \sqrt{1 - \bar{\alpha}_t}\,\bm{\epsilon}_\theta(\bm{x}_t, t) \right).$$
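The three equations above make up one stochastic DDIM step. A minimal NumPy sketch follows; `eps` stands for the network output $\bm{\epsilon}_\theta(\bm{x}_t, t)$, scalar arguments stand for $\bar{\alpha}_t$, $\bar{\alpha}_{t-1}$, and $\sigma_t$, and the function names are ours.

```python
import numpy as np

def tweedie_x0(x_t, eps, abar_t):
    """Predicted clean sample x0_hat(x_t, t) from the noise prediction eps = eps_theta(x_t, t)."""
    return (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)

def ddim_mean(x0_hat, eps, abar_prev, sigma_t):
    """DDIM mean mu_t; requires sigma_t**2 <= 1 - abar_prev."""
    return np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev - sigma_t**2) * eps

def ddim_step(x_t, eps, abar_t, abar_prev, sigma_t, rng):
    """One unguided DDIM step: x_{t-1} ~ N(mu_t, sigma_t^2 I)."""
    mu = ddim_mean(tweedie_x0(x_t, eps, abar_t), eps, abar_prev, sigma_t)
    return mu + sigma_t * rng.standard_normal(np.shape(x_t))
```

If `x_t` was built as $\sqrt{\bar{\alpha}_t}\,\bm{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\bm{\epsilon}$ and the network predicts $\bm{\epsilon}$ exactly, `tweedie_x0` recovers $\bm{x}_0$, and with $\sigma_t = 0$ the step is deterministic.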

Polar Decomposition of Isotropic Gaussian

Proposition 1 (Polar Decomposition of Isotropic Gaussian)

At each diffusion timestep $t$, the next-step latent $\bm{x}_{t-1} \sim \mathcal{N}(\bm{\mu}_t, \sigma_t^2 \bm{I}_n)$ follows an isotropic Gaussian distribution centered at $\bm{\mu}_t$. It admits the polar decomposition:

$$\bm{x}_{t-1} = \bm{\mu}_t + \sigma_t r\, \bm{u},$$

where $r \sim \chi(n)$ is a chi-distributed random variable with $n$ degrees of freedom, $\bm{u} \sim \operatorname{Unif}(\mathbb{S}^{n-1})$ is a unit vector uniformly distributed on the sphere, and $r$ and $\bm{u}$ are statistically independent.
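The decomposition can be checked numerically. The sketch below, assuming NumPy, draws $r$ as the square root of a chi-squared variate and $\bm{u}$ by normalizing a standard Gaussian vector, then confirms that the samples have mean $\bm{\mu}$ and per-coordinate standard deviation $\sigma$. The function name `sample_polar` is ours.

```python
import numpy as np

def sample_polar(mu, sigma, rng):
    """Draw x ~ N(mu, sigma^2 I_n) via the polar decomposition x = mu + sigma * r * u."""
    n = mu.size
    r = np.sqrt(rng.chisquare(n))        # r ~ chi(n): square root of a chi-squared(n) variate
    z = rng.standard_normal(n)
    u = z / np.linalg.norm(z)            # u ~ Unif(S^{n-1}), drawn independently of r
    return mu + sigma * r * u, r, u

rng = np.random.default_rng(0)
mu, sigma = np.array([1.0, -2.0, 0.5]), 2.0
samples = np.stack([sample_polar(mu, sigma, rng)[0] for _ in range(20000)])
print(samples.mean(axis=0))   # close to mu
print(samples.std(axis=0))    # close to sigma in every coordinate
```

This works because, for $\bm{z} \sim \mathcal{N}(\bm{0}, \bm{I}_n)$, the norm $\|\bm{z}\|_2$ is $\chi(n)$-distributed and independent of the direction $\bm{z}/\|\bm{z}\|_2$, so recombining independent draws of the two reproduces the Gaussian.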

Fixing the sampled radius $r$ yields a spherical manifold that captures this Gaussian structure; we use it as the feasible set for RGD:

$$\mathcal{S}_{t,r} = \left\{ \bm{x} \in \mathbb{R}^n \;\middle|\; \|\bm{x} - \bm{\mu}_t\|_2 = \sigma_t r \right\}.$$
Figure 2. Gaussian distribution-aware spherical manifold construction.
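A standard property of the chi distribution (not a claim from the paper) helps explain why fixing $r$ is a mild restriction in high dimension: the radius concentrates sharply.

$$\mathbb{E}[r] = \sqrt{2}\,\frac{\Gamma\!\left(\tfrac{n+1}{2}\right)}{\Gamma\!\left(\tfrac{n}{2}\right)} \approx \sqrt{n - \tfrac{1}{2}}, \qquad \operatorname{Var}[r] = n - \mathbb{E}[r]^2 \approx \tfrac{1}{2}.$$

For example, with $n = 3 \cdot 256 \cdot 256 = 196{,}608$ (an RGB 256×256 image, used here only as an illustrative dimension), $\sqrt{n} \approx 443.4$ while the standard deviation of $r$ is about $0.71$, roughly $0.16\%$ of the mean. Almost all of the Gaussian mass therefore already lies near a thin shell of radius $\sigma_t\sqrt{n}$ around $\bm{\mu}_t$.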

Problem Formulation

We formulate the per-timestep guidance as a constrained optimization problem on $\mathcal{S}_{t,r}$:

$$\bm{x}_{t-1} = \argmin_{\bm{x} \in \mathcal{S}_{t,r}} \mathcal{L}(\hat{\bm{x}}_0(\bm{x}, t-1), \bm{y}).$$

Riemannian Gradient Descent for Diffusion Guidance

To solve the constrained optimization on $\mathcal{S}_{t,r}$, we employ RGD with three geometric operations:

Tangent Space

$$\operatorname{T}_{\bm{x}}\mathcal{S}_{t,r} = \left\{ \bm{v} \in \mathbb{R}^n \;\middle|\; (\bm{x} - \bm{\mu}_t)^\top \bm{v} = 0 \right\}.$$
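This condition follows from differentiating the sphere constraint (a standard one-line derivation, included for completeness). Let $\gamma(s)$ be a smooth curve on $\mathcal{S}_{t,r}$ with $\gamma(0) = \bm{x}$ and $\gamma'(0) = \bm{v}$. Since $\|\gamma(s) - \bm{\mu}_t\|_2^2 = \sigma_t^2 r^2$ for all $s$,

$$\frac{d}{ds}\Big|_{s=0} \|\gamma(s) - \bm{\mu}_t\|_2^2 = 2\,(\bm{x} - \bm{\mu}_t)^\top \bm{v} = \frac{d}{ds}\Big|_{s=0} \sigma_t^2 r^2 = 0,$$

so every tangent vector is orthogonal to the radial direction $\bm{x} - \bm{\mu}_t$; both sets are $(n-1)$-dimensional, so they coincide.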

Projection Operator

$$\Pi_{\operatorname{T}_{\bm{x}}\mathcal{S}_{t,r}}(\bm{v}) = \left( \bm{I}_n - \frac{(\bm{x} - \bm{\mu}_t)(\bm{x} - \bm{\mu}_t)^\top}{\|\bm{x} - \bm{\mu}_t\|_2^2} \right) \bm{v}.$$
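The projection removes the radial component of $\bm{v}$. A minimal NumPy sketch (the name `project_tangent` is ours) applies the rank-one form directly instead of building the $n \times n$ matrix; `np.vdot` flattens its inputs, so image-shaped arrays work as well as vectors.

```python
import numpy as np

def project_tangent(x, mu, v):
    """Pi_{T_x S}(v): remove from v its component along the radial direction x - mu."""
    d = x - mu                                        # normal direction of the sphere at x
    return v - (np.vdot(d, v) / np.vdot(d, d)) * d    # (I - d d^T / ||d||^2) v, without the n x n matrix

rng = np.random.default_rng(0)
x, mu, v = rng.standard_normal((3, 10))
p = project_tangent(x, mu, v)
print(np.vdot(x - mu, p))   # ~0 up to floating-point error: p is tangent at x
```

The rank-one form costs $O(n)$ time and memory, whereas materializing the projection matrix for a 3×256×256 latent would need about $3.9 \times 10^{10}$ entries.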

Retraction

$$\operatorname{R}_{\bm{x}}(\bm{v}) = \bm{\mu}_t + \sigma_t r \cdot \frac{\bm{x} - \bm{\mu}_t + \bm{v}}{\|\bm{x} - \bm{\mu}_t + \bm{v}\|_2}.$$
Figure 3. An illustration of DiffRGD.
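The retraction takes the Euclidean step $\bm{x} + \bm{v}$ and rescales it back onto $\mathcal{S}_{t,r}$. A minimal NumPy sketch follows; the argument `radius` stands for $\sigma_t r$ and the function name is ours.

```python
import numpy as np

def retract(x, mu, radius, v):
    """R_x(v) = mu + radius * (x - mu + v) / ||x - mu + v||, with radius = sigma_t * r."""
    w = x - mu + v                                 # Euclidean step, expressed relative to the centre mu
    return mu + radius * w / np.linalg.norm(w)     # rescale back onto the sphere of this radius

rng = np.random.default_rng(0)
mu = rng.standard_normal(10)
radius = 3.0
z = rng.standard_normal(10)
x = mu + radius * z / np.linalg.norm(z)            # a point on the sphere
v = rng.standard_normal(10)
x_new = retract(x, mu, radius, v)
print(np.linalg.norm(x_new - mu))                  # 3.0 up to floating-point error
```

For a tangent $\bm{v}$, $\|\bm{x} - \bm{\mu}_t + \bm{v}\|_2^2 = (\sigma_t r)^2 + \|\bm{v}\|_2^2$, so the denominator never vanishes and $\operatorname{R}_{\bm{x}}(\bm{0}) = \bm{x}$.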

The Riemannian gradient and retraction-based update at each inner iteration $k$ are:

$$\operatorname{grad}_{\mathcal{S}_{t,r}}\mathcal{L} = \Pi_{\operatorname{T}_{\bm{x}_{t-1}^{(k)}}\mathcal{S}_{t,r}}\!\left(\nabla_{\bm{x}_{t-1}^{(k)}} \mathcal{L}(\hat{\bm{x}}_0(\bm{x}_{t-1}^{(k)}, t-1), \bm{y})\right),$$
$$\bm{x}_{t-1}^{(k+1)} = \operatorname{R}_{\bm{x}_{t-1}^{(k)}}\!\left(-\eta_t^{(k)}\operatorname{grad}_{\mathcal{S}_{t,r}}\mathcal{L}\right).$$
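Putting the projection and the retraction together, one inner iteration can be sketched as below. This is our own NumPy sketch and assumes the Euclidean gradient $\nabla\mathcal{L}$ has already been computed (in practice by automatic differentiation through $\hat{\bm{x}}_0$ and $\bm{\epsilon}_\theta$); the toy loss on the unit circle is hypothetical and only checks that the loss decreases while the iterate stays on the sphere.

```python
import numpy as np

def rgd_step(x, mu, radius, euclid_grad, eta):
    """One RGD iteration on S = {x : ||x - mu||_2 = radius}, with radius = sigma_t * r."""
    d = x - mu
    riem_grad = euclid_grad - (np.vdot(d, euclid_grad) / np.vdot(d, d)) * d   # projection onto T_x S
    w = d - eta * riem_grad                                                     # (x - mu) + v, v = -eta * grad_S L
    return mu + radius * w / np.linalg.norm(w)                                  # retraction back onto S

# Toy check on the unit circle with a hypothetical loss L(x) = 0.5 * ||x - target||^2.
target = np.array([0.0, 2.0])

def loss(z):
    return 0.5 * np.sum((z - target) ** 2)

mu = np.zeros(2)
x = np.array([1.0, 0.0])
x_new = rgd_step(x, mu, 1.0, x - target, eta=0.1)
print(loss(x), loss(x_new), np.linalg.norm(x_new - mu))   # loss 2.5 -> about 2.108, radius stays 1.0
```

Here the Euclidean gradient $(1, -2)$ has radial component $(1, 0)$; projecting it away leaves $(0, -2)$, so the step only rotates the point along the circle toward the target.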

Algorithm

Input: Condition $\bm{y}$ (a measurement or reference image), diffusion model $\bm{\epsilon}_\theta$, loss $\mathcal{L}$, DDIM schedule $\{\bar{\alpha}_t\}_{t=0}^T$, noise levels $\{\sigma_t\}_{t=1}^T$ with $\sigma_t > 0$, guidance strengths $\{\eta_t^{(k)}\}$ for $t = 1, \dots, T$ and $k = 0, \dots, K-1$, number of inner iterations $K \in \mathbb{N}$.
1: $\bm{x}_T \sim \mathcal{N}(\bm{0}, \bm{I}_n)$
2: for $t = T$ to $1$ do
3:     $r \sim \chi(n)$
4:     $\bm{u} \sim \operatorname{Unif}(\mathbb{S}^{n-1})$
5:     $\bm{\mu}_t \leftarrow \sqrt{\bar{\alpha}_{t-1}}\,\hat{\bm{x}}_0(\bm{x}_t,t) + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2}\,\bm{\epsilon}_\theta(\bm{x}_t, t)$
6:     $\bm{x}_{t-1}^{(0)} \leftarrow \bm{\mu}_t + \sigma_t r\,\bm{u}$
7:     for $k = 0$ to $K-1$ do
8:         $\bm{g} \leftarrow \nabla_{\bm{x}_{t-1}^{(k)}} \mathcal{L}(\hat{\bm{x}}_0(\bm{x}_{t-1}^{(k)}, t-1), \bm{y})$ ▷ Guidance gradient
9:         $\operatorname{grad}_{\mathcal{S}_{t,r}}\mathcal{L} \leftarrow \Pi_{\operatorname{T}_{\bm{x}_{t-1}^{(k)}}\mathcal{S}_{t,r}}(\bm{g})$ ▷ Riemannian gradient
10:        $\bm{x}_{t-1}^{(k+1)} \leftarrow \operatorname{R}_{\bm{x}_{t-1}^{(k)}}(-\eta_t^{(k)}\operatorname{grad}_{\mathcal{S}_{t,r}}\mathcal{L})$ ▷ Retraction
11:    end for
12:    $\bm{x}_{t-1} \leftarrow \bm{x}_{t-1}^{(K)}$
13: end for
Return: Guided sample $\bm{x}_0$.
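The algorithm translates into a short loop. The NumPy sketch below assumes flattened latents of dimension $n$; `eps_model` and `loss_grad` are hypothetical callables standing in for $\bm{\epsilon}_\theta$ and for $\nabla_{\bm{x}}\mathcal{L}(\hat{\bm{x}}_0(\bm{x}, t-1), \bm{y})$, which a real implementation would obtain by backpropagating through the network. The toy run also assumes an illustrative schedule with $\bar{\alpha}_0 < 1$ (so step 5 is defined at $t = 1$), the DDPM-equivalent $\sigma_t$ from the DDIM paper, constant step sizes, $\bm{\epsilon}_\theta = \bm{0}$, and a squared-error loss; none of these choices come from the paper.

```python
import numpy as np

def diffrgd_sample(eps_model, loss_grad, alpha_bar, sigma, eta, K, n, rng):
    """Sketch of the DiffRGD loop; alpha_bar[t] for t = 0..T, sigma[t] and eta[t][k] for t >= 1."""
    T = len(alpha_bar) - 1
    x = rng.standard_normal(n)                                   # 1: x_T ~ N(0, I_n)
    for t in range(T, 0, -1):                                    # 2: t = T, ..., 1
        r = np.sqrt(rng.chisquare(n))                            # 3: r ~ chi(n)
        z = rng.standard_normal(n)
        u = z / np.linalg.norm(z)                                # 4: u ~ Unif(S^{n-1})
        eps = eps_model(x, t)
        x0_hat = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])   # Tweedie
        mu = (np.sqrt(alpha_bar[t - 1]) * x0_hat
              + np.sqrt(max(1.0 - alpha_bar[t - 1] - sigma[t] ** 2, 0.0)) * eps)  # 5: DDIM mean
        radius = sigma[t] * r
        xk = mu + radius * u                                     # 6: initial point on S_{t,r}
        for k in range(K):                                       # 7-11: inner RGD iterations
            g = loss_grad(xk, t - 1)                             # 8: guidance gradient
            d = xk - mu
            g = g - (np.vdot(d, g) / np.vdot(d, d)) * d          # 9: Riemannian gradient
            w = d - eta[t][k] * g
            xk = mu + radius * w / np.linalg.norm(w)             # 10: retraction
        x = xk                                                   # 12: x_{t-1} = x_{t-1}^{(K)}
    return x

# Toy run with hypothetical stand-ins: eps_theta = 0 and L = 0.5 * ||x0_hat - y||^2.
rng = np.random.default_rng(0)
T, K, n = 10, 3, 16
alpha_bar = np.linspace(0.999, 0.05, T + 1)                     # illustrative schedule, alpha_bar[0] < 1
sigma = np.zeros(T + 1)
sigma[1:] = np.sqrt((1 - alpha_bar[:-1]) / (1 - alpha_bar[1:]) * (1 - alpha_bar[1:] / alpha_bar[:-1]))
eta = np.full((T + 1, K), 0.1)                                   # constant step sizes (assumption)
y = rng.standard_normal(n)

def eps_model(x, t):
    return np.zeros_like(x)

def loss_grad(x, s):
    # With eps_theta = 0, x0_hat(x, s) = x / sqrt(alpha_bar[s]); the chain rule gives this gradient.
    return (x / np.sqrt(alpha_bar[s]) - y) / np.sqrt(alpha_bar[s])

x0 = diffrgd_sample(eps_model, loss_grad, alpha_bar, sigma, eta, K, n, rng)
```

Every inner iterate stays on $\mathcal{S}_{t,r}$ because each update ends with the retraction, so the sampled radius $\sigma_t r$ is preserved at every timestep.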

Experiments

Image Restoration

We evaluate on four inverse problems using 150 images from the FFHQ 256×256 validation set. Each task is formulated as a noisy linear inverse problem $\bm{y} = \mathcal{A}(\bm{x}) + \bm{n}$ with $\bm{n} \sim \mathcal{N}(\bm{0}, 0.05^2\bm{I})$, and the guidance objective is $\mathcal{L}_\text{inv}(\hat{\bm{x}}_0, \bm{y}) = \|\mathcal{A}(\hat{\bm{x}}_0) - \bm{y}\|_2$. The tasks are: (i) inpainting with a 70% random mask, (ii) ×4 super-resolution with bicubic downsampling, (iii) Gaussian deblurring with a 31×31 kernel and $\sigma = 3.0$, and (iv) motion deblurring with a 61×61 kernel and intensity 0.5.
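For inpainting, $\mathcal{A}$ is a pixel mask, so $\mathcal{L}_\text{inv}$ and its gradient with respect to $\hat{\bm{x}}_0$ have a closed form, $\bm{A}^\top(\bm{A}\hat{\bm{x}}_0 - \bm{y}) / \|\bm{A}\hat{\bm{x}}_0 - \bm{y}\|_2$. The NumPy sketch below assumes that "70% random mask" means 70% of pixel locations are missing and that the mask is shared across color channels; both readings, the helper names, and the reduced 16×16 image size are our assumptions. The full guidance gradient with respect to $\bm{x}_{t-1}$ would additionally chain through $\hat{\bm{x}}_0(\bm{x}, t-1)$.

```python
import numpy as np

def random_inpainting_mask(shape, missing_frac, rng):
    """Binary mask with exactly round(missing_frac * size) zeroed (missing) pixels."""
    size = int(np.prod(shape))
    mask = np.ones(size)
    mask[rng.permutation(size)[: int(round(missing_frac * size))]] = 0.0
    return mask.reshape(shape)

def l_inv_and_grad(x0_hat, y, mask):
    """L_inv = ||A(x0_hat) - y||_2 with A(x) = mask * x, and its gradient w.r.t. x0_hat."""
    residual = mask * x0_hat - y
    loss = np.linalg.norm(residual)
    grad = mask * residual / loss if loss > 0 else np.zeros_like(x0_hat)   # A^T r / ||r||, with A^T = A
    return loss, grad

rng = np.random.default_rng(0)
x_true = rng.uniform(-1, 1, size=(3, 16, 16))                  # toy image in [-1, 1]
mask = random_inpainting_mask((16, 16), 0.7, rng)              # shared across the 3 channels
y = mask * x_true + 0.05 * rng.standard_normal(x_true.shape)   # y = A(x) + n, n ~ N(0, 0.05^2 I)
x0_hat = rng.uniform(-1, 1, size=x_true.shape)                 # stand-in for the Tweedie estimate
loss, grad = l_inv_and_grad(x0_hat, y, mask)
```

The other three tasks only swap the linear operator (downsampling or convolution), whose adjoint then replaces the elementwise mask in the gradient.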

Table 1. Quantitative results using DDIM 1,000 sampling steps on 150 samples from the FFHQ 256×256 validation set. Metrics averaged over 100 bootstrap runs. Bold = best, underline = second-best, * = statistically significant improvement ($p < 0.05$) over second-best.

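| Task | Method | Venue | PSNR ↑ | SSIM ↑ | LPIPS ↓ | FID ↓ |
|---|---|---|---|---|---|---|
| Inpainting | DPS | ICLR 2023 | 30.44 | 0.863 | 0.153 | 42.68 |
| | MPGD | ICLR 2024 | 27.51 | 0.724 | 0.256 | 68.24 |
| | DSG | ICML 2024 | 31.03 | 0.866 | 0.144 | 36.30 |
| | ADMMDiff | CVPR 2025 | 32.38 | 0.899 | 0.119 | 29.52 |
| | DiffRGD (Ours) | ECCV 2026 | 34.04 | 0.926 | 0.096 | 21.88 |
| Super-Res. ×4 | DPS | ICLR 2023 | 26.03 | 0.727 | 0.260 | 80.05 |
| | MPGD | ICLR 2024 | 24.40 | 0.614 | 0.354 | 101.50 |
| | DSG | ICML 2024 | 26.71 | 0.737 | 0.256 | 74.67 |
| | ADMMDiff | CVPR 2025 | 26.48 | 0.712 | 0.297 | 96.69 |
| | DiffRGD (Ours) | ECCV 2026 | 27.77 | 0.783 | 0.220 | 63.94 |
| Gaussian Deblur | DPS | ICLR 2023 | 25.88 | 0.721 | 0.237 | 69.38 |
| | MPGD | ICLR 2024 | 24.07 | 0.576 | 0.328 | 95.12 |
| | DSG | ICML 2024 | 27.45 | 0.751 | 0.259 | 75.85 |
| | ADMMDiff | CVPR 2025 | 26.57 | 0.757 | 0.226 | 79.30 |
| | DiffRGD (Ours) | ECCV 2026 | 26.80 | 0.757 | 0.218 | 63.83 |
| Motion Deblur | DPS | ICLR 2023 | 24.47 | 0.685 | 0.271 | 80.75 |
| | MPGD | ICLR 2024 | 23.15 | 0.569 | 0.357 | 106.99 |
| | DSG | ICML 2024 | 26.80 | 0.709 | 0.290 | 87.96 |
| | ADMMDiff | CVPR 2025 | 27.26 | 0.778 | 0.222 | 72.92 |
| | DiffRGD (Ours) | ECCV 2026 | 25.84 | 0.736 | 0.250 | 72.26 |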
Figure 4. Qualitative comparison for image restoration tasks on the FFHQ 256×256 validation set. Our method produces restorations with fewer artifacts than the compared methods.

Conditional Generation

We evaluate three conditional generation tasks on CelebA-HQ 256×256: segmentation-map, sketch, and FaceID guidance. Given a condition image $\bm{y}$, we extract task-specific features with a pre-trained model $\psi_\theta$ and define the guidance loss as:

$$\mathcal{L}_\text{cond}(\hat{\bm{x}}_0, \bm{y}) = \|\psi_\theta(\hat{\bm{x}}_0) - \psi_\theta(\bm{y})\|_2.$$
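The conditional loss has the same structure as $\mathcal{L}_\text{inv}$, with the feature extractor in place of $\mathcal{A}$. In the sketch below, $\psi_\theta$ is replaced by a fixed random linear map purely so that the gradient $\bm{W}^\top(\psi(\hat{\bm{x}}_0) - \psi(\bm{y})) / \mathcal{L}_\text{cond}$ has a closed form; the actual segmentation, sketch, and face-recognition networks are nonlinear and would need automatic differentiation (a vector-Jacobian product) instead. All names here are hypothetical.

```python
import numpy as np

def make_linear_feature_extractor(n_in, n_feat, rng):
    """Hypothetical stand-in for psi_theta: a fixed random linear map psi(x) = W @ x."""
    return rng.standard_normal((n_feat, n_in)) / np.sqrt(n_in)

def l_cond_and_grad(x0_hat, y, W):
    """L_cond = ||psi(x0_hat) - psi(y)||_2 with psi(x) = W @ x, and its gradient w.r.t. x0_hat."""
    diff = W @ x0_hat - W @ y
    loss = np.linalg.norm(diff)
    grad = W.T @ diff / loss if loss > 0 else np.zeros_like(x0_hat)
    return loss, grad

rng = np.random.default_rng(0)
W = make_linear_feature_extractor(64, 8, rng)
x0_hat, y = rng.standard_normal((2, 64))
loss, grad = l_cond_and_grad(x0_hat, y, W)
```

Because the loss is an unsquared norm, its gradient has unit length in feature space, so the guidance strength is governed by $\eta_t^{(k)}$ rather than by how far the features are from the condition.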

Table 2. Quantitative results using DDIM 100 sampling steps on 150 samples from CelebA-HQ 256×256. Metrics averaged over 100 bootstrap runs. * = statistically significant improvement ($p < 0.05$).

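| Method | Seg. mIoU ↑ | Seg. FID ↓ | Seg. KID ↓ | Sketch-$\ell_2$ ↓ | Sketch FID ↓ | Sketch KID ↓ | FaceID-$\ell_2$ ↓ | FaceID FID ↓ | FaceID KID ↓ |
|---|---|---|---|---|---|---|---|---|---|
| FreeDoM | 0.622 | 156.02 | 0.057 | 30.85 | 101.90 | 0.017 | 0.557 | 127.05 | 0.032 |
| DSG | 0.750 | 117.48 | 0.028 | 21.36 | 107.00 | 0.024 | 0.340 | 95.27 | 0.014 |
| ADMMDiff | 0.758 | 101.86 | 0.031 | 30.82 | 97.52 | 0.014 | 0.346 | 100.81 | 0.014 |
| DiffRGD (Ours) | 0.804 | 96.10 | 0.026 | 19.48 | 87.82 | 0.012 | 0.303 | 93.80 | 0.011 |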
Figure 5. Qualitative results for conditional generation tasks on CelebA-HQ 256×256: (a) segmentation maps, (b) sketches, (c) FaceID. Our method generates results that align better with the given conditions than other state-of-the-art methods.

Citation

@inproceedings{liao2026diffrgd,
  title     = {DiffRGD: An Inference-Time Diffusion Guidance
               Through Riemannian Gradient Descent},
  author    = {Liao, Jia-Wei and Peng, Li-Xuan and Yueh, Mei-Heng
               and Sun, Min and Chou, Cheng-Fu and Chen, Jun-Cheng},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026},
}