How Much is Enough? A Study on Diffusion Times in Score-based Generative Models

Abstract

Score-based diffusion models are a class of generative models whose dynamics are described by stochastic differential equations that map noise into data. While recent works have started to lay down a theoretical foundation for these models, an analytical understanding of the role of the diffusion time \(T\) is still lacking. Current best practice advocates for a large \(T\) to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution; however, a smaller value of \(T\) should be preferred for a better approximation of the score-matching objective and higher computational efficiency.

Starting from a variational interpretation of diffusion models, this work quantifies this trade-off and proposes a way to improve the quality and efficiency of both training and sampling by adopting smaller diffusion times. The key idea is to use an auxiliary model to bridge the gap between the ideal and simulated forward dynamics before applying a standard reverse diffusion process. On image data, the resulting method remains competitive with the state of the art according to sample quality metrics and log-likelihood.

Key Contributions

A variational view of diffusion time: The paper analyzes the role of the diffusion horizon \(T\) through a variational decomposition that makes explicit the trade-off between approximation quality, reverse-process mismatch, and computational efficiency.
A case against blindly increasing \(T\): The paper shows that large diffusion times are not universally optimal. Shorter horizons can offer better training objectives and cheaper sampling, provided the endpoint mismatch is handled correctly.
An auxiliary initialization strategy: The paper introduces an auxiliary model that reduces the discrepancy between the forward endpoint and the initialization of the reverse diffusion process, making smaller diffusion times practical.
Competitive empirical performance: Experiments on image generation show that the proposed approach remains competitive with strong baselines on sample quality and log-likelihood while using a shorter diffusion process.

Methodology Overview

Illustration of the diffusion-time trade-off studied in the paper.

The paper revisits a basic design choice in score-based generative modeling: how long the forward diffusion process should run. A large diffusion time simplifies the reverse-process initialization because the terminal distribution becomes close to a tractable reference noise distribution, but it also increases the cost of training and sampling and makes the score-matching objective harder to approximate well.
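To make the first half of this trade-off concrete, the sketch below computes how far the forward endpoint is from the reference noise as a function of \(T\). It is a toy illustration under assumptions that are not taken from the paper: one-dimensional Gaussian data, a variance-preserving (Ornstein-Uhlenbeck) forward SDE with constant rate `beta`, and a standard normal reference distribution. The function name `endpoint_kl` and its default parameters are hypothetical.

```python
import math


def endpoint_kl(T, mu0=2.0, var0=0.25, beta=1.0):
    """KL( q(x_T) || N(0, 1) ) for toy 1-D Gaussian data N(mu0, var0).

    Assumed forward SDE: dx = -0.5 * beta * x dt + sqrt(beta) dW, so that
    x_T | x_0 ~ N(x_0 * exp(-0.5 * beta * T), 1 - exp(-beta * T)).
    """
    decay = math.exp(-beta * T)
    mean_T = mu0 * math.sqrt(decay)          # mean shrinks toward 0
    var_T = var0 * decay + (1.0 - decay)     # variance moves toward 1
    # Closed-form KL divergence between N(mean_T, var_T) and N(0, 1).
    return 0.5 * (var_T + mean_T ** 2 - 1.0 - math.log(var_T))


for T in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"T = {T:4.1f}   KL to N(0, 1) = {endpoint_kl(T):.6f}")
```

In this toy setting the endpoint mismatch shrinks monotonically as \(T\) grows, which is the argument for large diffusion times. The sketch says nothing about the opposing costs (longer simulation and a harder score-matching problem), which are the other side of the trade-off.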

The proposed solution is to keep the diffusion horizon shorter and compensate for the resulting endpoint mismatch with an auxiliary model. This preserves the practical reverse-time sampling pipeline while improving the efficiency-quality trade-off.
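The effect of replacing the fixed noise distribution with a fitted initialization can be sketched on a toy problem. Everything in the example is an assumption made for illustration: the data is a pair of point masses at -2 and 2, the forward process is a variance-preserving SDE with constant rate `beta`, and the "auxiliary model" is only a moment-matched Gaussian, a deliberately simple stand-in for whatever auxiliary model the paper actually trains. All function names are hypothetical.

```python
import math


def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def endpoint_pdf(x, T, centers=(-2.0, 2.0), beta=1.0):
    """Density of x_T (T > 0) when x_0 is uniform over `centers` (toy data)."""
    decay = math.exp(-beta * T)
    var_T = 1.0 - decay
    comps = [math.exp(normal_logpdf(x, c * math.sqrt(decay), var_T)) for c in centers]
    return sum(comps) / len(comps)


def kl_from_endpoint(T, mean, var, lo=-10.0, hi=10.0, n=4001):
    """KL( q(x_T) || N(mean, var) ), approximated by a Riemann sum on a grid."""
    dx = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * dx
        q = endpoint_pdf(x, T)
        if q > 0.0:  # skip grid points where the density underflows to zero
            total += q * (math.log(q) - normal_logpdf(x, mean, var)) * dx
    return total


def moment_matched_variance(T, centers=(-2.0, 2.0), beta=1.0):
    """Variance of x_T; valid as written only when the centers average to zero."""
    decay = math.exp(-beta * T)
    second_moment = sum(c * c for c in centers) / len(centers)
    return (1.0 - decay) + decay * second_moment


T = 0.5  # a deliberately short diffusion time
kl_standard = kl_from_endpoint(T, 0.0, 1.0)
kl_auxiliary = kl_from_endpoint(T, 0.0, moment_matched_variance(T))
print(f"KL to N(0, 1) at T = {T}:              {kl_standard:.4f}")
print(f"KL to moment-matched Gaussian at T = {T}: {kl_auxiliary:.4f}")
```

Even this crude stand-in lowers the endpoint mismatch at a short diffusion time, because the initialization is fitted to the actual forward endpoint rather than fixed in advance. A more expressive auxiliary model would be needed to close the remaining gap, since a single Gaussian cannot represent the bimodal endpoint distribution.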

Main Findings

Diffusion time should be treated as a tunable modeling choice rather than fixed to a very large value by default.
Shorter diffusion horizons can improve efficiency and better align with the training objective.
The main obstacle to small \(T\) is the mismatch between the distribution reached by the forward process at time \(T\) and the noise distribution used to initialize the reverse process.
An auxiliary initialization model is sufficient to recover strong image-generation performance under shorter diffusion times.