While consumer displays increasingly support more than 10 stops of dynamic range, most image assets, such as internet photographs and generative AI content, remain limited to 8-bit low dynamic range (LDR), constraining their utility in high dynamic range (HDR) applications. Currently, no generative model can produce high-bit-depth, high dynamic range content in a generalizable way. Existing LDR-to-HDR conversion methods often struggle to produce photorealistic details and physically plausible dynamic range in the clipped areas.
We introduce LEDiff, a method that equips a pre-trained generative model with HDR content generation through latent-space fusion, inspired by image-space exposure-fusion techniques. It also functions as an LDR-to-HDR converter, expanding the dynamic range of existing LDR images.
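The image-space exposure fusion that inspires this latent-space design can be sketched in a few lines. The snippet below is a simplified, hedged illustration using only a Mertens-style well-exposedness weight (no contrast or saturation terms, no multi-scale blending); the function name and the `sigma` parameter are our own illustrative choices, not part of LEDiff.

```python
import numpy as np

def exposure_fuse(stack, sigma=0.2):
    """Fuse an exposure stack [K, H, W] with values in [0, 1] using
    well-exposedness weights: pixels near mid-gray (0.5) get the
    highest weight, so each output pixel is dominated by the
    exposure in which it is neither clipped nor crushed."""
    stack = np.asarray(stack, dtype=np.float64)
    w = np.exp(-((stack - 0.5) ** 2) / (2 * sigma ** 2))
    w /= w.sum(axis=0, keepdims=True)  # normalize across exposures
    return (w * stack).sum(axis=0)

# Toy 3-exposure stack: under-, mid-, and over-exposed views of a gradient.
base = np.linspace(0.0, 1.0, 5)[None, :].repeat(2, axis=0)
stack = np.clip(np.stack([base * 0.5, base, base * 2.0]), 0.0, 1.0)
fused = exposure_fuse(stack)
```

In clipped regions the over-exposed frame receives near-zero weight, so detail is taken from the shorter exposures; LEDiff performs an analogous fusion in latent space with a learned module instead of hand-crafted weights.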
Fine-tuning scheme for the decoder and denoiser.
(Left: fine-tuning the decoder) Exposure-bracketed images \(I_{+}, I_{0}, I_{-}\) are encoded by the pre-trained encoder into corresponding latent codes.
These latent codes are fused by a learnable fusion module \(\mathcal{F}\) to produce a clipping-free latent code \(\mathcal{C}_{\text{merge}}\),
which is then decoded into an HDR image \(\mathcal{H}\) by the fine-tuned decoder.
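As a toy illustration of this fusion-then-decode path, the sketch below stands in for the frozen encoder and fine-tuned decoder with fixed random projections, and for the learnable module \(\mathcal{F}\) with a softmax-weighted blend of the three bracketed latents. All shapes, names, and the blending form are illustrative assumptions, not the actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy latent dimension

# Stand-ins for the frozen pre-trained encoder and the fine-tuned
# HDR decoder: fixed random linear maps (illustrative only).
W_enc = rng.normal(size=(16, D))
W_dec = rng.normal(size=(D, 16))

def encode(img):
    return img @ W_enc

class LatentFusion:
    """Toy stand-in for the learnable fusion module F: a softmax-
    weighted blend over the three exposures. In the actual method the
    fusion weights would be learned jointly with the decoder."""
    def __init__(self):
        self.logits = np.zeros(3)  # one learnable logit per exposure

    def __call__(self, c_minus, c_zero, c_plus):
        w = np.exp(self.logits)
        w /= w.sum()
        return w[0] * c_minus + w[1] * c_zero + w[2] * c_plus

imgs = [rng.uniform(size=(4, 16)) for _ in range(3)]  # I_-, I_0, I_+
latents = [encode(im) for im in imgs]
fuse = LatentFusion()
c_merge = fuse(*latents)        # clipping-free fused latent C_merge
hdr = c_merge @ W_dec           # decoded HDR stand-in
```

With the logits at initialization the blend reduces to a plain average of the three latents; training would move the weights so that each latent contributes where its exposure is best preserved.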
(Right: fine-tuning the denoiser) The model takes as input the latent code \(\mathcal{C}_{+}\) when training the highlight denoiser \(\epsilon_{\theta_{-}}\), or \(\mathcal{C}_{-}\) when training the shadow denoiser \(\epsilon_{\theta_{+}}\), together with \(\mathcal{C}_{0}\) corrupted by randomly sampled noise.
Text-to-HDR image results. Our method enables the generation of HDR images from text prompts, overcoming the limitation of Stable Diffusion, which is restricted to producing LDR images. (Exposure is reduced to better visualize the hallucinated highlight details.)
LDR-to-HDR image results. Our method effectively hallucinates details in both over- and under-exposed regions, while previous approaches struggle to produce plausible results, especially in shadow regions that they do not address (e.g., HDRCNN and MaskHDR yield identical results for shadow hallucination, as both methods process non-clipped regions in the same way). Images are tone-mapped for visualization.
Image-to-HDR video results. Our method enables the baseline model, SVD (Stable Video Diffusion), to generate HDR video from a single LDR image.