Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

Zhang, Jinjin; Huang, Qiuyu; Liu, Junjie; Guo, Xiefan; Huang, Di

arXiv:2503.18352 (cs)

[Submitted on 24 Mar 2025 (v1), last revised 28 Mar 2025 (this version, v2)]

Abstract: In this paper, we present Diffusion-4K, a novel framework for direct ultra-high-resolution image synthesis using text-to-image diffusion models. The core advancements include: (1) Aesthetic-4K Benchmark: addressing the absence of a publicly available 4K image synthesis dataset, we construct Aesthetic-4K, a comprehensive benchmark for ultra-high-resolution image generation. We curate a high-quality 4K dataset with carefully selected images and captions generated by GPT-4o. Additionally, we introduce GLCM Score and Compression Ratio metrics to evaluate fine details, combined with holistic measures such as FID, Aesthetics and CLIPScore for a comprehensive assessment of ultra-high-resolution images. (2) Wavelet-based Fine-tuning: we propose a wavelet-based fine-tuning approach for direct training with photorealistic 4K images. The approach is applicable to various latent diffusion models and is effective in synthesizing highly detailed 4K images. Consequently, Diffusion-4K achieves impressive performance in high-quality image synthesis and text prompt adherence, especially when powered by modern large-scale diffusion models (e.g., SD3-2B and Flux-12B). Extensive experimental results from our benchmark demonstrate the superiority of Diffusion-4K in ultra-high-resolution image synthesis.
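The abstract names GLCM Score and Compression Ratio as fine-detail metrics but does not define them. The sketch below is only an illustration of the two underlying ideas, not the paper's formulas: it builds a gray-level co-occurrence matrix (GLCM) for one pixel offset, summarizes it with the standard contrast statistic, and measures how much a lossless compressor shrinks the raw pixels. The function names, the choice of the contrast statistic, and the use of zlib as the compressor are all assumptions made for this example.

```python
import zlib


def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for a single pixel offset.

    `image` is a list of rows of integer gray levels in [0, levels).
    Hypothetical helper; the paper's GLCM Score may be computed differently.
    """
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    total = 0
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                counts[image[y][x]][image[ny][nx]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]


def glcm_contrast(p):
    """Contrast statistic: sum of p[i][j] * (i - j)^2 over all level pairs."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def compression_ratio(image):
    """Raw byte count divided by zlib-compressed byte count.

    zlib is a stand-in chosen for this sketch; gray levels must fit in a byte.
    """
    raw = bytes(v for row in image for v in row)
    return len(raw) / len(zlib.compress(raw))


# Two toy 8x8 images: a flat field and a checkerboard with maximal local variation.
flat = [[0] * 8 for _ in range(8)]
checker = [[(x + y) % 2 for x in range(8)] for y in range(8)]

flat_contrast = glcm_contrast(glcm(flat, levels=2))
checker_contrast = glcm_contrast(glcm(checker, levels=2))
flat_ratio = compression_ratio(flat)
```

In this toy setting the flat image has zero GLCM contrast and compresses heavily, while the checkerboard, where every horizontal neighbor pair differs, has a higher contrast. How the paper maps such statistics to its reported scores is not stated in the abstract.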

Comments:	Accepted to CVPR 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2503.18352 [cs.CV]
	(or arXiv:2503.18352v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.18352

Submission history

From: Jinjin Zhang
[v1] Mon, 24 Mar 2025 05:25:07 UTC (35,812 KB)
[v2] Fri, 28 Mar 2025 04:51:44 UTC (35,813 KB)
