4KLSDB: A Large-Scale Dataset for 4K Image Restoration and Generation

DataCV @ CVPR 2026 · Accepted 🎉

Zihao Zhu¹, Kuan-Ru Huang¹, Zhaoming Xu¹, Renjie Li¹, Bo Wu¹, Ruizheng Bai¹, Mingyang Wu¹, Sayak Paul², Zhengzhong Tu¹,†

¹Texas A&M University ²Hugging Face

TL;DR

4KLSDB is the first openly released native-4K image dataset that scales to 129k+ training images and is designed for both image restoration and generation research. All six models in our paper (HiT-SR, SwinIR, MambaIR, OSEDiff, SeeSR, and Sana) improve on nearly all reported metrics when fine-tuned on 4KLSDB.

🖼️ Dataset: 129,484 train / 2,000 val / 1,984 test, all native-4K, with captions, on 🤗 Hugging Face.
🧱 Pre-trained checkpoints: every SR/T2I model released under 🤗 taco-group.
🚀 One-click inference: ready-to-run scripts for each model under scripts/.
🏋️ One-click training: reproducible YAML configs and shell scripts under models/.

📰 News

2026-05 — Public release of dataset, code, and pretrained weights.
2026-05 — Paper released on arXiv.

✨ Highlights

Native 4K — every image meets a minimum dimension of 3840 px and a 3840 × 2160 pixel budget.
Scale — 129,484 training images, 22× larger than DIV8K, 150× larger than DIV2K.
Quality pipeline — Q-Align aesthetic scoring + Laplacian/Sobel texture filtering + two human annotators per image.
Dual-purpose — works out-of-the-box for classical SR (×4 / ×8 / ×16), real-world SR, and 4K T2I generation.
Reproducibility — all training configs, blind-degradation pipeline, and evaluation code are included.
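The Native 4K criterion above can be read in more than one way. The following is a minimal sketch of one plausible reading, assuming it means that the longer side is at least 3840 px and the total pixel count is at least 3840 × 2160. The helper name `is_native_4k` and this interpretation are assumptions, not the authors' actual filter; see dataset/README.md for the real resolution check.

```python
MIN_LONG_SIDE = 3840          # assumed to apply to the longer image side
MIN_PIXELS = 3840 * 2160      # assumed total pixel budget (8,294,400 pixels)


def is_native_4k(width: int, height: int) -> bool:
    """Return True if an image passes the assumed native-4K test."""
    long_side = max(width, height)
    return long_side >= MIN_LONG_SIDE and width * height >= MIN_PIXELS


if __name__ == "__main__":
    # Landscape UHD, portrait UHD, a wide but short frame, and Full HD.
    for size in [(3840, 2160), (2160, 3840), (4096, 1716), (1920, 1080)]:
        print(size, is_native_4k(*size))
```

Under this reading, a 4096 × 1716 frame is rejected even though its long side exceeds 3840 px, because it falls short of the pixel budget.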

📦 Dataset

The 4KLSDB dataset is hosted on Hugging Face:

https://huggingface.co/datasets/SingleBicycle/4KLSDB

from datasets import load_dataset

# Streaming (recommended — the train split is ~1.5 TB)
ds = load_dataset("SingleBicycle/4KLSDB", split="train", streaming=True)
for ex in ds:
    print(ex["image"], ex["caption"])
    break

Split	#Images	Format	Notes
train	129,484	image + caption	LAION-2B + Photo Concept Bucket + PD12M, native 4K
val	2,000	image + caption	held-out, balanced across categories
test	1,984	image + caption + paired LR/HR	for both classical and real-world SR benchmark

Categories covered: nature, urban scenes, people, food, artwork, CGI, animals, architecture.
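For a sense of scale, the LR inputs paired with a 4K HR image are small. Below is a quick worked example, assuming the LR size is the HR size divided by the scale factor, with any remainder cropped away (a common SR convention). `lr_size` is a hypothetical helper, and the dataset card remains the authority on how the test pairs were actually produced.

```python
def lr_size(hr_width: int, hr_height: int, scale: int) -> tuple[int, int]:
    """LR size after cropping HR to a multiple of `scale` (assumed convention)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Integer division drops the remainder, which mirrors cropping the HR
    # image to the nearest multiple of the scale before downsampling.
    return hr_width // scale, hr_height // scale


if __name__ == "__main__":
    for scale in (4, 8, 16):
        print(f"x{scale}:", lr_size(3840, 2160, scale))
```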

🧱 Pre-trained Models

All 4KLSDB-fine-tuned checkpoints live alongside the dataset under SingleBicycle/4KLSDB/ckpts/.

Family	Model	Path on Hub	Best for
Classical SR	HiT-SR	`ckpts/hit_sr/`	×4 / ×8 / ×16 PSNR/SSIM
Classical SR	SwinIR	`ckpts/swinir/`	×4 / ×8 / ×16 PSNR/SSIM
Classical SR	MambaIR	`ckpts/mambair/`	strongest classical SR
Real-World SR	OSEDiff	`ckpts/osediff/x4/`	one-step diffusion SR
Real-World SR	SeeSR	`ckpts/seesr/`	semantics-aware Real-SR
4K T2I Generation	Sana 4096²	`ckpts/sana/`	native 4096×4096 T2I

One-shot download of every model:

bash scripts/download_all_ckpts.sh        # → release_ckpts/<model>/
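If only one model is needed, a single checkpoint folder can probably be fetched with `huggingface_hub.snapshot_download` instead. This is a sketch under the assumption that the checkpoints sit in the dataset repo at the paths listed in the table above; `ckpt_patterns` and `download_ckpt` are hypothetical helpers and are not part of this repository.

```python
def ckpt_patterns(model_dir: str) -> list[str]:
    """Glob patterns matching one checkpoint folder, e.g. 'seesr' or 'osediff/x4'."""
    return [f"ckpts/{model_dir.strip('/')}/*"]


def download_ckpt(model_dir: str, local_dir: str = "release_ckpts") -> str:
    """Download one checkpoint folder and return the local snapshot path."""
    # Imported lazily so ckpt_patterns works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="SingleBicycle/4KLSDB",
        repo_type="dataset",            # assumed: checkpoints live in the dataset repo
        allow_patterns=ckpt_patterns(model_dir),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(ckpt_patterns("seesr"))
    # download_ckpt("seesr")  # uncomment to actually download
```

Files downloaded this way keep their Hub-relative paths (for example `release_ckpts/ckpts/seesr/`), which may differ from the layout that `scripts/download_all_ckpts.sh` creates.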

New (May 2026): the dataset now ships an authoritative metadata.jsonl with Qwen2.5-VL-7B recaptions for all 129,484 training images. Use this for T2I fine-tuning instead of the older caption column in the parquet shards.
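A minimal sketch of reading the recaptions, assuming `metadata.jsonl` is a standard JSON Lines file with one record per image. The key names `file_name` and `caption` are assumptions and should be checked against the actual file; `iter_captions` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_captions(
    lines: Iterable[str],
    image_key: str = "file_name",   # assumed key name
    caption_key: str = "caption",   # assumed key name
) -> Iterator[tuple[str, str]]:
    """Yield (image name, caption) pairs from JSON Lines text."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        yield record[image_key], record[caption_key]


if __name__ == "__main__":
    sample = ['{"file_name": "000001.jpg", "caption": "A mountain lake."}']
    for name, caption in iter_captions(sample):
        print(name, caption)
```

An open file handle can be passed directly, since iterating over a text file yields its lines one at a time.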

🚀 Quick Start

Environment

We strongly recommend using a separate conda environment per model family.

# Classical SR & Real-SR (HiT-SR / SwinIR / MambaIR / OSEDiff / SeeSR)
conda env create -f envs/4k_sr.yml
conda activate 4k_sr

# 4K T2I generation (Sana)
conda env create -f envs/Sana_training.yml
conda activate Sana

Download Everything (one script)

bash scripts/download_all_ckpts.sh                # → release_ckpts/
huggingface-cli download SingleBicycle/4KLSDB \
    --repo-type=dataset --local-dir ./data/4KLSDB

Classical SR Inference

# --model: hit_sr, swinir, or mambair; --scale: 4, 8, or 16
bash scripts/inference_classical_sr.sh \
    --model hit_sr \
    --scale 4 \
    --input  data/4KLSDB/test/LR_x4 \
    --output results/hit_sr_x4

Real-World SR Inference

# --model: seesr or osediff
bash scripts/inference_real_sr.sh \
    --model seesr \
    --scale 4 \
    --input  data/4KLSDB/test/LR_real_x4 \
    --output results/seesr_x4

4K T2I Inference

bash scripts/inference_sana_4k.sh \
    --prompt "A serene mountain lake at sunrise, 4K, photorealistic" \
    --resolution 4096 \
    --output results/sana_4k.png

Run bash scripts/<name>.sh --help for the full list of options on any script.

🏋️ Training

Each model lives as a submodule under models/<name> with its own training entry-point. The shell scripts below wrap the upstream configs and inject 4KLSDB-specific paths so a single command reproduces the paper.

# Classical SR
bash scripts/train_hit_sr.sh   --scale 4   --data data/4KLSDB
bash scripts/train_swinir.sh   --scale 8   --data data/4KLSDB
bash scripts/train_mambair.sh  --scale 16  --data data/4KLSDB

# Real-World SR (blind degradation pipeline)
bash scripts/train_osediff.sh  --scale 4   --data data/4KLSDB
bash scripts/train_seesr.sh    --scale 4   --data data/4KLSDB

# 4K T2I
bash scripts/train_sana_4k.sh  --resolution 4096  --data data/4KLSDB

Further documentation:

models/sana/README.md — Sana 4K fine-tuning + Gemma-2 caption embedding pre-compute.
dataset/README.md — curation pipeline (resolution / Q-Align / Laplacian / Sobel / manual review).
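To illustrate the Laplacian and Sobel steps of the curation pipeline, here is a dependency-free sketch of two common texture statistics: the variance of the Laplacian response and the mean Sobel gradient magnitude. The thresholds, the rule for combining the two scores, and all function names are assumptions; the actual filters live under dataset/preprocessing/ and would normally use an array library for speed on 4K images.

```python
import math


def _check(gray):
    if len(gray) < 3 or len(gray[0]) < 3:
        raise ValueError("image must be at least 3x3")
    return len(gray), len(gray[0])


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels."""
    h, w = _check(gray)
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def sobel_mean_magnitude(gray):
    """Mean 3x3 Sobel gradient magnitude over interior pixels."""
    h, w = _check(gray)
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]) \
               - (gray[y - 1][x - 1] + 2 * gray[y][x - 1] + gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]) \
               - (gray[y - 1][x - 1] + 2 * gray[y - 1][x] + gray[y - 1][x + 1])
            total += math.hypot(gx, gy)
    return total / ((h - 2) * (w - 2))


def passes_texture_filter(gray, min_lap_var, min_sobel_mean):
    """Assumed rule: keep an image only if both scores reach their thresholds."""
    return (laplacian_variance(gray) >= min_lap_var
            and sobel_mean_magnitude(gray) >= min_sobel_mean)


if __name__ == "__main__":
    flat = [[100] * 5 for _ in range(5)]                   # no texture at all
    step = [[0, 0, 255, 255, 255] for _ in range(5)]       # one vertical edge
    print(passes_texture_filter(flat, 100.0, 100.0))
    print(passes_texture_filter(step, 100.0, 100.0))
```

A flat image scores zero on both statistics and is rejected, while an image with a sharp edge scores high on both.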

📊 Benchmark Results

Classical Super-Resolution on 4KLSDB Test Set

Model	×4 PSNR / SSIM	×8 PSNR / SSIM	×16 PSNR / SSIM
HiT-SR (pretrained)	24.50 / 0.6839	22.25 / 0.6394	19.47 / 0.5741
HiT-SR (4KLSDB)	29.27 / 0.7896	24.75 / 0.6928	23.69 / 0.6414
SwinIR (DF2K)	24.11 / 0.6738	20.96 / 0.5915	19.20 / 0.5684
SwinIR (4KLSDB)	28.79 / 0.7774	25.89 / 0.6877	23.69 / 0.6376
MambaIR (pretrained)	25.92 / 0.7259	21.51 / 0.6382	19.47 / 0.5741
MambaIR (4KLSDB)	30.92 / 0.8216	23.84 / 0.7195	23.69 / 0.6414
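For readers unfamiliar with the metric, PSNR is defined as 10 · log10(MAX² / MSE), where MSE is the mean squared error between the reference and the restored image. The sketch below implements that textbook definition on flat pixel sequences. It does not reproduce the paper's evaluation protocol (colour space, border cropping), which is not specified in this README, and `psnr` is a hypothetical helper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    hr = [100, 120, 140, 160]
    sr = [110, 130, 150, 170]   # every pixel is off by 10, so MSE = 100
    print(f"{psnr(hr, sr):.2f} dB")
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.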

Real-World SR (4KLSDB Test Set, baseline / ours)

Method	Scale	PSNR↑	SSIM↑	LPIPS↓	DISTS↓	FID↓
OSEDiff	×4	27.36 / 27.50	0.7511 / 0.7568	0.2863 / 0.2546	0.1604 / 0.1431	28.07 / 28.35
OSEDiff	×8	23.86 / 24.10	0.6021 / 0.6188	0.5463 / 0.4252	0.1833 / 0.1448	19.56 / 17.74
OSEDiff	×16	22.65 / 22.69	0.6213 / 0.5966	0.6571 / 0.4866	0.2861 / 0.2170	51.76 / 33.97
SeeSR	×4	27.01 / 28.25	0.6996 / 0.7340	0.5231 / 0.4511	0.1407 / 0.1272	38.95 / 33.88
SeeSR	×8	24.10 / 24.50	0.6510 / 0.6713	0.5117 / 0.4628	0.1607 / 0.1551	77.46 / 74.46
SeeSR	×16	24.02 / 24.43	0.6810 / 0.7001	0.5594 / 0.5197	0.1699 / 0.1640	77.41 / 74.40
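Each cell in the table above holds a `baseline / ours` pair, and the arrows in the header give the direction of improvement. As a reading aid, the sketch below parses one row (OSEDiff ×4, copied from the table) and reports which metrics improved. The helper names are hypothetical.

```python
# True means higher is better, matching the arrows in the table header.
HIGHER_IS_BETTER = {
    "PSNR": True, "SSIM": True, "LPIPS": False, "DISTS": False, "FID": False,
}

# OSEDiff x4 row, copied verbatim from the table.
OSEDIFF_X4 = {
    "PSNR": "27.36 / 27.50",
    "SSIM": "0.7511 / 0.7568",
    "LPIPS": "0.2863 / 0.2546",
    "DISTS": "0.1604 / 0.1431",
    "FID": "28.07 / 28.35",
}


def parse_pair(cell: str) -> tuple[float, float]:
    """Split a 'baseline / ours' cell into two floats."""
    baseline, ours = (float(part) for part in cell.split("/"))
    return baseline, ours


def improved(metric: str, cell: str) -> bool:
    """Return True if the fine-tuned value is better than the baseline."""
    baseline, ours = parse_pair(cell)
    return ours > baseline if HIGHER_IS_BETTER[metric] else ours < baseline


if __name__ == "__main__":
    for metric, cell in OSEDIFF_X4.items():
        print(metric, "improved" if improved(metric, cell) else "not improved")
```

For this row, every metric improves except FID, which rises slightly from 28.07 to 28.35.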

4K Text-to-Image Generation (Sana)

Model	pCLIPScore↑	pNIQE↓
Sana (baseline)	28.62	5.21
Sana + 4KLSDB	29.27	4.63

Double-blind user study win rate of Sana + 4KLSDB over Sana: 57.3% overall, 60.9% detail, 74.3% realism, 64.4% fewer artifacts, 52.3% alignment.

🗂 Repository Structure

4KLSDB/
├── README.md                     # this file
├── docs/                         # GitHub Pages project page (index.html)
│   ├── index.html                # https://4klsdb.github.io/
│   └── assets/                   # teaser & figure JPGs used by the project page
├── envs/
│   ├── 4k_sr.yml                 # classical SR + real-SR (HiT-SR / SwinIR / MambaIR / OSEDiff / SeeSR)
│   ├── Sana_training.yml         # 4K T2I (Sana)
│   └── 4k_data_curation.yml      # dataset filtering / Q-Align scoring
├── dataset/
│   ├── README.md                 # dataset curation pipeline doc
│   ├── preprocessing/            # Q-Align / Laplacian / Sobel filters
│   └── validation/               # manual inspection Flask app
├── models/
│   ├── README.md
│   ├── sana/                     # 4K T2I submodule (NVlabs/Sana)
│   ├── hit_sr/                   # → submodule placeholder
│   ├── swinir/                   # → submodule placeholder
│   ├── mambair/                  # → submodule placeholder
│   ├── seesr/                    # → submodule placeholder
│   └── osediff/                  # → submodule placeholder
├── scripts/                      # one-click inference / training / download wrappers
│   ├── download_all_ckpts.sh
│   ├── inference_classical_sr.sh
│   ├── inference_real_sr.sh
│   ├── inference_sana_4k.sh
│   ├── train_hit_sr.sh
│   ├── train_swinir.sh
│   ├── train_mambair.sh
│   ├── train_osediff.sh
│   ├── train_seesr.sh
│   └── train_sana_4k.sh
└── LICENSE

📝 Citation

If you find 4KLSDB useful for your research, please cite:

@misc{zhu20264klsdblargescaledataset4k,
      title={4KLSDB: A Large-Scale Dataset for 4K Image Restoration and Generation}, 
      author={Zihao Zhu and Kuan-Ru Huang and Zhaoming Xu and Renjie Li and Bo Wu and Ruizheng Bai and Mingyang Wu and Sayak Paul and Zhengzhong Tu},
      year={2026},
      eprint={2605.24762},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.24762}, 
}

🙏 Acknowledgements

Our work builds on a number of excellent open-source projects:

HiT-SR, SwinIR, MambaIR
OSEDiff, SeeSR
Sana
Q-Align
Image sources: LAION-2B, Photo Concept Bucket, PD12M

The project page is adapted from the SparkVSR template.

⚖️ License

The code in this repository is released under the MIT License. The 4KLSDB dataset is released for research purposes only; please refer to the dataset card for the full terms and source-dataset licenses.
