DataCV @ CVPR 2026 · Accepted
Zihao Zhu¹, Kuan-Ru Huang¹, Zhaoming Xu¹, Renjie Li¹, Bo Wu¹, Ruizheng Bai¹, Mingyang Wu¹, Sayak Paul², Zhengzhong Tu†,¹
¹Texas A&M University  ²Hugging Face
4KLSDB is the first openly released native-4K image dataset that scales to 129k+ training images and is designed for both image restoration and generation research. Every model in our paper (HiT-SR, SwinIR, MambaIR, OSEDiff, SeeSR, and Sana) gets a consistent and substantial boost when fine-tuned on 4KLSDB.
- Dataset: 129,484 train / 2,000 val / 1,984 test, all native-4K, with captions, on Hugging Face.
- Pre-trained checkpoints: every SR/T2I model released under taco-group.
- One-click inference: ready-to-run scripts for each model under scripts/.
- One-click training: reproducible YAML configs and shell scripts under models/.
- 2026-05: Public release of dataset, code, and pretrained weights.
- 2026-05: Paper released on arXiv.
- Highlights
- Dataset
- Pre-trained Models
- Quick Start
- Training
- Benchmark Results
- Repository Structure
- Citation
- Acknowledgements
- License
- Native 4K: every image meets a minimum dimension of 3840 px and a 3840 × 2160 pixel budget.
- Scale: 129,484 training images, 22× larger than DIV8K, 150× larger than DIV2K.
- Quality pipeline: Q-Align aesthetic scoring + Laplacian/Sobel texture filtering + two human annotators per image.
- Dual-purpose: works out-of-the-box for classical SR (×4 / ×8 / ×16), real-world SR, and 4K T2I generation.
- Reproducibility: all training configs, blind-degradation pipeline, and evaluation code are included.
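The native-4K criterion and the SR scale factors listed above can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes that "minimum dimension" refers to the longer side and that "pixel budget" refers to the total pixel count, which the text does not spell out. The names `is_native_4k` and `lr_size` are hypothetical and are not part of the released curation code.

```python
from typing import Tuple

# Thresholds taken from the "Native 4K" bullet; their exact interpretation is an assumption.
MIN_LONG_SIDE = 3840          # px
MIN_PIXELS = 3840 * 2160      # total pixel budget (8,294,400 px)


def is_native_4k(width: int, height: int) -> bool:
    """Assumed check: longer side >= 3840 px and area >= 3840 x 2160 px."""
    return max(width, height) >= MIN_LONG_SIDE and width * height >= MIN_PIXELS


def lr_size(width: int, height: int, scale: int) -> Tuple[int, int]:
    """Low-resolution size for an integer SR scale factor (floor division)."""
    return width // scale, height // scale


if __name__ == "__main__":
    print(is_native_4k(3840, 2160))   # landscape UHD frame
    print(is_native_4k(2160, 3840))   # portrait orientation
    print(is_native_4k(3840, 1600))   # wide enough, but too few pixels
    for scale in (4, 8, 16):
        print(scale, lr_size(3840, 2160, scale))
```

Under these assumptions, a 3840 × 2160 image maps to 960 × 540, 480 × 270, and 240 × 135 inputs at ×4, ×8, and ×16 respectively.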
The 4KLSDB dataset is hosted on Hugging Face:
from datasets import load_dataset
# Streaming (recommended: the train split is ~1.5 TB)
ds = load_dataset("SingleBicycle/4KLSDB", split="train", streaming=True)
for ex in ds:
    print(ex["image"], ex["caption"])
    break

| Split | #Images | Format | Notes |
|---|---|---|---|
| train | 129,484 | image + caption | LAION-2B + Photo Concept Bucket + PD12M, native 4K |
| val | 2,000 | image + caption | held-out, balanced across categories |
| test | 1,984 | image + caption + paired LR/HR | for both the classical and real-world SR benchmarks |
Categories covered: nature, urban scenes, people, food, artwork, CGI, animals, architecture.
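Because the train split is large, it can help to inspect only a handful of streamed examples before committing to a full pass. The following is a minimal sketch using `itertools.islice`, which works on any iterable, including a streamed split. The helper name `peek` is hypothetical, and the `image` / `caption` keys are the ones used in the loading snippet above.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def peek(examples: Iterable[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return the first n examples; the rest of the stream is not read."""
    return list(islice(examples, n))


if __name__ == "__main__":
    # Stand-in records for demonstration. With the real data, pass the streamed split:
    #   ds = load_dataset("SingleBicycle/4KLSDB", split="train", streaming=True)
    fake_stream = (
        {"image": f"img_{i}", "caption": f"caption {i}"} for i in range(1000)
    )
    for ex in peek(fake_stream, n=2):
        print(ex["caption"])
```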
All 4KLSDB-fine-tuned checkpoints live alongside the dataset under SingleBicycle/4KLSDB/ckpts/.
| Family | Model | Path on Hub | Best for |
|---|---|---|---|
| Classical SR | HiT-SR | ckpts/hit_sr/ | ×4 / ×8 / ×16 PSNR/SSIM |
| Classical SR | SwinIR | ckpts/swinir/ | ×4 / ×8 / ×16 PSNR/SSIM |
| Classical SR | MambaIR | ckpts/mambair/ | strongest classical SR |
| Real-World SR | OSEDiff | ckpts/osediff/x4/ | one-step diffusion SR |
| Real-World SR | SeeSR | ckpts/seesr/ | semantics-aware Real-SR |
| 4K T2I Generation | Sana 4096² | ckpts/sana/ | native 4096×4096 T2I |
One-shot download of every model:
bash scripts/download_all_ckpts.sh # → release_ckpts/<model>/
New (May 2026): the dataset now ships an authoritative metadata.jsonl with Qwen2.5-VL-7B recaptions for all 129,484 training images. Use this for T2I fine-tuning instead of the older caption column in the parquet shards.
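One way to use the recaptions is to build a filename-to-caption lookup from metadata.jsonl. The sketch below is hedged: the field names `file_name` and `text` are assumptions (check the dataset card for the actual schema), and `load_captions` is a hypothetical helper rather than part of the repository.

```python
import json
from pathlib import Path
from typing import Dict


def load_captions(
    path: str, name_key: str = "file_name", text_key: str = "text"
) -> Dict[str, str]:
    """Read a JSON Lines file into {file name: caption}. Key names are assumed."""
    captions: Dict[str, str] = {}
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            record = json.loads(line)
            captions[record[name_key]] = record[text_key]
    return captions
```

If the actual schema uses different keys, pass them via `name_key` and `text_key` instead of editing the function.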
We strongly recommend using a separate conda environment per model family.
# Classical SR & Real-SR (HiT-SR / SwinIR / MambaIR / OSEDiff / SeeSR)
conda env create -f envs/4k_sr.yml
conda activate 4k_sr
# 4K T2I generation (Sana)
conda env create -f envs/Sana_training.yml
conda activate Sana

# Download all checkpoints and the dataset
bash scripts/download_all_ckpts.sh # → release_ckpts/
huggingface-cli download SingleBicycle/4KLSDB \
  --repo-type=dataset --local-dir ./data/4KLSDB

# Classical SR: --model is hit_sr, swinir, or mambair; --scale is 4, 8, or 16
bash scripts/inference_classical_sr.sh \
  --model hit_sr \
  --scale 4 \
  --input data/4KLSDB/test/LR_x4 \
  --output results/hit_sr_x4

# Real-world SR: --model is seesr or osediff
bash scripts/inference_real_sr.sh \
  --model seesr \
  --scale 4 \
  --input data/4KLSDB/test/LR_real_x4 \
  --output results/seesr_x4

# 4K T2I generation (Sana)
bash scripts/inference_sana_4k.sh \
  --prompt "A serene mountain lake at sunrise, 4K, photorealistic" \
  --resolution 4096 \
  --output results/sana_4k.png

Run bash scripts/<name>.sh --help for the full list of options on any script.
Each model lives as a submodule under models/<name> with its own training entry-point. The shell scripts below wrap the upstream configs and inject 4KLSDB-specific paths so a single command reproduces the paper.
# Classical SR
bash scripts/train_hit_sr.sh --scale 4 --data data/4KLSDB
bash scripts/train_swinir.sh --scale 8 --data data/4KLSDB
bash scripts/train_mambair.sh --scale 16 --data data/4KLSDB
# Real-World SR (blind degradation pipeline)
bash scripts/train_osediff.sh --scale 4 --data data/4KLSDB
bash scripts/train_seesr.sh --scale 4 --data data/4KLSDB
# 4K T2I
bash scripts/train_sana_4k.sh --resolution 4096 --data data/4KLSDB

Detailed per-model docs:
- models/sana/README.md: Sana 4K fine-tuning + Gemma-2 caption embedding pre-compute.
- dataset/README.md: curation pipeline (resolution / Q-Align / Laplacian / Sobel / manual review).
| Model | ×4 PSNR / SSIM | ×8 PSNR / SSIM | ×16 PSNR / SSIM |
|---|---|---|---|
| HiT-SR (pretrained) | 24.50 / 0.6839 | 22.25 / 0.6394 | 19.47 / 0.5741 |
| HiT-SR (4KLSDB) | 29.27 / 0.7896 | 24.75 / 0.6928 | 23.69 / 0.6414 |
| SwinIR (DF2K) | 24.11 / 0.6738 | 20.96 / 0.5915 | 19.20 / 0.5684 |
| SwinIR (4KLSDB) | 28.79 / 0.7774 | 25.89 / 0.6877 | 23.69 / 0.6376 |
| MambaIR (pretrained) | 25.92 / 0.7259 | 21.51 / 0.6382 | 19.47 / 0.5741 |
| MambaIR (4KLSDB) | 30.92 / 0.8216 | 23.84 / 0.7195 | 23.69 / 0.6414 |
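For reference, PSNR is the standard peak signal-to-noise ratio, 10·log10(MAX² / MSE). The sketch below computes it for 8-bit pixel values in pure Python. It is a generic illustration, not the repository's evaluation code, which may differ in details such as colour space or border cropping; the name `psnr` and the flat-sequence input format are assumptions made for brevity.

```python
import math
from typing import Sequence


def psnr(
    reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0
) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(round(psnr([10, 10], [11, 9]), 2))
```

Because the scale is logarithmic, a gain of 5.00 dB (for example MambaIR at ×4, 25.92 to 30.92) corresponds to an MSE that is lower by a factor of about 3.16.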
| Method | Scale | PSNR↑ | SSIM↑ | LPIPS↓ | DISTS↓ | FID↓ |
|---|---|---|---|---|---|---|
| OSEDiff | ×4 | 27.36 / 27.50 | 0.7511 / 0.7568 | 0.2863 / 0.2546 | 0.1604 / 0.1431 | 28.07 / 28.35 |
| OSEDiff | ×8 | 23.86 / 24.10 | 0.6021 / 0.6188 | 0.5463 / 0.4252 | 0.1833 / 0.1448 | 19.56 / 17.74 |
| OSEDiff | ×16 | 22.65 / 22.69 | 0.6213 / 0.5966 | 0.6571 / 0.4866 | 0.2861 / 0.2170 | 51.76 / 33.97 |
| SeeSR | ×4 | 27.01 / 28.25 | 0.6996 / 0.7340 | 0.5231 / 0.4511 | 0.1407 / 0.1272 | 38.95 / 33.88 |
| SeeSR | ×8 | 24.10 / 24.50 | 0.6510 / 0.6713 | 0.5117 / 0.4628 | 0.1607 / 0.1551 | 77.46 / 74.46 |
| SeeSR | ×16 | 24.02 / 24.43 | 0.6810 / 0.7001 | 0.5594 / 0.5197 | 0.1699 / 0.1640 | 77.41 / 74.40 |
| Model | pCLIPScore↑ | pNIQE↓ |
|---|---|---|
| Sana (baseline) | 28.62 | 5.21 |
| Sana + 4KLSDB | 29.27 | 4.63 |
Double-blind user study win rate of Sana + 4KLSDB over Sana: 57.3% overall, 60.9% detail, 74.3% realism, 64.4% fewer artifacts, 52.3% alignment.
4KLSDB/
├── README.md                      # this file
├── docs/                          # GitHub Pages project page (index.html)
│   ├── index.html                 # https://4klsdb.github.io/
│   └── assets/                    # teaser & figure JPGs used by the project page
├── envs/
│   ├── 4k_sr.yml                  # classical SR + real-SR (HiT-SR / SwinIR / MambaIR / OSEDiff / SeeSR)
│   ├── Sana_training.yml          # 4K T2I (Sana)
│   └── 4k_data_curation.yml       # dataset filtering / Q-Align scoring
├── dataset/
│   ├── README.md                  # dataset curation pipeline doc
│   ├── preprocessing/             # Q-Align / Laplacian / Sobel filters
│   └── validation/                # manual inspection Flask app
├── models/
│   ├── README.md
│   ├── sana/                      # 4K T2I submodule (NVlabs/Sana)
│   ├── hit_sr/                    # submodule placeholder
│   ├── swinir/                    # submodule placeholder
│   ├── mambair/                   # submodule placeholder
│   ├── seesr/                     # submodule placeholder
│   └── osediff/                   # submodule placeholder
├── scripts/                       # one-click inference / training / download wrappers
│   ├── download_all_ckpts.sh
│   ├── inference_classical_sr.sh
│   ├── inference_real_sr.sh
│   ├── inference_sana_4k.sh
│   ├── train_hit_sr.sh
│   ├── train_swinir.sh
│   ├── train_mambair.sh
│   ├── train_osediff.sh
│   ├── train_seesr.sh
│   └── train_sana_4k.sh
└── LICENSE
If you find 4KLSDB useful for your research, please cite:
@misc{zhu20264klsdblargescaledataset4k,
title={4KLSDB: A Large-Scale Dataset for 4K Image Restoration and Generation},
author={Zihao Zhu and Kuan-Ru Huang and Zhaoming Xu and Renjie Li and Bo Wu and Ruizheng Bai and Mingyang Wu and Sayak Paul and Zhengzhong Tu},
year={2026},
eprint={2605.24762},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.24762},
}

Our work builds on a number of excellent open-source projects:
- HiT-SR, SwinIR, MambaIR
- OSEDiff, SeeSR
- Sana
- Q-Align
- Image sources: LAION-2B, Photo Concept Bucket, PD12M
The project page is adapted from the SparkVSR template.
The code in this repository is released under the MIT License. The 4KLSDB dataset is released for research purposes only; please refer to the dataset card for the full terms and source-dataset licenses.