taco-group/4KLSDB

4KLSDB: A Large-Scale Dataset for 4K Image Restoration and Generation

DataCV @ CVPR 2026 · Accepted 🎉


Zihao Zhu¹, Kuan-Ru Huang¹, Zhaoming Xu¹, Renjie Li¹, Bo Wu¹, Ruizheng Bai¹, Mingyang Wu¹, Sayak Paul², Zhengzhong Tu†¹

¹Texas A&M University   ²Hugging Face

[Figure: 4KLSDB teaser]

TL;DR

4KLSDB is the first openly released native-4K image dataset that scales to 129k+ training images and is designed for both image restoration and generation research. Every model in our paper (HiT-SR, SwinIR, MambaIR, OSEDiff, SeeSR, and Sana) improves on nearly all reported metrics when fine-tuned on 4KLSDB.

  • 🖼️ Dataset: 129,484 train / 2,000 val / 1,984 test, all native-4K, with captions, on 🤗 Hugging Face.
  • 🧱 Pre-trained checkpoints: every SR/T2I model released under 🤗 taco-group.
  • 🚀 One-click inference: ready-to-run scripts for each model under scripts/.
  • 🏋️ One-click training: reproducible YAML configs and shell scripts under models/.

📰 News

  • 2026-05: Public release of dataset, code, and pretrained weights.
  • 2026-05: Paper released on arXiv.

✨ Highlights

  • Native 4K: every image meets a minimum dimension of 3840 px and a 3840 × 2160 pixel budget.
  • Scale: 129,484 training images, 22× larger than DIV8K, 150× larger than DIV2K.
  • Quality pipeline: Q-Align aesthetic scoring + Laplacian/Sobel texture filtering + two human annotators per image.
  • Dual-purpose: works out of the box for classical SR (×4 / ×8 / ×16), real-world SR, and 4K T2I generation.
  • Reproducibility: all training configs, the blind-degradation pipeline, and evaluation code are included.
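
As a rough illustration of the resolution criterion in the Native 4K bullet, the check below takes one possible reading of it: the longer side must be at least 3840 px and the total pixel count must be at least 3840 × 2160. The function name `is_native_4k` and this interpretation are both assumptions, not the authors' curation code; dataset/README.md documents the actual pipeline.

```python
# Hypothetical resolution gate: one reading of the "native 4K" rule, not the authors' code.
MIN_LONG_SIDE = 3840          # assumed to apply to the longer image side
MIN_PIXELS = 3840 * 2160      # assumed minimum total pixel count (8,294,400)


def is_native_4k(width: int, height: int) -> bool:
    """Return True if the image passes both assumed thresholds."""
    long_side = max(width, height)
    return long_side >= MIN_LONG_SIDE and width * height >= MIN_PIXELS


# Landscape and portrait UHD pass; a thin 4096-wide strip and Full HD do not.
for size in [(3840, 2160), (2160, 3840), (4096, 1000), (1920, 1080)]:
    print(size, is_native_4k(*size))
```

Checking the pixel count in addition to the long side keeps very wide but short panoramas from qualifying on width alone.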

📦 Dataset

The 4KLSDB dataset is hosted on Hugging Face:

https://huggingface.co/datasets/SingleBicycle/4KLSDB

from datasets import load_dataset

# Streaming (recommended: the train split is ~1.5 TB)
ds = load_dataset("SingleBicycle/4KLSDB", split="train", streaming=True)
for ex in ds:
    print(ex["image"], ex["caption"])
    break

| Split | #Images | Format | Notes |
|-------|---------|--------|-------|
| train | 129,484 | image + caption | LAION-2B + Photo Concept Bucket + PD12M, native 4K |
| val | 2,000 | image + caption | held-out, balanced across categories |
| test | 1,984 | image + caption + paired LR/HR | for both classical and real-world SR benchmark |

Categories covered: nature, urban scenes, people, food, artwork, CGI, animals, architecture.


🧱 Pre-trained Models

All 4KLSDB-fine-tuned checkpoints live alongside the dataset under SingleBicycle/4KLSDB/ckpts/.

| Family | Model | Path on Hub | Best for |
|--------|-------|-------------|----------|
| Classical SR | HiT-SR | ckpts/hit_sr/ | ×4 / ×8 / ×16 PSNR/SSIM |
| Classical SR | SwinIR | ckpts/swinir/ | ×4 / ×8 / ×16 PSNR/SSIM |
| Classical SR | MambaIR | ckpts/mambair/ | strongest classical SR |
| Real-World SR | OSEDiff | ckpts/osediff/x4/ | one-step diffusion SR |
| Real-World SR | SeeSR | ckpts/seesr/ | semantics-aware Real-SR |
| 4K T2I Generation | Sana 4096² | ckpts/sana/ | native 4096×4096 T2I |

One-shot download of every model:

bash scripts/download_all_ckpts.sh        # → release_ckpts/<model>/
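
If only one model is needed, a selective download may be preferable to fetching everything. The sketch below uses `snapshot_download` from `huggingface_hub` with `allow_patterns`, and assumes the Hub folder layout matches the "Path on Hub" column above. The helper names `ckpt_patterns` and `download_model_ckpts` are hypothetical and not part of this repository, and the resulting local layout (`release_ckpts/ckpts/<model>/`) may differ from what `download_all_ckpts.sh` produces.

```python
# Sketch: fetch a single model's checkpoint folder from the dataset repo.
REPO_ID = "SingleBicycle/4KLSDB"

# Hub sub-folders as listed in the "Path on Hub" column.
CKPT_DIRS = {
    "hit_sr": "ckpts/hit_sr/",
    "swinir": "ckpts/swinir/",
    "mambair": "ckpts/mambair/",
    "osediff": "ckpts/osediff/x4/",
    "seesr": "ckpts/seesr/",
    "sana": "ckpts/sana/",
}


def ckpt_patterns(model: str) -> list[str]:
    """Glob patterns matching every file under the model's checkpoint folder."""
    return [CKPT_DIRS[model] + "*"]


def download_model_ckpts(model: str, local_dir: str = "release_ckpts") -> str:
    """Download one model's checkpoints and return the local snapshot path."""
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # checkpoints live alongside the dataset
        allow_patterns=ckpt_patterns(model),
        local_dir=local_dir,
    )


# Usage (requires network access):
#   path = download_model_ckpts("hit_sr")
```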

New (May 2026): the dataset now ships an authoritative metadata.jsonl with Qwen2.5-VL-7B recaptions for all 129,484 training images. Use this for T2I fine-tuning instead of the older caption column in the parquet shards.
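
The field names inside metadata.jsonl are not listed here, so rather than guess them, the sketch below reads the first few records and lets you inspect their keys. The helper name `peek_jsonl` and the file location `data/4KLSDB/metadata.jsonl` are assumptions; adjust the path to wherever the file lands in your download.

```python
import json
from pathlib import Path


def peek_jsonl(path, n=3):
    """Return up to the first n non-blank records of a JSON Lines file as dicts."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if len(records) >= n:
                break
            if not line.strip():
                continue  # tolerate blank lines
            records.append(json.loads(line))
    return records


# Usage (path is an assumption):
#   for rec in peek_jsonl("data/4KLSDB/metadata.jsonl"):
#       print(sorted(rec.keys()))
```

Reading line by line avoids loading the metadata for all 129,484 images into memory just to check the schema.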


🚀 Quick Start

Environment

We strongly recommend using a separate conda environment per model family.

# Classical SR & Real-SR (HiT-SR / SwinIR / MambaIR / OSEDiff / SeeSR)
conda env create -f envs/4k_sr.yml
conda activate 4k_sr

# 4K T2I generation (Sana)
conda env create -f envs/Sana_training.yml
conda activate Sana

Download Everything (one script)

bash scripts/download_all_ckpts.sh                # → release_ckpts/
huggingface-cli download SingleBicycle/4KLSDB \
    --repo-type=dataset --local-dir ./data/4KLSDB

Classical SR Inference

# --model: hit_sr, swinir, or mambair; --scale: 4, 8, or 16
bash scripts/inference_classical_sr.sh \
    --model hit_sr \
    --scale 4 \
    --input  data/4KLSDB/test/LR_x4 \
    --output results/hit_sr_x4

Real-World SR Inference

# --model: seesr or osediff
bash scripts/inference_real_sr.sh \
    --model seesr \
    --scale 4 \
    --input  data/4KLSDB/test/LR_real_x4 \
    --output results/seesr_x4

4K T2I Inference

bash scripts/inference_sana_4k.sh \
    --prompt "A serene mountain lake at sunrise, 4K, photorealistic" \
    --resolution 4096 \
    --output results/sana_4k.png

Run bash scripts/<name>.sh --help for the full list of options on any script.
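
To sweep several models and scales, the commands can be generated rather than typed by hand. The sketch below only builds and prints the command lines using the flags shown above. It assumes the low-resolution inputs for ×8 and ×16 follow the same `LR_x<scale>` naming as `LR_x4`, which is not confirmed here. The helper names are hypothetical.

```python
import shlex


def classical_sr_cmd(model: str, scale: int) -> list[str]:
    """Build the argv for one classical SR inference run."""
    return [
        "bash", "scripts/inference_classical_sr.sh",
        "--model", model,
        "--scale", str(scale),
        # Assumption: inputs for other scales follow the LR_x4 naming pattern.
        "--input", f"data/4KLSDB/test/LR_x{scale}",
        "--output", f"results/{model}_x{scale}",
    ]


def all_runs(models=("hit_sr", "swinir", "mambair"), scales=(4, 8, 16)):
    """Every (model, scale) combination as a list of argv lists."""
    return [classical_sr_cmd(m, s) for m in models for s in scales]


# Dry run: print the commands for review before executing anything.
for argv in all_runs():
    print(shlex.join(argv))
```

Once the printed commands look right, each argv list can be passed to `subprocess.run(argv, check=True)` to execute it.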


πŸ‹οΈ Training

Each model lives as a submodule under models/<name> with its own training entry-point. The shell scripts below wrap the upstream configs and inject 4KLSDB-specific paths so a single command reproduces the paper.

# Classical SR
bash scripts/train_hit_sr.sh   --scale 4   --data data/4KLSDB
bash scripts/train_swinir.sh   --scale 8   --data data/4KLSDB
bash scripts/train_mambair.sh  --scale 16  --data data/4KLSDB

# Real-World SR (blind degradation pipeline)
bash scripts/train_osediff.sh  --scale 4   --data data/4KLSDB
bash scripts/train_seesr.sh    --scale 4   --data data/4KLSDB

# 4K T2I
bash scripts/train_sana_4k.sh  --resolution 4096  --data data/4KLSDB

Detailed per-model docs:

  • models/sana/README.md: Sana 4K fine-tuning + Gemma-2 caption embedding pre-compute.
  • dataset/README.md: curation pipeline (resolution / Q-Align / Laplacian / Sobel / manual review).

📊 Benchmark Results

Classical Super-Resolution on 4KLSDB Test Set

| Model | ×4 PSNR / SSIM | ×8 PSNR / SSIM | ×16 PSNR / SSIM |
|-------|----------------|----------------|-----------------|
| HiT-SR (pretrained) | 24.50 / 0.6839 | 22.25 / 0.6394 | 19.47 / 0.5741 |
| HiT-SR (4KLSDB) | 29.27 / 0.7896 | 24.75 / 0.6928 | 23.69 / 0.6414 |
| SwinIR (DF2K) | 24.11 / 0.6738 | 20.96 / 0.5915 | 19.20 / 0.5684 |
| SwinIR (4KLSDB) | 28.79 / 0.7774 | 25.89 / 0.6877 | 23.69 / 0.6376 |
| MambaIR (pretrained) | 25.92 / 0.7259 | 21.51 / 0.6382 | 19.47 / 0.5741 |
| MambaIR (4KLSDB) | 30.92 / 0.8216 | 23.84 / 0.7195 | 23.69 / 0.6414 |
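
The size of the improvement is easier to read as a difference. The short script below copies the PSNR values from the classical SR table and subtracts each baseline from its 4KLSDB fine-tuned counterpart. It is plain arithmetic on the reported numbers, not a re-evaluation, and the function name `psnr_gains` is only illustrative.

```python
# PSNR values (dB) copied from the classical SR table: (baseline, fine-tuned on 4KLSDB).
PSNR = {
    "HiT-SR":  {4: (24.50, 29.27), 8: (22.25, 24.75), 16: (19.47, 23.69)},
    "SwinIR":  {4: (24.11, 28.79), 8: (20.96, 25.89), 16: (19.20, 23.69)},
    "MambaIR": {4: (25.92, 30.92), 8: (21.51, 23.84), 16: (19.47, 23.69)},
}


def psnr_gains(table):
    """Map model -> {scale: fine-tuned minus baseline PSNR, rounded to 2 decimals}."""
    return {
        model: {scale: round(ours - base, 2) for scale, (base, ours) in scales.items()}
        for model, scales in table.items()
    }


for model, gains in psnr_gains(PSNR).items():
    print(model, {f"x{s}": f"+{g:.2f} dB" for s, g in gains.items()})
```

On these numbers the PSNR gain ranges from +2.33 dB (MambaIR, ×8) to +5.00 dB (MambaIR, ×4).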

Real-World SR (4KLSDB Test Set, baseline / ours)

| Method | Scale | PSNR↑ | SSIM↑ | LPIPS↓ | DISTS↓ | FID↓ |
|--------|-------|-------|-------|--------|--------|------|
| OSEDiff | ×4 | 27.36 / 27.50 | 0.7511 / 0.7568 | 0.2863 / 0.2546 | 0.1604 / 0.1431 | 28.07 / 28.35 |
| OSEDiff | ×8 | 23.86 / 24.10 | 0.6021 / 0.6188 | 0.5463 / 0.4252 | 0.1833 / 0.1448 | 19.56 / 17.74 |
| OSEDiff | ×16 | 22.65 / 22.69 | 0.6213 / 0.5966 | 0.6571 / 0.4866 | 0.2861 / 0.2170 | 51.76 / 33.97 |
| SeeSR | ×4 | 27.01 / 28.25 | 0.6996 / 0.7340 | 0.5231 / 0.4511 | 0.1407 / 0.1272 | 38.95 / 33.88 |
| SeeSR | ×8 | 24.10 / 24.50 | 0.6510 / 0.6713 | 0.5117 / 0.4628 | 0.1607 / 0.1551 | 77.46 / 74.46 |
| SeeSR | ×16 | 24.02 / 24.43 | 0.6810 / 0.7001 | 0.5594 / 0.5197 | 0.1699 / 0.1640 | 77.41 / 74.40 |

4K Text-to-Image Generation (Sana)

| Model | pCLIPScore↑ | pNIQE↓ |
|-------|-------------|--------|
| Sana (baseline) | 28.62 | 5.21 |
| Sana + 4KLSDB | 29.27 | 4.63 |

Double-blind user study win rate of Sana + 4KLSDB over Sana: 57.3% overall, 60.9% detail, 74.3% realism, 64.4% fewer artifacts, 52.3% alignment.


🗂 Repository Structure

4KLSDB/
├── README.md                     # this file
├── docs/                         # GitHub Pages project page (index.html)
│   ├── index.html                # https://4klsdb.github.io/
│   └── assets/                   # teaser & figure JPGs used by the project page
├── envs/
│   ├── 4k_sr.yml                 # classical SR + real-SR (HiT-SR / SwinIR / MambaIR / OSEDiff / SeeSR)
│   ├── Sana_training.yml         # 4K T2I (Sana)
│   └── 4k_data_curation.yml      # dataset filtering / Q-Align scoring
├── dataset/
│   ├── README.md                 # dataset curation pipeline doc
│   ├── preprocessing/            # Q-Align / Laplacian / Sobel filters
│   └── validation/               # manual inspection Flask app
├── models/
│   ├── README.md
│   ├── sana/                     # 4K T2I submodule (NVlabs/Sana)
│   ├── hit_sr/                   # → submodule placeholder
│   ├── swinir/                   # → submodule placeholder
│   ├── mambair/                  # → submodule placeholder
│   ├── seesr/                    # → submodule placeholder
│   └── osediff/                  # → submodule placeholder
├── scripts/                      # one-click inference / training / download wrappers
│   ├── download_all_ckpts.sh
│   ├── inference_classical_sr.sh
│   ├── inference_real_sr.sh
│   ├── inference_sana_4k.sh
│   ├── train_hit_sr.sh
│   ├── train_swinir.sh
│   ├── train_mambair.sh
│   ├── train_osediff.sh
│   ├── train_seesr.sh
│   └── train_sana_4k.sh
└── LICENSE

📝 Citation

If you find 4KLSDB useful for your research, please cite:

@misc{zhu20264klsdblargescaledataset4k,
      title={4KLSDB: A Large-Scale Dataset for 4K Image Restoration and Generation}, 
      author={Zihao Zhu and Kuan-Ru Huang and Zhaoming Xu and Renjie Li and Bo Wu and Ruizheng Bai and Mingyang Wu and Sayak Paul and Zhengzhong Tu},
      year={2026},
      eprint={2605.24762},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.24762}, 
}

🙏 Acknowledgements

Our work builds on a number of excellent open-source projects.

The project page is adapted from the SparkVSR template.


⚖️ License

The code in this repository is released under the MIT License. The 4KLSDB dataset is released for research purposes only; please refer to the dataset card for the full terms and source-dataset licenses.

About

[CVPR 2026 DataCV Workshop] 4KLSDB: A Large-Scale Native-4K Dataset and Benchmark for Image Restoration and Generation.
