(a) Overview of UniScene. Given BEV layouts, UniScene facilitates versatile data generation, including semantic occupancy, multi-view video, and LiDAR point clouds, through an occupancy-centric hierarchical modeling approach. (b) Performance comparison on different generation tasks. UniScene delivers substantial improvements over SOTA methods in video, LiDAR, and occupancy generation.
TL;DR: The first unified framework for generating three key data forms — semantic occupancy, video, and LiDAR — in driving scenes.
Generating high-fidelity, controllable, and annotated training data is critical for autonomous driving. Existing methods typically generate a single data form directly from a coarse scene layout, which not only fails to provide the rich data forms required by diverse downstream tasks but also struggles to model the direct layout-to-data distribution. In this paper, we introduce UniScene, the first unified framework for generating three key data forms — semantic occupancy, video, and LiDAR — in driving scenes. UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data via two novel transfer strategies: Gaussian-based Joint Rendering and Prior-guided Sparse Modeling. This occupancy-centric approach reduces the generation burden, especially for intricate scenes, while providing detailed intermediate representations for the subsequent generation stages. Extensive experiments demonstrate that UniScene outperforms previous SOTAs in occupancy, video, and LiDAR generation, and that its outputs benefit downstream driving tasks.
- [2025/03]: Check out our other recent works on generative world models: MuDG, DiST-4D, and HERMES.
- [2025/03]: Code and pre-trained weights are released.
- [2025/02]: Paper is accepted to CVPR 2025.
- [2024/12]: Paper is available on arXiv.
- [2024/12]: Demo is released on the Project Page.
a. We recommend creating the environment with Poetry; our setup has been tested on NVIDIA A100/A800 GPUs with CUDA 12.1 and Python 3.9:
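The setup script itself is not reproduced here; the following is a minimal sketch, assuming the repository ships a pyproject.toml at its root and that PyTorch is among the locked dependencies:

pip install -U poetry              # install Poetry itself if it is not already available
poetry env use python3.9           # create a virtual environment (requires a python3.9 interpreter on PATH)
poetry install                     # resolve and install the locked dependencies
poetry run python -c "import torch; print(torch.version.cuda)"   # sanity-check the CUDA 12.1 build (assumes PyTorch is installed)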
b. Pretrained Models. Install the huggingface-cli tool and download the pretrained weights with:
pip install -U huggingface_hub
huggingface-cli download --resume-download Arlolo0/UniScene_path --local-dir $local_path
Model | OneDrive | Hugging Face
---|---|---
Occupancy Model | Occupancy-OneDrive | Occ-VAE / Occ-DiT
LiDAR Model | LiDAR-OneDrive | LiDAR-HF
Video Model | Video-OneDrive | Video-HF
a. Download all Trainval splits of the Full dataset (v1.0) following the official instructions and place them in the local folder "./data/nuscenes" (a download-and-extract sketch follows the directory tree below):
$./data/nuscenes
├── samples
├── sweeps
├── ...
└── v1.0-trainval
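A minimal download-and-extract sketch, assuming the archive names used on the official nuScenes download page (adjust if the names differ):

mkdir -p ./data/nuscenes
tar -xzf v1.0-trainval_meta.tgz -C ./data/nuscenes            # metadata tables (v1.0-trainval)
for i in $(seq -w 1 10); do tar -xzf v1.0-trainval${i}_blobs.tgz -C ./data/nuscenes; done   # samples/ and sweeps/ sensor blobs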
b. Download the interpolated 12Hz annotations and mmdet3d meta data ("nuscenes_interp_12Hz_infos_*.pkl") and place them in "data/nuscenes_mmdet3d-12Hz/".
c. (Optional) Prepare 12Hz 3D occupancy labels from LiDAR and 3D bounding-box labels. Note that we use NKSR reconstruction to produce GT occupancy with a pc-range of [-50, -50, -5, 50, 50, 3], aligned with OpenOccupancy. The pc-range of the occupancy generated by our occupancy model is set to [-40, -40, -1, 40, 40, 5.4] for comparison with Occ3D baselines.
To generate ground-truth semantic occupancy at a resolution of 200×200×16:
cd ./data_process && python generate_occ.py --config_path config-200.yaml --save_path $GT_nksr_occupancy_200/
To generate ground-truth semantic occupancy at a resolution of 800×800×64:
cd ./data_process && python generate_occ.py --config_path config-800.yaml --save_path $GT_nksr_occupancy_800/
Note that this will take up about 3.1 TB of hard disk space.
d. (Optional) Prepare 12Hz BEV layouts.
cd ./occupancy_gen/12hz_processing && python save_bevlayout_12hz.py \
--py-config="./config/save_step2_me.py" \
--work-dir="./ckpt/VAE/"
Note that the road lines from the BEV layouts are projected onto the semantic occupancy, integrating the corresponding semantic information. For more details, please refer to our UniScene paper.
- Overall running instructions:
You can symlink the "./data" folder into each generation subfolder for convenient access (a concrete example follows the command below):
ln -s ./data $sub_folder
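A minimal sketch of this step, assuming the generation subfolders used in the steps below (occupancy_gen, lidar_gen, video_gen) are the intended targets; an absolute source path keeps the symlinks valid regardless of where they are resolved:

for sub in occupancy_gen lidar_gen video_gen; do ln -sfn "$(pwd)/data" "./${sub}/data"; done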
a. Our framework starts by generating semantic occupancy from the given BEV layouts:
cd ./occupancy_gen
python save_bev_layout.py \
    --py-config="./config/save_step2_me.py" \
    --work-dir="./ckpt/VAE/"
bash ./run_eval_dit.sh
b. Generate LiDAR point clouds from the semantic occupancy:
cd ./lidar_gen
python tools/test.py --save_to_file $output_lidar_path
c. Generate multi-view videos from the semantic occupancy:
cd ./video_gen
python ./gs_render/render_eval_condition_gt.py \
    --occ_path $input_occ_path \
    --layout_path $input_bev_path \
    --render_path $output_render_path \
    --vis
python inference_video.py --occ_data_root $output_render_path --save $output_video_path
- More details on Occupancy, LiDAR, and Video generation. We leverage Mayavi to visualize 3D occupancy and LiDAR point clouds (an installation sketch follows the items below):
- Occupancy Generation. Please refer to Occupancy_vis for visualization.
- LiDAR Generation. Please refer to LiDAR_vis for visualization.
- Video Generation. The generated video will be saved at "./video_gen/outputs".
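If Mayavi is not already part of your environment, a minimal installation sketch (standard PyPI package names, not specific to this repository; Mayavi needs a Qt backend such as PyQt5 for interactive 3D windows):

pip install mayavi PyQt5           # assumption: pip-based install inside the project environment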
- Our implementation is based on the excellent open-source projects OccWorld, OpenPCDet, and SVD.
This repository is released under the Apache 2.0 license as found in the LICENSE file.
If you find our paper and code useful for your research, please consider citing us and giving a star to our repository:
@article{li2024uniscene,
title={UniScene: Unified Occupancy-centric Driving Scene Generation},
author={Li, Bohan and Guo, Jiazhe and Liu, Hongsi and Zou, Yingshuang and Ding, Yikang and Chen, Xiwu and Zhu, Hu and Tan, Feiyang and Zhang, Chi and Wang, Tiancai and others},
journal={arXiv preprint arXiv:2412.05435},
year={2024}
}