
UniScene: Unified Occupancy-centric Driving Scene Generation [CVPR 2025]

arXiv paper · Code page · Hugging Face

🎯 Demo:

(a) Overview of UniScene. Given BEV layouts, UniScene facilitates versatile data generation, including semantic occupancy, multi-view video, and LiDAR point clouds, through an occupancy-centric hierarchical modeling approach. (b) Performance comparison on different generation tasks. UniScene delivers substantial improvements over SOTA methods in video, LiDAR, and occupancy generation.


📋 Abstract:

TL;DR: The first unified framework for generating three key data forms — semantic occupancy, video, and LiDAR — in driving scenes.

Generating high-fidelity, controllable, and annotated training data is critical for autonomous driving. Existing methods typically generate a single data form directly from a coarse scene layout, which not only fails to output the rich data forms required for diverse downstream tasks but also struggles to model the direct layout-to-data distribution. In this paper, we introduce UniScene, the first unified framework for generating three key data forms — semantic occupancy, video, and LiDAR — in driving scenes. UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data, respectively, with two novel transfer strategies of Gaussian-based Joint Rendering and Prior-guided Sparse Modeling. This occupancy-centric approach reduces the generation burden, especially for intricate scenes, while providing detailed intermediate representations for the subsequent generation stages. Extensive experiments demonstrate that UniScene outperforms previous SOTAs in occupancy, video, and LiDAR generation, which in turn benefits downstream driving tasks.

📚 Framework:

💥 News

  • [2025/03]: Check out our other latest works on generative world models: MuDG, DiST-4D, HERMES.
  • [2025/03]: Code and pre-trained weights are released.
  • [2025/02]: Paper is accepted to CVPR 2025.
  • [2024/12]: Paper is available on arXiv.
  • [2024/12]: Demo is released on the Project Page.

🕹️ Getting Started

1. Environment Setup

a. We recommend creating the environment with "poetry" using the following script, which has been tested on NVIDIA A100/A800 GPUs with CUDA 12.1 and Python 3.9:

Step-by-step Installation.

b. Pretrained Models. The huggingface-cli tool can be installed with:

pip install -U huggingface_hub 
huggingface-cli download --resume-download Arlolo0/UniScene_path  --local-dir $local_path
| Model | OneDrive | Hugging Face |
| --- | --- | --- |
| Occupancy Model | Occupancy-OneDrive | Occ-VAE, Occ-DiT |
| LiDAR Model | LiDAR-OneDrive | LiDAR-HF |
| Video Model | Video-OneDrive | Video-HF |
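
If the CLI route is inconvenient, the same checkpoints can be fetched from Python via huggingface_hub's snapshot_download; a minimal sketch, reusing the repo id from the command above (the local directory is illustrative):

from huggingface_hub import snapshot_download

# Fetch the full UniScene checkpoint snapshot into a local folder.
snapshot_download(
    repo_id="Arlolo0/UniScene_path",
    local_dir="./ckpt/UniScene",  # illustrative target directory
)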

2. Data Preparation

a. Download all splits of the nuScenes Trainval Full dataset (v1.0) following the official instructions and place them in the local folder "./data/nuscenes":

./data/nuscenes
├── samples
├── sweeps
├── ...
└── v1.0-trainval
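
As a quick sanity check that the layout above is correct, the standard nuscenes-devkit loader (an extra dependency, installable with pip install nuscenes-devkit) should open the split without errors; a minimal sketch:

from nuscenes.nuscenes import NuScenes

# Loading the trainval tables verifies that ./data/nuscenes is laid out correctly.
nusc = NuScenes(version="v1.0-trainval", dataroot="./data/nuscenes", verbose=True)
print(f"{len(nusc.sample)} keyframe samples found")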

b. Download the interpolated 12 Hz annotations and mmdet3d metadata ("nuscenes_interp_12Hz_infos_*.pkl") and put them in "./data/nuscenes_mmdet3d-12Hz/".

c. (Optional) Prepare 12 Hz 3D occupancy labels from LiDAR and 3D bounding-box labels. Note that we use NKSR reconstruction to produce GT occupancy with a pc-range of [-50, -50, -5, 50, 50, 3], which is aligned with OpenOccupancy. The pc-range of the occupancy generated by our occupancy model is set to [-40, -40, -1, 40, 40, 5.4] for Occ3D baseline comparisons.
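
The pc-range and grid resolution together determine the voxel size; a small sketch of the arithmetic, assuming the NKSR range above pairs with the 200×200×16 and 800×800×64 grids from the commands below, and that the generated occupancy follows the Occ3D-standard 200×200×16 grid:

# Voxel size per axis = (max - min) / number of voxels.
def voxel_size(pc_range, resolution):
    x_min, y_min, z_min, x_max, y_max, z_max = pc_range
    nx, ny, nz = resolution
    return ((x_max - x_min) / nx, (y_max - y_min) / ny, (z_max - z_min) / nz)

print(voxel_size([-50, -50, -5, 50, 50, 3], (200, 200, 16)))    # (0.5, 0.5, 0.5) m GT voxels
print(voxel_size([-50, -50, -5, 50, 50, 3], (800, 800, 64)))    # (0.125, 0.125, 0.125) m GT voxels
print(voxel_size([-40, -40, -1, 40, 40, 5.4], (200, 200, 16)))  # (0.4, 0.4, 0.4) m, Occ3D convention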

To generate ground-truth semantic occupancy at a resolution of 200×200×16:

cd ./data_process && python generate_occ.py --config_path  config-200.yaml --save_path $GT_nksr_occupancy_200/

To generate ground-truth semantic occupancy at a resolution of 800×800×64:

cd ./data_process && python generate_occ.py --config_path  config-800.yaml --save_path $GT_nksr_occupancy_800/

Note that this will take up about 3.1 TB of hard disk space.

d. (Optional) Prepare 12 Hz BEV layouts.

cd ./occupancy_gen/12hz_processing && python save_bevlayout_12hz.py  \
    --py-config="./config/save_step2_me.py" \
    --work-dir="./ckpt/VAE/" 

Note that the road lines from the BEV layouts are projected onto the semantic occupancy, integrating the corresponding semantic information. For more details, please refer to our UniScene paper.
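
A schematic numpy sketch of this projection, assuming a 200×200 BEV layout, a 200×200×16 occupancy grid aligned with it, and illustrative label ids (the actual label map and implementation live in the 12hz_processing scripts):

import numpy as np

ROAD_LINE, FREE = 17, 0  # illustrative label ids, not the real UniScene map

def project_road_lines(bev_layout, occupancy):
    # bev_layout: (H, W) BEV semantic labels; occupancy: (H, W, Z), z=0 at ground.
    ys, xs = np.nonzero(bev_layout == ROAD_LINE)
    for y, x in zip(ys, xs):
        column = occupancy[y, x]
        occupied = np.nonzero(column != FREE)[0]
        # Stamp the road-line label onto the lowest occupied voxel (road surface),
        # falling back to ground level if the column is empty.
        column[occupied[0] if occupied.size else 0] = ROAD_LINE
    return occupancy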

3. Multi-modal Generation

  • Overall running instructions:

    You can link the "./data" folder to the following subfolders for convenient access:

    ln -s ./data   $sub_folder  

    a. Our framework starts with generating semantic occupancy from given BEV layouts as:

    cd ./occupancy_gen
    python save_bev_layout.py \
        --py-config="./config/save_step2_me.py" \
        --work-dir="./ckpt/VAE/" 
    bash ./run_eval_dit.sh

    b. Generating LiDAR point clouds from semantic occupancy as:

    cd ./lidar_gen
    python tools/test.py --save_to_file  $output_lidar_path   

    c. Generating multi-view video from semantic occupancy as:

    cd ./video_gen
    python  ./gs_render/render_eval_condition_gt.py  --occ_path  $input_occ_path  --layout_path $input_bev_path --render_path $output_render_path  --vis
    python  inference_video.py  --occ_data_root $output_render_path   --save  $output_video_path  
  • More details on Occupancy, LiDAR, and Video generation: we leverage Mayavi for visualizing 3D occupancy and LiDAR point clouds.
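
For example, a minimal Mayavi sketch for the LiDAR case, assuming the generated point cloud is stored as an (N, 4) float32 array of (x, y, z, intensity) under an illustrative path:

import numpy as np
from mayavi import mlab

# Load one generated LiDAR frame (path and layout are illustrative).
points = np.fromfile("output_lidar/sample_0.bin", dtype=np.float32).reshape(-1, 4)
x, y, z, intensity = points.T

# Render the cloud as raw points, colored by intensity.
mlab.points3d(x, y, z, intensity, mode="point", colormap="viridis")
mlab.show()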

❤️ Acknowledgements

Our implementation is based on these excellent open-source projects: OccWorld, OpenPCDet, and SVD.

📜 License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

⭐ Citation

If you find our paper and code useful for your research, please consider citing us and giving a star to our repository:

@article{li2024uniscene,
  title={UniScene: Unified Occupancy-centric Driving Scene Generation},
  author={Li, Bohan and Guo, Jiazhe and Liu, Hongsi and Zou, Yingshuang and Ding, Yikang and Chen, Xiwu and Zhu, Hu and Tan, Feiyang and Zhang, Chi and Wang, Tiancai and others},
  journal={arXiv preprint arXiv:2412.05435},
  year={2024}
}
