[CVPR 2025] EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting
Dong In Lee1, Hyeongcheol Park1, Jiyoung Seo1, Eunbyung Park2,
Hyunje Park1, Ha Dam Baek1, Sangheon Shin3, Sangmin Kim3, Sangpil Kim1†
1Korea University, 2Yonsei University, 3Hanwha Systems
Tested on Ubuntu 22.04 + CUDA 11.8 + Python 3.9 (RTX A6000 / RTX 3090).
Note: The GPU memory requirement depends on your dataset size.
conda env create -f environment.yaml
conda activate editsplat
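As an optional sanity check (assuming the environment installed PyTorch with CUDA support as specified in environment.yaml), you can confirm that the GPU is visible before running any scripts:

```bash
# Optional sanity check: confirm PyTorch was built with CUDA support and sees a GPU.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```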
We provide datasets and pretrained weights for all scenes presented in our paper, allowing users to easily reproduce our results and experiment further.
- 📥 Download: Google Drive
After downloading, move the dataset into the cvpr25_EditSplat/dataset/ directory.
If you want to edit your own dataset, you must first pre-train a 3D Gaussian Splatting (3DGS) model from your custom dataset using COLMAP for camera poses.
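A minimal sketch of that pre-training step is shown below, assuming you use the `convert.py` and `train.py` scripts from the official 3D Gaussian Splatting repository (all paths are placeholders; adapt flags to your setup):

```bash
# Hypothetical custom scene; assumes COLMAP and the official 3DGS repository are installed.
# 1) Run COLMAP via the 3DGS convert script to estimate camera poses.
python convert.py -s ./dataset/my_scene

# 2) Pre-train a 3DGS model and write a checkpoint at 30,000 iterations.
#    The resulting chkpnt30000.pth can then be passed to run_editing.py via --source_checkpoint.
python train.py -s ./dataset/my_scene -m ./dataset/pretrained/my_scene --checkpoint_iterations 30000
```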
To run the editing pipeline:
./script/editing_face_to_marble_sculpture.sh
The edited 3D Gaussian Splatting outputs will be saved under `cvpr25_EditSplat/output/`.
You can render custom novel views from the updated 3D scene stored in `cvpr25_EditSplat/output/point_cloud/`.
💻 Command Line Arguments for editing
python run_editing.py -s ./dataset/dataset/face -m output/face_to_marble_sculpture --source_checkpoint ./dataset/pretrained/face/chkpnt30000.pth --object_prompt "face" --target_prompt "Make his face resemble that of a marble sculpture" --sampling_prompt "a photo of a marble sculpture" --target_mask_prompt "face"
- `-s` / `--source_path`: Path to the source directory containing the COLMAP data.
- `--source_checkpoint`: Path to the pretrained 3D Gaussian Splatting (3DGS) checkpoint (`.pth`) you wish to edit. Example: `./dataset/pretrained/<scene_name>/chkpnt30000.pth`
- `-m` / `--model_path`: Path where the edited model should be stored (`output/` by default).
- `--target_prompt`: A text instruction describing the desired edit, written in a format compatible with InstructPix2Pix.
- `--object_prompt`: The object keyword contained in the `target_prompt`. This is used in Attention-Guided Trimming (AGT) to extract cross-attention maps from the diffusion model and assign them to the pretrained 3DGS for local editing and pruning.
- `--sampling_prompt`: A sentence describing the expected result after editing. The ImageReward model uses this prompt to rank the initially edited images and filter out the bottom 15% with the lowest scores before projection.
- `--target_mask_prompt`: An object class name (e.g., "marble sculpture", "wildboar") representing the expected object after editing. Used in Multi-View Fusion Guidance (MFG) to generate a segmentation mask via the SAM model. It does not need to appear in the `target_prompt`. The mask guides background replacement with content from the source dataset.
- `--iterations`: Number of total iterations to edit for, 30,000 by default.
- `--eval`: Add this flag to use a MipNeRF360-style training/test split for evaluation.
Note that, similar to other baselines, we use images with a resolution of 512×512, as required by the InstructPix2Pix model.
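Putting the prompt arguments together, a hypothetical command for editing your own scene (all paths and prompts below are placeholders, not files shipped with the repo) might look like:

```bash
# Hypothetical example: turn a bear in a custom scene into a wildboar.
python run_editing.py \
    -s ./dataset/my_scene \
    -m output/bear_to_wildboar \
    --source_checkpoint ./dataset/pretrained/my_scene/chkpnt30000.pth \
    --object_prompt "bear" \
    --target_prompt "Turn the bear into a wildboar" \
    --sampling_prompt "a photo of a wildboar" \
    --target_mask_prompt "wildboar"
```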
To produce your own edited results, you can often improve quality by tuning the following hyperparameters. Defaults match those used in the main paper.
🛠️ Hyperparameter Details
- Editing epochs: Number of epochs to optimize the edited 3D Gaussian Splatting. Default: 10
- Cross-attention threshold `w_thres`: A higher value leads to tighter localization but may overly restrict the editable region. Default: 0.1
- Pruning proportion `k`: Proportion of Gaussians pruned in the first densification step. A high value may remove too many Gaussians, degrading editing quality. Default: 0.15
- Text guidance weight `s_T`: Weight for the text guidance in the diffusion model. Higher values enforce stronger adherence to the instruction prompt. Default: 7.5
- Multi-view fusion guidance weight `s_M`: Controls the contribution of multi-view information (`h_M`) in the editing. Default: 1.0
- Source guidance weight `s_S`: Weight for the original source image guidance; helps preserve original source information during editing. Default: 0.5
- Filtering ratio: Fraction of initial edited views to filter out (those with the lowest ImageReward scores) before projection in MFG. Default: 0.15
For additional 3DGS-specific hyperparameters such as `feature_lr`, `opacity_lr`, `scaling_lr`, `rotation_lr`, etc., please refer to the official 3D Gaussian Splatting repository.
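If `run_editing.py` exposes these standard 3DGS optimizer arguments in its parser (an assumption worth verifying in the code), they can simply be appended to the editing command, for example:

```bash
# Assumption: the standard 3DGS optimizer flags are forwarded to run_editing.py's argument parser.
python run_editing.py -s ./dataset/dataset/face -m output/face_to_marble_sculpture \
    --source_checkpoint ./dataset/pretrained/face/chkpnt30000.pth \
    --object_prompt "face" --target_prompt "Make his face resemble that of a marble sculpture" \
    --sampling_prompt "a photo of a marble sculpture" --target_mask_prompt "face" \
    --feature_lr 0.0025 --opacity_lr 0.05
```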
We provide several convenient rendering options for visualizing your edited 3D Gaussian Splatting (3DGS) models.
Generate novel view videos and GIF animations:
python render.py --model_path output/face_to_marble_sculpture --iteration 30560 --video
The resulting videos and GIFs are saved under:
output/face_to_marble_sculpture/video/ours_30560/
├── final_video.mp4
└── final_video.gif
If you find our work useful, please consider citing:
@InProceedings{Lee_2025_CVPR,
author = {Lee, Dong In and Park, Hyeongcheol and Seo, Jiyoung and Park, Eunbyung and Park, Hyunje and Baek, Ha Dam and Shin, Sangheon and Kim, Sangmin and Kim, Sangpil},
title = {EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {11135-11145}
}
Our code is based on these wonderful repos: