Bingnan Li 1* · Chen-Yu Wang 1* · Haiyang Xu 1* · Xiang Zhang 1 · Ethan Armand 1 · Divyansh Srivastava 1 · Xiaojun Shan 1 · Zeyuan Chen 1 · Jianwen Xie 2 · Zhuowen Tu 1
1UC San Diego · 2Lambda, Inc.
Examples from OverLayBench with difficulty increasing from left to right.
Despite steady progress in layout-to-image generation, current methods still struggle with layouts containing significant overlap between bounding boxes. We identify two primary challenges: (1) large overlapping regions and (2) overlapping instances with minimal semantic distinction. Through both qualitative examples and quantitative analysis, we demonstrate how these factors degrade generation quality. To systematically assess this issue, we introduce OverLayScore, a novel metric that quantifies the complexity of overlapping bounding boxes. Our analysis reveals that existing benchmarks are biased toward simpler cases with low OverLayScore values, limiting their effectiveness in evaluating models under more challenging conditions. To reduce this gap, we present OverLayBench, a new benchmark featuring balanced OverLayScore distributions and high-quality annotations. As an initial step toward improved performance on complex overlaps, we also propose CreatiLayout-AM, a model trained on a curated amodal mask dataset. Together, our contributions establish a foundation for more robust layout-to-image generation under realistic and challenging scenarios.
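The exact definition of OverLayScore is given in the paper; purely to illustrate the idea of quantifying overlap complexity from a layout's bounding boxes, here is a toy sketch that sums pairwise IoU over all box pairs. This is *not* the official OverLayScore, and the function names are hypothetical:

```python
from itertools import combinations

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def overlap_complexity(boxes):
    """Toy proxy for layout overlap complexity: sum of pairwise IoUs.
    NOT the official OverLayScore -- see the paper for its definition."""
    return sum(box_iou(a, b) for a, b in combinations(boxes, 2))
```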
- [2025-06-17]:
If you are using multiple GPUs, we recommend using vLLM for accelerated inference:
```bash
git clone https://github.com/cuttle-fish-my/OverLayBenchPyTools.git
cd OverLayBenchPyTools
conda create -n overlaybench python=3.10.16 --yes
conda activate overlaybench
bash install_vllm.sh
```
Otherwise, you may use the default Hugging Face Transformers backend, which is slower but more stable:
```bash
git clone https://github.com/cuttle-fish-my/OverLayBenchPyTools.git
cd OverLayBenchPyTools
conda create -n overlaybench python=3.10.16 --yes
conda activate overlaybench
bash install.sh
```
For vLLM inference, please set the environment variable `VLLM_WORKER_MULTIPROC_METHOD=spawn` before running the code. Also, make sure the `OverLayBenchMeter` is initialized within an `if __name__ == "__main__":` block to avoid the `RuntimeError: Cannot re-initialize CUDA in forked subprocess` error.
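For instance, you can set the variable at the very top of your entry script, before vLLM spawns any workers (a minimal sketch; running `export VLLM_WORKER_MULTIPROC_METHOD=spawn` in your shell beforehand works as well):

```python
import os

# Must be set before vLLM initializes its CUDA worker processes.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
```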
Example usage with vLLM:

```python
from overlaybenchpytools.meter import OverLayBenchMeter

if __name__ == "__main__":
    # Evaluate generated images with vLLM-accelerated inference.
    meter = OverLayBenchMeter(
        root='{YOUR_GENERATED_IMAGES_DIR}',
        extension='png', save_dir='./metrics',
        resolution=1024, bs_qwen="all", use_vllm=True,
        vllm_args={"tensor_parallel_size": 8})
    for split in ["simple", "medium", "hard"]:
        meter.set_split(split, '{YOUR_SEED}')
        meter.evaluate()
```
For `transformers`-based inference, please remove the `use_vllm` and `vllm_args` arguments and set `bs_qwen` to a reasonable batch size:
```python
from overlaybenchpytools.meter import OverLayBenchMeter

if __name__ == "__main__":
    # Evaluate generated images with the default Hugging Face
    # Transformers backend; bs_qwen controls the batch size.
    meter = OverLayBenchMeter(
        root='{YOUR_GENERATED_IMAGES_DIR}',
        extension='png', save_dir='./metrics',
        resolution=1024, bs_qwen=8)
    for split in ["simple", "medium", "hard"]:
        meter.set_split(split, '{YOUR_SEED}')
        meter.evaluate()
```
`OverLayBenchMeter` covers the evaluation of mIoU, Overlay mIoU (o-mIoU), Entity Success Rate (SR_E), Relationship Success Rate (SR_R), Global CLIPScore, and Local CLIPScore.
For FID, please refer to the IQA-PyTorch package.
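For reference, a minimal sketch of computing FID with IQA-PyTorch (`pyiqa`); the directory placeholders are yours to fill in, and the reference folder is an assumption about your setup:

```python
import pyiqa

# Compare the folder of generated images against a reference folder.
fid_metric = pyiqa.create_metric('fid')
score = fid_metric('{YOUR_GENERATED_IMAGES_DIR}', '{REFERENCE_IMAGES_DIR}')
print("FID:", score)
```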
Comparison of generated images from different models on OverLayBench.
We deeply appreciate the contributions of the following projects: