⭐ If HLFormer is helpful to your projects, please consider starring this repo. Thanks! 🤗
This repository contains the PyTorch implementation of our ICCV 2025 work:

> **Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning**
> Jun Li, Jinpeng Wang, Chaolei Tan, Niu Lian, Long Chen, Yaowei Wang, Min Zhang, Shu-Tao Xia, Bin Chen
We propose HLFormer, the first hyperbolic modeling framework for PRVR, which leverages hyperbolic space learning to compensate for the suboptimal hierarchical modeling capability of Euclidean space. HLFormer's designs are tailored to two core demands of PRVR: (i) temporal modeling that extracts key moment features, and (ii) learning robust cross-modal representations.
For (i), we inject an intra-video hierarchy prior into temporal modeling by introducing multi-scale Lorentz attention, which collaborates with Euclidean attention to enhance the activation of discriminative, query-relevant moment features.
For (ii), we introduce a partial order preservation loss that enforces the "text ≺ video" hierarchy, strengthening cross-modal matching between text queries and video content.
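To make the hyperbolic modeling idea concrete, below is a minimal sketch of attention computed in the Lorentz model. This is not the authors' implementation: it assumes curvature -1, uses a single scale (HLFormer's attention is multi-scale), and all function names are our own. Features are lifted from Euclidean space onto the hyperboloid via the exponential map at the origin, and attention weights come from negative Lorentzian geodesic distances.

```python
# Minimal, hypothetical sketch of single-scale Lorentz-model attention.
import torch

def expmap0(v: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Exponential map at the origin: tangent vector in R^d -> point on the hyperboloid."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    time = torch.cosh(norm)                    # time-like coordinate x_0
    space = torch.sinh(norm) * v / norm        # space-like coordinates
    return torch.cat([time, space], dim=-1)    # satisfies <x, x>_L = -1

def lorentz_inner(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Pairwise Lorentzian inner product <x, y>_L = -x_0 y_0 + sum_i x_i y_i."""
    prod = x.unsqueeze(-2) * y.unsqueeze(-3)   # (..., Lq, Lk, d+1)
    return prod[..., 1:].sum(-1) - prod[..., 0]

def lorentz_attention(q, k, v, tau: float = 1.0) -> torch.Tensor:
    """Attention whose similarity is the negative Lorentzian distance."""
    q_h, k_h = expmap0(q), expmap0(k)
    # d_L(x, y) = arccosh(-<x, y>_L); the clamp keeps acosh in its domain.
    dist = torch.acosh((-lorentz_inner(q_h, k_h)).clamp_min(1.0 + 1e-6))
    attn = torch.softmax(-dist / tau, dim=-1)  # closer points get larger weights
    return attn @ v                            # aggregate values in Euclidean space

# Toy usage: 2 videos, 8 clips, 64-dim clip features.
x = torch.randn(2, 8, 64)
print(lorentz_attention(x, x, x).shape)        # torch.Size([2, 8, 64])
```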
Besides, we invite readers to refer to our previous works GMMFormer and GMMFormerV2.
In the following, we will guide you through using this repository step by step. 🤗🐶
```sh
git clone https://github.com/lijun2005/ICCV25-HLFormer.git
cd ICCV25-HLFormer/
```
- python==3.11.8
- numpy==1.26.4
- pytorch==2.0.1
- torchvision==0.15.2
- scipy==1.5.4
- h5py==3.1.0
- addict==2.4.0
```sh
pip install -r requirements.txt
```
All features of TVR, ActivityNet Captions and Charades-STA are kindly provided by the authors of ms-sl.
> **Note:** We did not use any features derived from ViT.
The data can be downloaded from Baidu pan or Google drive.
The dataset directory is organized as follows:
```
PRVR_data/
└── PRVR/
    ├── activitynet/
    │   ├── FeatureData/
    │   ├── TextData/
    │   ├── val_1.json
    │   └── val_2.json
    ├── charades/
    │   ├── FeatureData/
    │   └── TextData/
    └── tvr/
        ├── FeatureData/
        └── TextData/
```
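As an optional sanity check after downloading, you can list the contents of a feature file with `h5py` (already in the requirements). The filename below is a placeholder; substitute an actual file under `FeatureData/`.

```python
# Hypothetical sanity check: inspect a downloaded HDF5 feature file.
import h5py

# Replace with a real file under PRVR_data/PRVR/<dataset>/FeatureData/.
with h5py.File('PRVR_data/PRVR/tvr/FeatureData/features.hdf5', 'r') as f:
    for name in list(f.keys())[:5]:    # first few video IDs
        print(name, f[name].shape)     # typically (num_clips, feature_dim)
```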
Finally, set `root` and `data_root` in the config files (e.g., `cfg['root']` and `cfg['data_root']` in `./src/Configs/tvr.py`), as sketched below.
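For example, in `./src/Configs/tvr.py` (the paths below are placeholders for your own directories):

```python
# Excerpt from a config file; adjust both paths to your environment.
cfg['root'] = '/path/to/ICCV25-HLFormer'      # repository root
cfg['data_root'] = '/path/to/PRVR_data/PRVR'  # downloaded features and text data
```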
To train HLFormer on TVR:

```sh
cd src
python main.py -d tvr --gpu 0
```

To train HLFormer on ActivityNet Captions:

```sh
cd src
python main.py -d act --gpu 0
```

To train HLFormer on Charades-STA:

```sh
cd src
python main.py -d cha --gpu 0
```
For this repository, the expected performance is:
| Dataset | R@1 | R@5 | R@10 | R@100 | SumR | ckpt and logs |
| --- | --- | --- | --- | --- | --- | --- |
| TVR | 15.7 | 37.1 | 48.5 | 86.4 | 187.7 | Google drive |
| ActivityNet Captions | 8.7 | 27.1 | 40.1 | 79.0 | 154.9 | Google drive |
| Charades-STA | 2.6 | 8.5 | 13.7 | 54.0 | 78.7 | Google drive |
If you find our code useful or use the toolkit in your work, please consider citing:
```
@inproceedings{Li25_HLFormer,
  author    = {Li, Jun and Wang, Jinpeng and Tan, Chaolei and Lian, Niu and Chen, Long and Wang, Yaowei and Zhang, Min and Xia, Shu-Tao and Chen, Bin},
  title     = {Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2025}
}
```
This code is based on our previous works GMMFormer and GMMFormerV2. We are also grateful to other teams for open-sourcing code that inspired our work, including ms-sl, dl-dkd, and meru.
If you have any questions, you can raise an issue or email Jun Li ([email protected]) or Jinpeng Wang ([email protected]).