⭐ If HLFormer is helpful to your projects, please consider starring this repo. Thanks! 🤗
This repository contains the PyTorch implementation of our ICCV 2025 work:

> **Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning**
> Jun Li, Jinpeng Wang, Chaolei Tan, Niu Lian, Long Chen, Yaowei Wang, Min Zhang, Shu-Tao Xia, Bin Chen
We propose HLFormer, the first hyperbolic modeling framework for PRVR, which leverages hyperbolic space learning to compensate for the suboptimal hierarchical modeling capability of Euclidean space. HLFormer's designs are tailored to two core demands of PRVR: (i) temporal modeling that extracts key moment features, and (ii) learning robust cross-modal representations.
For (i), we inject an intra-video hierarchy prior into temporal modeling by introducing multi-scale Lorentz attention, which collaborates with Euclidean attention to enhance the activation of discriminative, query-relevant moment features.
For (ii), we introduce a partial order preservation loss that enforces the "text ≺ video" hierarchy, strengthening cross-modal matching between text queries and video content.
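To make the hyperbolic modeling idea concrete, below is a minimal sketch of attention computed in the Lorentz model. This is not the authors' implementation: it assumes curvature -1, uses a single scale (HLFormer's attention is multi-scale), and all function names are our own. Features are lifted from Euclidean space onto the hyperboloid via the exponential map at the origin, and attention weights come from negative Lorentzian geodesic distances.

```python
# Minimal, hypothetical sketch of single-scale Lorentz-model attention.
import torch

def expmap0(v: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Exponential map at the origin: tangent vector in R^d -> point on the hyperboloid."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    time = torch.cosh(norm)                    # time-like coordinate x_0
    space = torch.sinh(norm) * v / norm        # space-like coordinates
    return torch.cat([time, space], dim=-1)    # satisfies <x, x>_L = -1

def lorentz_inner(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Pairwise Lorentzian inner product <x, y>_L = -x_0 y_0 + sum_i x_i y_i."""
    prod = x.unsqueeze(-2) * y.unsqueeze(-3)   # (..., Lq, Lk, d+1)
    return prod[..., 1:].sum(-1) - prod[..., 0]

def lorentz_attention(q, k, v, tau: float = 1.0) -> torch.Tensor:
    """Attention whose similarity is the negative Lorentzian distance."""
    q_h, k_h = expmap0(q), expmap0(k)
    # d_L(x, y) = arccosh(-<x, y>_L); the clamp keeps acosh in its domain.
    dist = torch.acosh((-lorentz_inner(q_h, k_h)).clamp_min(1.0 + 1e-6))
    attn = torch.softmax(-dist / tau, dim=-1)  # closer points get larger weights
    return attn @ v                            # aggregate values in Euclidean space

# Toy usage: 2 videos, 8 clips, 64-dim clip features.
x = torch.randn(2, 8, 64)
print(lorentz_attention(x, x, x).shape)        # torch.Size([2, 8, 64])
```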
Besides, we invite readers to refer to our previous works GMMFormer and GMMFormerV2.
In the following, we will guide you through using this repository step by step. 🤗🐶
```sh
git clone https://github.com/lijun2005/ICCV25-HLFormer.git
cd ICCV25-HLFormer/
```
- python==3.11.8
- numpy==1.26.4
- pytorch==2.0.1
- torchvision==0.15.2
- scipy==1.5.4
- h5py==3.1.0
- addict==2.4.0
```sh
pip install -r requirements.txt
```
All features of TVR, ActivityNet Captions and Charades-STA are kindly provided by the authors of ms-sl.
> **Note:** We did not use any features derived from ViT.
The data can be downloaded from Baidu pan or Google drive.
The dataset directory is organized as follows:
```
PRVR_data/
└── PRVR/
    ├── activitynet/
    │   ├── FeatureData/
    │   ├── TextData/
    │   ├── val_1.json
    │   └── val_2.json
    ├── charades/
    │   ├── FeatureData/
    │   └── TextData/
    └── tvr/
        ├── FeatureData/
        └── TextData/
```
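As an optional sanity check after downloading, you can list the contents of a feature file with `h5py` (already in the requirements). The filename below is a placeholder; substitute an actual file under `FeatureData/`.

```python
# Hypothetical sanity check: inspect a downloaded HDF5 feature file.
import h5py

# Replace with a real file under PRVR_data/PRVR/<dataset>/FeatureData/.
with h5py.File('PRVR_data/PRVR/tvr/FeatureData/features.hdf5', 'r') as f:
    for name in list(f.keys())[:5]:    # first few video IDs
        print(name, f[name].shape)     # typically (num_clips, feature_dim)
```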
Finally, set `root` and `data_root` in the config files (e.g., `cfg['root']` and `cfg['data_root']` in `./src/Configs/tvr.py`), as sketched below.
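For example, in `./src/Configs/tvr.py` (the paths below are placeholders for your own directories):

```python
# Excerpt from a config file; adjust both paths to your environment.
cfg['root'] = '/path/to/ICCV25-HLFormer'      # repository root
cfg['data_root'] = '/path/to/PRVR_data/PRVR'  # downloaded features and text data
```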
To train HLFormer on TVR:

```sh
cd src
python main.py -d tvr --gpu 0
```

To train HLFormer on ActivityNet Captions:

```sh
cd src
python main.py -d act --gpu 0
```

To train HLFormer on Charades-STA:

```sh
cd src
python main.py -d cha --gpu 0
```
For this repository, the expected performance is:
| Dataset | R@1 | R@5 | R@10 | R@100 | SumR | ckpt and logs |
| --- | --- | --- | --- | --- | --- | --- |
| TVR | 15.7 | 37.1 | 48.5 | 86.4 | 187.7 | Google drive |
| ActivityNet Captions | 8.7 | 27.1 | 40.1 | 79.0 | 154.9 | Google drive |
| Charades-STA | 2.6 | 8.5 | 13.7 | 54.0 | 78.7 | Google drive |
If you find our code useful or use the toolkit in your work, please consider citing:
```
@inproceedings{Li25_HLFormer,
  author    = {Li, Jun and Wang, Jinpeng and Tan, Chaolei and Lian, Niu and Chen, Long and Wang, Yaowei and Zhang, Min and Xia, Shu-Tao and Chen, Bin},
  title     = {Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2025}
}
```
This code is based on our previous works GMMFormer and GMMFormerV2. We are also grateful to other teams for open-sourcing code that inspired our work, including ms-sl, dl-dkd, and meru.
If you have any questions, you can raise an issue or email Jun Li ([email protected]) or Jinpeng Wang ([email protected]).