Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation

The official PyTorch implementation of the paper "Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation".

If you find this project or the paper useful in your research, please cite us:

@inproceedings{light-t2m,
  title={Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation},
  author={Zeng, Ling-An and Huang, Guohong and Wu, Gaojie and Zheng, Wei-Shi},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2025}
}

Getting Started

1. Create Conda Environment

We tested our code using Python 3.10.14, PyTorch 2.2.2, CUDA 12.1, and NVIDIA RTX 3090 GPUs.

conda create -n light-t2m python==3.10.14
conda activate light-t2m

# install pytorch
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121


# install requirements
pip install -r requirements.txt

# install mamba
cd mamba && pip install -e .
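
A quick way to confirm the environment is set up correctly is a check like the one below (a minimal sketch; it assumes the bundled Mamba package installs the usual mamba_ssm module):

# sanity check of the environment (illustrative sketch)
import torch

print("PyTorch:", torch.__version__)          # expected: 2.2.2
print("CUDA available:", torch.cuda.is_available())
print("CUDA runtime:", torch.version.cuda)    # expected: 12.1

try:
    import mamba_ssm                          # assumed module name of the bundled Mamba package
    print("mamba_ssm found at:", mamba_ssm.__file__)
except ImportError as err:
    print("Mamba not importable:", err)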

2. Download and Preprocess the Datasets

2.1 Download the Datasets

We conduct experiments on the HumanML3D and KIT-ML datasets. Both datasets can be downloaded by following the instructions in the HumanML3D repository.

Then, copy both datasets into this repository. For example, the directory structure for HumanML3D should look like this:

./data/HumanML3D/
├── new_joint_vecs/
├── texts/
├── Mean.npy # same as in [HumanML3D](https://github.com/EricGuo5513/HumanML3D) 
├── Std.npy # same as in [HumanML3D](https://github.com/EricGuo5513/HumanML3D) 
├── train.txt
├── val.txt
├── test.txt
├── train_val.txt
└── all.txt
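
A small script can confirm that the copied data matches the layout above (an illustrative sketch based solely on the listing):

# verify the HumanML3D layout shown above (illustrative sketch)
import os

root = "./data/HumanML3D"
expected = ["new_joint_vecs", "texts", "Mean.npy", "Std.npy",
            "train.txt", "val.txt", "test.txt", "train_val.txt", "all.txt"]
missing = [name for name in expected if not os.path.exists(os.path.join(root, name))]
print("missing entries:", missing if missing else "none")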

2.2 Preprocess the Datasets

To speed up data loading during training, we convert the datasets into .npy files using the following commands:

python src/tools/data_preprocess.py --dataset hml3d
python src/tools/data_preprocess.py --dataset kit
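
For reference, Mean.npy and Std.npy hold the per-dimension statistics commonly used to normalize HumanML3D motion features; a typical normalization step looks like the sketch below (this is not the preprocessing script itself, and the sample id is hypothetical):

# typical per-frame feature normalization with the dataset statistics (illustrative sketch)
import numpy as np

mean = np.load("./data/HumanML3D/Mean.npy")   # shape: (feature_dim,)
std = np.load("./data/HumanML3D/Std.npy")     # shape: (feature_dim,)

motion = np.load("./data/HumanML3D/new_joint_vecs/000021.npy")  # hypothetical sample id
normed = (motion - mean) / std
print(motion.shape, "->", normed.shape)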

3. Download Dependencies and Pretrained Models

Download and unzip dependencies from here.

Download and unzip pretrained models from here.

Then, the file directory should look like this:

./
├── checkpoints
│   ├── hml3d.ckpt
│   ├── kit.ckpt
│   └── kit_new.ckpt
├── deps
│   ├── glove
│   └── t2m_guo
└── ...
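
The checkpoints appear to be standard PyTorch Lightning files, so their contents can be inspected directly (a minimal sketch; the key names are assumptions based on typical Lightning checkpoints):

# inspect a downloaded checkpoint (illustrative sketch)
import torch

ckpt = torch.load("checkpoints/hml3d.ckpt", map_location="cpu")
print(list(ckpt.keys()))                      # e.g. 'state_dict', 'epoch', ... (assumed keys)
state_dict = ckpt.get("state_dict", ckpt)
print(len(state_dict), "tensors in the state dict")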

Training

We train our Light-T2M model on two RTX 3090 GPUs.

  • HumanML3D
python src/train.py trainer.devices=\"0,1\" logger=wandb data=hml3d_light_final \
    data.batch_size=128 data.repeat_dataset=5 trainer.max_epochs=600 \
    callbacks/model_checkpoint=t2m +model/lr_scheduler=cosine model.guidance_scale=4 \
    model.noise_scheduler.prediction_type=sample trainer.precision=bf16-mixed
  • KIT-ML
python src/train.py trainer.devices=\"2,3\" logger=wandb data=kit_light_final \
    data.batch_size=128 data.repeat_dataset=5 trainer.max_epochs=1000 \
    callbacks/model_checkpoint=t2m +model/lr_scheduler=cosine model.guidance_scale=4 \
    model.noise_scheduler.prediction_type=sample trainer.precision=bf16-mixed
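
Both commands set model.guidance_scale=4, which controls classifier-free guidance at sampling time. Assuming the standard formulation, the guided prediction combines the conditional and unconditional denoiser outputs as in the sketch below (illustrative; not the repository's exact code):

# standard classifier-free guidance combination (illustrative sketch)
import torch

def classifier_free_guidance(pred_cond: torch.Tensor,
                             pred_uncond: torch.Tensor,
                             guidance_scale: float = 4.0) -> torch.Tensor:
    # guidance_scale > 1 pushes the sample toward the text-conditioned prediction
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)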

Evaluation

Set model.metrics.enable_mm_metric to true to evaluate the MultiModality metric; setting it to false speeds up evaluation.

  • HumanML3D
python src/eval.py trainer.devices=\"0,\" data=hml3d_light_final data.test_batch_size=128 \
    model=light_final \
    model.guidance_scale=4 model.noise_scheduler.prediction_type=sample \
    model.denoiser.stage_dim=\"256\*4\" \
    ckpt_path=\"checkpoints/hml3d.ckpt\" model.metrics.enable_mm_metric=true
  • KIT-ML

We have observed that the performance of our trained model may fluctuate. Additionally, when we retrained the model on the KIT-ML dataset, we achieved improved performance with a new checkpoint (checkpoints/kit_new.ckpt).

python src/eval.py trainer.devices=\"1,\" data=kit_light_final data.test_batch_size=128 \
    model=light_final \
    model.guidance_scale=4 model.noise_scheduler.prediction_type=sample \
    model.denoiser.stage_dim=\"256\*4\" \
    ckpt_path=\"checkpoints/kit.ckpt\" model.metrics.enable_mm_metric=true
# or
python src/eval.py trainer.devices=\"1,\" data=kit_light_final data.test_batch_size=128 \
    model=light_final \
    model.guidance_scale=4 model.noise_scheduler.prediction_type=sample \
    model.denoiser.stage_dim=\"256\*4\" \
    ckpt_path=\"checkpoints/kit_new.ckpt\" model.metrics.enable_mm_metric=true

Evaluating Inference Time

Inference time is evaluated on 100 samples randomly selected from the HumanML3D dataset; these samples are stored in `data/random_selected_data.npy`.
CUDA_VISIBLE_DEVICES=0 python src/test_speed.py +trainer.benchmark=true model.noise_scheduler.prediction_type=sample 
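
As a general rule when timing GPU inference, kernels should be warmed up and the device synchronized before and after reading the clock; a generic pattern (not the repository's test_speed.py) is sketched below:

# generic GPU timing helper (illustrative sketch, not src/test_speed.py)
import time
import torch

def time_gpu(fn, n_warmup: int = 10, n_runs: int = 100) -> float:
    # fn: zero-argument callable that runs one forward pass on the GPU
    for _ in range(n_warmup):
        fn()
    torch.cuda.synchronize()                  # finish warm-up kernels before timing
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    torch.cuda.synchronize()                  # wait for all queued kernels to complete
    return (time.perf_counter() - start) / n_runs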

Motion Generation

python src/sample_motion.py device=\"0\" \
    model.guidance_scale=4 model.noise_scheduler.prediction_type=sample \
    text="A person walking and changing their path to the left." length=100

Visualization

1. Download Render Dependencies

Download and unzip the rendering dependencies from here, then place them in the ./visual_datas/ directory.

2. Install Python Dependencies

pip install imageio bpy matplotlib smplx h5py git+https://github.com/mattloper/chumpy imageio-ffmpeg

3. Visualize the Generated Motion

CUDA_VISIBLE_DEVICES=0 python -W ignore visualize/blend_render.py --file_dir ./visual_datas/gen_joints --mode video   --down_sample 1  --motion_list gen_motion_1 gen_motion_1

Citation

If you find this project or the paper useful in your research, please cite us:

@inproceedings{light-t2m,
  title={Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation},
  author={Zeng, Ling-An and Huang, Guohong and Wu, Gaojie and Zheng, Wei-Shi},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2025}
}

Acknowledgements

Thanks to all open-source projects and libraries that supported our research:

T2M, MLD, T2M-GPT, TEMOS, FLAME, MoMask, Mamba

License

This project is licensed under the MIT License.

Note that our code depends on other libraries, including SMPL, SMPL-X, and PyTorch3D, and uses datasets that each have their own licenses, which must also be followed.
