
Feature/v0.2.1/odLength_reward #207

Merged

P3ngLiu (Contributor) commented on Apr 1, 2025

  1. Update grpo_jsonl.py to add an mAP-based reward, with support for a length penalty and a selectable scoring type.
  2. Fix the handling of ref_per_token_logps in grpo_trainer.py so that setting beta=0 (disabling the KL penalty) works correctly.
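A minimal sketch of what a detection reward along the lines of item 1 could look like, in plain Python. Everything here is an assumption rather than the code in grpo_jsonl.py: the `(x1, y1, x2, y2)` box format, the hypothetical function names (`iou`, `average_precision`, `mean_best_iou`, `od_reward`), the `score_type` values `"map"` and `"mean_iou"`, and the form of the length penalty (scaling the score by ground-truth count over predicted count when the model emits too many boxes). AP is computed for a single class at one IoU threshold, with greedy matching and all-point interpolation, so "mAP" reduces to AP here.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2); predictions carry a confidence score.
Box = Tuple[float, float, float, float]
Pred = Tuple[Box, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Pred], gts: Sequence[Box], thr: float = 0.5) -> float:
    """Single-class AP at one IoU threshold, greedy matching, all-point interpolation."""
    if not gts:
        return 1.0 if not preds else 0.0
    matched = [False] * len(gts)
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    # Visit predictions from most to least confident.
    ordered = sorted(preds, key=lambda p: p[1], reverse=True)
    for rank, (box, _conf) in enumerate(ordered, start=1):
        # Match to the best still-unmatched ground-truth box at or above the threshold.
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))
    # Make precision monotonically non-increasing, scanning right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the PR curve: precision times each recall increment.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_best_iou(preds: Sequence[Pred], gts: Sequence[Box]) -> float:
    """Average, over ground-truth boxes, of the best IoU any prediction achieves."""
    if not gts:
        return 1.0 if not preds else 0.0
    total = sum(max((iou(box, gt) for box, _ in preds), default=0.0) for gt in gts)
    return total / len(gts)


def od_reward(
    preds: Sequence[Pred],
    gts: Sequence[Box],
    score_type: str = "map",
    length_penalty: bool = True,
    thr: float = 0.5,
) -> float:
    """Hypothetical detection reward with a selectable score and optional length penalty."""
    if score_type == "map":
        score = average_precision(preds, gts, thr)
    elif score_type == "mean_iou":
        score = mean_best_iou(preds, gts)
    else:
        raise ValueError(f"unknown score_type: {score_type!r}")
    # Assumed penalty form: shrink the score when more boxes are predicted than exist.
    if length_penalty and len(preds) > len(gts):
        score *= len(gts) / len(preds)
    return score
```

For a single ground-truth box, one exact prediction scores 1.0. Adding a second, non-overlapping box ranked below it leaves AP at 1.0, but the assumed penalty halves the reward to 0.5, which is the kind of pressure against padding the output with extra boxes that a length penalty is meant to provide.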
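A minimal sketch of the idea behind item 2, not the actual change to grpo_trainer.py: when beta is 0 the reference-model forward pass can be skipped, ref_per_token_logps stays None, and the loss must then avoid touching it. The helper names `maybe_ref_logps` and `grpo_token_loss` are hypothetical. The KL term assumes the per-token estimator exp(d) - d - 1 with d = ref_logp - logp, as used in TRL-style GRPO trainers. This version computes the loss value only, with plain floats and no autograd.

```python
import math
from typing import Callable, List, Optional


def maybe_ref_logps(
    beta: float, compute_ref: Callable[[], List[float]]
) -> Optional[List[float]]:
    """Run the (expensive) reference-model pass only when the KL term is active."""
    return compute_ref() if beta != 0.0 else None


def grpo_token_loss(
    per_token_logps: List[float],
    advantage: float,
    beta: float,
    ref_per_token_logps: Optional[List[float]] = None,
) -> float:
    """Forward value of a per-token GRPO-style loss for one completion, averaged over tokens.

    With on-policy samples the ratio exp(logp - logp.detach()) equals 1 in value,
    so the policy term contributes -advantage per token; gradients are not modeled.
    """
    if beta != 0.0 and ref_per_token_logps is None:
        raise ValueError("ref_per_token_logps is required when beta != 0")
    losses = []
    for i, logp in enumerate(per_token_logps):
        loss = -advantage
        if beta != 0.0:
            # Per-token KL estimate: exp(d) - d - 1, with d = ref_logp - logp.
            d = ref_per_token_logps[i] - logp
            loss += beta * (math.exp(d) - d - 1.0)
        losses.append(loss)
    return sum(losses) / len(losses)
```

Guarding on beta in both places keeps the two paths consistent: with beta=0 no reference logits are computed and the loss never dereferences a None value, while any nonzero beta fails loudly if the reference log-probabilities are missing.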

SZhanZ merged commit 8a0af96 into om-ai-lab:develop/v0.2.1 on Apr 1, 2025
IANNXANG pushed a commit to IANNXANG/VLM-R1 that referenced this pull request on May 20, 2025