Feat: Clip Higher #199
Hello, could you explain the updated parameters?
The "Clip Higher" technique raises the upper clipping bound, so the probability ratio (new policy / old policy) can increase further for advantageous actions before it is clipped. Like setting kl_beta=0.0, it follows the DAPO setup.
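To make the asymmetric clipping concrete, here is a minimal sketch of the per-token objective in plain Python. The function name and the `eps_low` / `eps_high` argument names are hypothetical and are not the parameter names used in this PR; the values 0.2 and 0.28 are the ones suggested in the DAPO paper. Real training code would apply the same formula to tensors and average over tokens.

```python
def clip_higher_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """Per-token clipped surrogate with decoupled lower/upper clip bounds.

    ratio     : pi_new(a|s) / pi_old(a|s) for one token
    advantage : estimated advantage for that token
    eps_low   : lower clip range (hypothetical name), ratio floor is 1 - eps_low
    eps_high  : upper clip range (hypothetical name), ratio ceiling is 1 + eps_high
    """
    # Clamp the ratio into [1 - eps_low, 1 + eps_high].
    clipped_ratio = max(1.0 - eps_low, min(ratio, 1.0 + eps_high))
    # Pessimistic (PPO-style) choice between unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# Same positive-advantage token under symmetric and decoupled clipping.
symmetric = clip_higher_objective(1.25, 1.0, eps_low=0.2, eps_high=0.2)
decoupled = clip_higher_objective(1.25, 1.0, eps_low=0.2, eps_high=0.28)

# A negative-advantage token is still bounded by the lower clip range.
negative = clip_higher_objective(0.5, -1.0)
```

With symmetric clipping the ratio of 1.25 is cut to 1.2, so the token stops contributing gradient; with the higher upper bound it stays inside the range and keeps contributing. The lower bound is unchanged, so behavior on negative-advantage tokens is the same as before.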
Hope my reply helps. In my experiments, these improvements contributed to higher performance.
Thanks for your reply. We will merge it. Could you please also provide an example script in the src/open-r1-multimodal/run_script folder?
Sure. My bad. Just added two lines below the .sh files.
Added Clip Higher parameters following the DAPO paper (https://arxiv.org/html/2503.14476v1).
I have also pushed a new commit. Thank you.
Okay, thanks for the PR.
Hello @SabaPivot, how do I implement DAPO's dynamic sampling in this code?