A curated list of resources on Flow Matching, GRPO (Group Relative Policy Optimization), and their applications in Generative AI (LLMs, diffusion models, preference alignment, visual generation, etc.).
Contributions welcome! 🚀
- MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE (2025). arXiv | HuggingFace
- DanceGRPO: Unleashing GRPO on Visual Generation (2025). arXiv | Project Page
- Flow-GRPO: Training Flow Matching Models via Online RL (2025). arXiv | HuggingFace
- Online Reward-Weighted Fine-Tuning of Flow Matching (2025). arXiv
- Energy-Weighted Flow Matching for Offline RL (ICLR 2025). OpenReview
| Year | Paper / Project | Highlights |
|---|---|---|
| 2022 | Flow Matching for Generative Modeling | Introduces Flow Matching as an efficient ODE-based generative framework. |
| 2023 | Reinforcement Learning for Generative AI: A Survey | Comprehensive RL+GenAI survey. |
| 2024 | Preference Alignment with Flow Matching (PFM) | Preference-based flow alignment; GitHub Repo |
| 2024 | Flow-DPO | Online DPO with multi-agent reasoning for LLMs. |
| 2024 | Flow Matching Guide and Code | Practical guide + open-source implementations. |
| 2025 (ICLR) | Energy-Weighted Flow Matching (EFM) | Offline RL with reward-as-energy formulation. |
| 2025 | Online Reward-Weighted Fine-Tuning (ORW-CFM-W2) | Online fine-tuning with Wasserstein-2 regularization. |
| 2025 | Flow-GRPO | ODE→SDE conversion; first online RL for Flow Matching. |
| 2025 | DanceGRPO | Unified RL alignment framework across diffusion + flow + video. |
| 2025 | TempFlow-GRPO | Explores timing effects in GRPO for flows. |
| 2025 | MixGRPO | Mixed ODE-SDE sampling, improves efficiency by up to 71%. |
- Flow Matching (FM) Core Idea
  - ODE-based deterministic mapping from a noise distribution to the data distribution.
  - Simulation-free training and efficient deterministic sampling (a minimal loss sketch follows this list).
  - Challenge: lacks stochastic exploration, which is essential for RL.
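A minimal sketch of the conditional Flow Matching objective with a straight-line probability path, as used in rectified-flow-style setups; `velocity_net` and the batch shapes are illustrative assumptions, not code from any listed paper.

```python
import torch

# Minimal conditional Flow Matching loss sketch (straight-line path, assumed setup).
# `velocity_net(x_t, t)` is a hypothetical model that predicts a velocity field.
def flow_matching_loss(velocity_net, x1):
    """x1: batch of data samples, shape (B, D)."""
    x0 = torch.randn_like(x1)                          # noise endpoint
    t = torch.rand(x1.shape[0], 1, device=x1.device)   # t ~ Uniform[0, 1]
    xt = (1 - t) * x0 + t * x1                         # point on the linear path
    target_v = x1 - x0                                 # conditional target velocity
    pred_v = velocity_net(xt, t)
    return ((pred_v - target_v) ** 2).mean()           # simulation-free regression
```

Sampling then integrates the learned ODE dx/dt = v(x, t) from noise at t = 0 to data at t = 1, which is why the base sampler is deterministic.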
- RL in Generative Alignment
  - Optimizes non-differentiable, human-centric rewards (aesthetics, preferences).
  - Balances multiple objectives (alignment, quality, diversity).
  - Enables exploration of unseen high-reward regions.
  - Key methods: PPO, DPO, GRPO (a minimal GRPO sketch follows this list).

👉 Core challenge: inject stochasticity into FM so that RL can optimize it.
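As a hedged illustration of the GRPO idea referenced above: sample a group of outputs per prompt, normalize rewards within the group to get advantages (no learned value function), and apply a PPO-style clipped update. Names and shapes below are assumptions; the KL-to-reference penalty used in practice is omitted.

```python
import torch

def group_relative_advantages(rewards, eps=1e-8):
    """rewards: tensor of shape (G,) for one prompt's sampled group of G outputs."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def grpo_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate, but with group-normalized advantages
    in place of a learned value baseline (the core GRPO simplification)."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```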
- Reinforcement Learning for Generative AI: A Survey (2023)
- LLM Optimization: GRPO, PPO, and DPO (Analytics Vidhya, 2025)
- Preference Tuning LLMs: PPO, DPO, GRPO — A Simple Guide
- What is GRPO and Flow-GRPO? (Turing Post)
- Flow Matching for Generative Modeling (2022) | PDF
- Flow Matching Guide and Code (2024)
- Cambridge ML Blog: Introduction to Flow Matching (2024)
| Work | Year | Core Innovation | Highlights | Results |
|---|---|---|---|---|
| Flow-GRPO 📄 | 2025 | ODE→SDE conversion, denoising step reduction | First online RL with FM | GenEval ↑ 63% → 95% |
| DanceGRPO 📄 🌐 | 2025 | Unified multimodal RL framework | Works on diffusion+flows, image+video | Stable cross-domain scaling |
| MixGRPO 📄 | 2025 | Mixed ODE-SDE sliding window | RL on key steps only, ODE for others | Efficiency ↑ 50–71% |
| TempFlow-GRPO 📄 | 2025 | Timing-aware GRPO | Studies reward-timing effects in FM | Better RL stability |
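To make the ODE→SDE idea in the Flow-GRPO row above concrete, here is a hedged sketch contrasting a deterministic Euler step with a stochastic Euler–Maruyama step that injects Gaussian noise so RL can explore and score trajectories. The noise scale `sigma` is a placeholder, and the drift correction needed to preserve the ODE's marginals is omitted; this is not the paper's exact formulation.

```python
import torch

@torch.no_grad()
def euler_ode_step(velocity_net, x, t, dt):
    """Deterministic flow step: x_{t+dt} = x_t + v(x_t, t) * dt."""
    return x + velocity_net(x, t) * dt

@torch.no_grad()
def euler_maruyama_step(velocity_net, x, t, dt, sigma):
    """Stochastic step: same drift plus injected Gaussian noise, turning
    sampling into a stochastic policy that GRPO-style RL can optimize."""
    noise = torch.randn_like(x)
    return x + velocity_net(x, t) * dt + sigma * (dt ** 0.5) * noise
```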
| Work | Year | Type | Core Idea | Highlights |
|---|---|---|---|---|
| Preference Flow Matching (PFM) 📄 🛠️ | 2024 | Preference-based | Vector field learning y⁻→y⁺ | Black-box friendly, avoids reward hacking |
| Flow-DPO 📄 | 2024 | Online DPO + Multi-agent | Dynamic preference pairs in reasoning | Applied to LLM math |
| Energy-Weighted FM (EFM) 📄 | 2025 | Offline RL | Reward-as-energy formulation | Leverages Q-function, no extra model |
| ORW-CFM-W2 📄 | 2025 | Online fine-tuning | Wasserstein-2 reg. | Prevents mode collapse |
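A loose, illustrative sketch of the preference-flow idea in the PFM row above: learn a flow-matching vector field along paths from a dispreferred sample toward its preferred counterpart, assuming paired data and a straight-line path. `velocity_net` is hypothetical; see the official repo below for the actual implementation.

```python
import torch

def preference_flow_loss(velocity_net, y_minus, y_plus):
    """y_minus, y_plus: paired dispreferred/preferred batches of shape (B, D)."""
    t = torch.rand(y_minus.shape[0], 1, device=y_minus.device)
    yt = (1 - t) * y_minus + t * y_plus        # interpolate dispreferred -> preferred
    target_v = y_plus - y_minus                # straight-line target velocity
    pred_v = velocity_net(yt, t)
    return ((pred_v - target_v) ** 2).mean()
```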
- jadehaus/preference-flow-matching — PFM PyTorch implementation.
- DanceGRPO Project — Official demo & project page.
- Flow Matching Blog Implementations — Educational code snippets.
Pull requests welcome! Please:
- Add links in the appropriate section.
- Format: [Title (Year)](link) — short description.
- Keep it concise & consistent.
This work is licensed under a Creative Commons Attribution 4.0 International License.