A curated list of resources on Flow Matching, GRPO (Group Relative Policy Optimization), and their applications in Generative AI (LLMs, diffusion models, preference alignment, visual generation, etc.).
Contributions welcome! 🚀
- MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE (2025). arXiv | HuggingFace
- DanceGRPO: Unleashing GRPO on Visual Generation (2025). arXiv | Project Page
- Flow-GRPO: Training Flow Matching Models via Online RL (2025). arXiv | HuggingFace
- Online Reward-Weighted Fine-Tuning of Flow Matching (2025). arXiv
- Energy-Weighted Flow Matching for Offline RL (ICLR 2025). OpenReview
| Year | Paper / Project | Highlights |
|---|---|---|
| 2022 | Flow Matching for Generative Modeling | Introduces Flow Matching as an efficient ODE-based generative framework. |
| 2023 | Reinforcement Learning for Generative AI: A Survey | Comprehensive RL+GenAI survey. |
| 2024 | Preference Alignment with Flow Matching (PFM) | Preference-based flow alignment; GitHub Repo |
| 2024 | Flow-DPO | Online DPO with multi-agent reasoning for LLMs. |
| 2024 | Flow Matching Guide and Code | Practical guide + open-source implementations. |
| 2025 (ICLR) | Energy-Weighted Flow Matching (EFM) | Offline RL with reward-as-energy formulation. |
| 2025 | Online Reward-Weighted Fine-Tuning (ORW-CFM-W2) | Online fine-tuning with Wasserstein-2 regularization. |
| 2025 | Flow-GRPO | ODE→SDE conversion; first online RL for Flow Matching. |
| 2025 | DanceGRPO | Unified RL alignment framework across diffusion + flow + video. |
| 2025 | TempFlow-GRPO | Explores timing effects in GRPO for flows. |
| 2025 | MixGRPO | Mixed ODE-SDE sampling, improves efficiency by up to 71%. |
- Flow Matching (FM) Core Idea
  - ODE-based deterministic mapping from a noise distribution to the data distribution.
  - Simulation-free training and efficient deterministic sampling (a minimal loss sketch follows this list).
  - Challenge: lacks stochastic exploration, which is essential for RL.
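A minimal sketch of the conditional Flow Matching objective with a straight-line probability path, as used in rectified-flow-style setups; `velocity_net` and the batch shapes are illustrative assumptions, not code from any listed paper.

```python
import torch

# Minimal conditional Flow Matching loss sketch (straight-line path, assumed setup).
# `velocity_net(x_t, t)` is a hypothetical model that predicts a velocity field.
def flow_matching_loss(velocity_net, x1):
    """x1: batch of data samples, shape (B, D)."""
    x0 = torch.randn_like(x1)                          # noise endpoint
    t = torch.rand(x1.shape[0], 1, device=x1.device)   # t ~ Uniform[0, 1]
    xt = (1 - t) * x0 + t * x1                         # point on the linear path
    target_v = x1 - x0                                 # conditional target velocity
    pred_v = velocity_net(xt, t)
    return ((pred_v - target_v) ** 2).mean()           # simulation-free regression
```

Sampling then integrates the learned ODE dx/dt = v(x, t) from noise at t = 0 to data at t = 1, which is why the base sampler is deterministic.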
- RL in Generative Alignment
  - Optimizes non-differentiable, human-centric rewards (aesthetics, preferences).
  - Balances multiple objectives (alignment, quality, diversity).
  - Enables exploration of unseen high-reward regions.
  - Key methods: PPO, DPO, GRPO (a minimal GRPO sketch follows this list).

👉 Core challenge: inject stochasticity into FM so that RL can optimize it.
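As a hedged illustration of the GRPO idea referenced above: sample a group of outputs per prompt, normalize rewards within the group to get advantages (no learned value function), and apply a PPO-style clipped update. Names and shapes below are assumptions; the KL-to-reference penalty used in practice is omitted.

```python
import torch

def group_relative_advantages(rewards, eps=1e-8):
    """rewards: tensor of shape (G,) for one prompt's sampled group of G outputs."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def grpo_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate, but with group-normalized advantages
    in place of a learned value baseline (the core GRPO simplification)."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```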
- Reinforcement Learning for Generative AI: A Survey (2023)
- LLM Optimization: GRPO, PPO, and DPO (Analytics Vidhya, 2025)
- Preference Tuning LLMs: PPO, DPO, GRPO — A Simple Guide
- What is GRPO and Flow-GRPO? (Turing Post)
- Flow Matching for Generative Modeling (2022) | PDF
- Flow Matching Guide and Code (2024)
- Cambridge ML Blog: Introduction to Flow Matching (2024)
| Work | Year | Core Innovation | Highlights | Results |
|---|---|---|---|---|
| Flow-GRPO 📄 | 2025 | ODE→SDE conversion, denoising step reduction | First online RL with FM | GenEval ↑ 63% → 95% |
| DanceGRPO 📄 🌐 | 2025 | Unified multimodal RL framework | Works on diffusion+flows, image+video | Stable cross-domain scaling |
| MixGRPO 📄 | 2025 | Mixed ODE-SDE sliding window | RL on key steps only, ODE for others | Efficiency ↑ 50–71% |
| TempFlow-GRPO 📄 | 2025 | Timing-aware GRPO | Studies reward-timing effects in FM | Better RL stability |
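To make the ODE→SDE idea in the Flow-GRPO row above concrete, here is a hedged sketch contrasting a deterministic Euler step with a stochastic Euler–Maruyama step that injects Gaussian noise so RL can explore and score trajectories. The noise scale `sigma` is a placeholder, and the drift correction needed to preserve the ODE's marginals is omitted; this is not the paper's exact formulation.

```python
import torch

@torch.no_grad()
def euler_ode_step(velocity_net, x, t, dt):
    """Deterministic flow step: x_{t+dt} = x_t + v(x_t, t) * dt."""
    return x + velocity_net(x, t) * dt

@torch.no_grad()
def euler_maruyama_step(velocity_net, x, t, dt, sigma):
    """Stochastic step: same drift plus injected Gaussian noise, turning
    sampling into a stochastic policy that GRPO-style RL can optimize."""
    noise = torch.randn_like(x)
    return x + velocity_net(x, t) * dt + sigma * (dt ** 0.5) * noise
```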
| Work | Year | Type | Core Idea | Highlights |
|---|---|---|---|---|
| Preference Flow Matching (PFM) 📄 🛠️ | 2024 | Preference-based | Vector field learning y⁻→y⁺ | Black-box friendly, avoids reward hacking |
| Flow-DPO 📄 | 2024 | Online DPO + Multi-agent | Dynamic preference pairs in reasoning | Applied to LLM math |
| Energy-Weighted FM (EFM) 📄 | 2025 | Offline RL | Reward-as-energy formulation | Leverages Q-function, no extra model |
| ORW-CFM-W2 📄 | 2025 | Online fine-tuning | Wasserstein-2 reg. | Prevents mode collapse |
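A loose, illustrative sketch of the preference-flow idea in the PFM row above: learn a flow-matching vector field along paths from a dispreferred sample toward its preferred counterpart, assuming paired data and a straight-line path. `velocity_net` is hypothetical; see the official repo below for the actual implementation.

```python
import torch

def preference_flow_loss(velocity_net, y_minus, y_plus):
    """y_minus, y_plus: paired dispreferred/preferred batches of shape (B, D)."""
    t = torch.rand(y_minus.shape[0], 1, device=y_minus.device)
    yt = (1 - t) * y_minus + t * y_plus        # interpolate dispreferred -> preferred
    target_v = y_plus - y_minus                # straight-line target velocity
    pred_v = velocity_net(yt, t)
    return ((pred_v - target_v) ** 2).mean()
```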
- jadehaus/preference-flow-matching — PFM PyTorch implementation.
- DanceGRPO Project — Official demo & project page.
- Flow Matching Blog Implementations — Educational code snippets.
Pull requests welcome! Please:
- Add links in the appropriate section.
- Format: [Title (Year)](link) — short description.
- Keep it concise & consistent.
This work is licensed under a Creative Commons Attribution 4.0 International License.