
# Awesome RL in Generative AI

A curated list of resources on Flow Matching, GRPO (Group Relative Policy Optimization), and their applications in Generative AI (LLMs, diffusion models, preference alignment, visual generation, etc.).

Contributions welcome! 🚀


## 🔥 Latest Papers

- **MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE** (2025). arXiv | HuggingFace
- **DanceGRPO: Unleashing GRPO on Visual Generation** (2025). arXiv | Project Page
- **Flow-GRPO: Training Flow Matching Models via Online RL** (2025). arXiv | HuggingFace
- **Online Reward-Weighted Fine-Tuning of Flow Matching** (2025). arXiv
- **Energy-Weighted Flow Matching for Offline RL** (ICLR 2025). OpenReview

## 📅 Timeline of Key Works

| Year | Paper / Project | Highlights |
|------|-----------------|------------|
| 2022 | Flow Matching for Generative Modeling | Introduces Flow Matching as an efficient ODE-based generative framework. |
| 2023 | Reinforcement Learning for Generative AI: A Survey | Comprehensive RL + GenAI survey. |
| 2024 | Preference Alignment with Flow Matching (PFM) | Preference-based flow alignment. |
| 2024 | Flow-DPO | Online DPO with multi-agent reasoning for LLMs. |
| 2024 | Flow Matching Guide and Code | Practical guide plus open-source implementations. |
| 2025 (ICLR) | Energy-Weighted Flow Matching (EFM) | Offline RL with a reward-as-energy formulation. |
| 2025 | Online Reward-Weighted Fine-Tuning (ORW-CFM-W2) | Online fine-tuning with Wasserstein-2 regularization. |
| 2025 | Flow-GRPO | ODE→SDE conversion; first online RL for Flow Matching. |
| 2025 | DanceGRPO | Unified RL alignment framework across diffusion, flow, and video models. |
| 2025 | TempFlow-GRPO | Explores timing effects in GRPO for flows. |
| 2025 | MixGRPO | Mixed ODE-SDE sampling; improves efficiency by up to 71%. |

## 🧭 Technical Prelude: When Reinforcement Learning Meets Flow Matching

- **Flow Matching (FM) core idea**
  - ODE-based deterministic mapping from noise to the data distribution.
  - Simulation-free training; efficient deterministic sampling.
  - Challenge: lacks the stochastic exploration that RL relies on.
- **RL in generative alignment**
  - Optimizes non-differentiable, human-centric rewards (aesthetics, preferences).
  - Balances multiple objectives (alignment, quality, diversity).
  - Enables exploration of unseen high-reward regions.
  - Key methods: PPO, DPO, GRPO.

👉 **Core challenge:** inject stochasticity into FM so that RL can optimize it.
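The core challenge above can be sketched in one dimension: the same learned velocity field can be integrated with a plain Euler step (deterministic, nothing for RL to explore) or with an Euler-Maruyama step that injects Gaussian noise, which is the spirit of the ODE→SDE conversions used by Flow-GRPO and its successors. Here `velocity` is a toy stand-in for a trained model and `sigma` is an illustrative noise scale, not any paper's actual schedule:

```python
import math
import random

random.seed(0)

def velocity(x, t):
    """Toy stand-in for a trained flow-matching velocity field v(x, t)."""
    return -x  # pulls samples toward 0 (illustrative only)

def ode_step(x, t, dt):
    # Deterministic Euler step: zero exploration, identical sample every run.
    return x + velocity(x, t) * dt

def sde_step(x, t, dt, sigma=0.5):
    # Euler-Maruyama step: same drift plus injected Gaussian noise, giving
    # RL the stochastic trajectories (and log-probabilities) it needs.
    return x + velocity(x, t) * dt + sigma * math.sqrt(dt) * random.gauss(0, 1)
```

The point of the conversion is that the noisy sampler defines a proper policy distribution over trajectories, so policy-gradient methods like GRPO become applicable while the marginal drift stays tied to the original flow.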


## 📖 Surveys & Tutorials


## 🔄 Flow Matching Foundations


## 🧩 Flow Matching + RL (Paradigm I: Online Policy Optimization)

| Work | Year | Core Innovation | Highlights | Results |
|------|------|-----------------|------------|---------|
| Flow-GRPO 📄 | 2025 | ODE→SDE conversion, denoising-step reduction | First online RL with FM | GenEval ↑ 63% → 95% |
| DanceGRPO 📄 🌐 | 2025 | Unified multimodal RL framework | Works on diffusion + flows, image + video | Stable cross-domain scaling |
| MixGRPO 📄 | 2025 | Mixed ODE-SDE sliding window | RL on key steps only, ODE for the rest | Efficiency ↑ 50–71% |
| TempFlow-GRPO 📄 | 2025 | Timing-aware GRPO | Studies reward-timing effects in FM | Better RL stability |
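The trait these methods share is GRPO's critic-free advantage estimate: sample a group of outputs for the same prompt, score each with the reward model, and normalize rewards within the group. A minimal sketch (function name is mine, not from any of these codebases):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each sample's reward against its own group's mean and
    standard deviation; the result replaces a learned value baseline."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# e.g. four images generated from one prompt, scored by an aesthetic reward
advs = group_relative_advantages([0.2, 0.5, 0.9, 0.4])
```

Because the baseline comes from the group itself, no value network is trained, which is what makes GRPO attractive for expensive image and video samplers.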

## 🎯 Preference Alignment (Paradigm II: Direct Preference Learning)

| Work | Year | Type | Core Idea | Highlights |
|------|------|------|-----------|------------|
| Preference Flow Matching (PFM) 📄 🛠️ | 2024 | Preference-based | Learns a vector field mapping y⁻ → y⁺ | Black-box friendly, avoids reward hacking |
| Flow-DPO 📄 | 2024 | Online DPO + multi-agent | Dynamic preference pairs during reasoning | Applied to LLM math reasoning |
| Energy-Weighted FM (EFM) 📄 | 2025 | Offline RL | Reward-as-energy formulation | Leverages the Q-function; no extra model |
| ORW-CFM-W2 📄 | 2025 | Online fine-tuning | Wasserstein-2 regularization | Prevents mode collapse |
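The common template behind this branch is the DPO-style pairwise loss: widen the policy's log-probability margin of the preferred sample y⁺ over y⁻ relative to a frozen reference model. A schematic single-pair version (argument names are illustrative; the flow-based methods adapt this idea to vector fields rather than token log-probabilities):

```python
import math

def dpo_pair_loss(logp_pos, logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    """-log sigmoid(beta * margin), where the margin compares the policy's
    preference for y+ over y- against the reference model's."""
    margin = beta * ((logp_pos - ref_logp_pos) - (logp_neg - ref_logp_neg))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At zero margin the loss is log 2; it shrinks as the policy separates the pair, and `beta` controls how far the policy is pushed away from the reference.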

## 🛠️ Code & Implementations


## 🙌 Contributing

Pull requests welcome! Please:

  1. Add links in the appropriate section.
  2. Format: [Title (Year)](link) — short description.
  3. Keep it concise & consistent.

## 📜 License

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
