An end-to-end pipeline for adapting FLAN-T5 to dialogue summarization, exploring the full spectrum of modern LLM tuning. Implements and compares Full Fine-Tuning, PEFT (LoRA), and Reinforcement Learning from Human Feedback (RLHF) on performance and alignment. Features a PPO-tuned model trained to reduce toxicity, in-depth analysis notebooks, and an interactive Streamlit demo.
Updated Aug 4, 2025 - Jupyter Notebook
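
To give a sense of why the description contrasts Full Fine-Tuning with PEFT (LoRA), here is a minimal sketch of the trainable-parameter arithmetic for a single weight matrix. It is not code from the repository: the helper names, the 768-wide layer, and the rank of 8 are illustrative assumptions. LoRA freezes the original `d_out x d_in` matrix and trains two low-rank factors, `B` (`d_out x r`) and `A` (`r x d_in`).

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for a LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Assumed, illustrative sizes; not taken from the repository.
    d_out, d_in, rank = 768, 768, 8

    full = full_finetune_params(d_out, d_in)
    lora = lora_params(d_out, d_in, rank)

    print(f"full fine-tuning: {full} trainable weights")
    print(f"LoRA (r={rank}): {lora} trainable weights")
    print(f"LoRA trains {100 * lora / full:.2f}% of the matrix")
```

The adapter size grows linearly with the rank, so the rank is the main knob for trading adapter capacity against memory and training cost.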