Recent Updates
- [2025.05.06] Tool Environment Redesign: Completely redesigned and abstracted tool environments to support more flexible and diverse agent-tool interaction patterns.
- [2025.05.06] Critical Bug Fixes: Fixed crashes in GRPO and REINFORCE++ training that produced NaN values. See issue #30 for details.
- [2025.05.06] New Tutorials: Added comprehensive tutorials for creating custom tools and tool environments, including the first open-source runnable implementation of ReTool.
Earlier Updates
- [2025.04.01] Added basic inference scripts and a simple interactive chat interface. You can now easily deploy and interact with your trained models. See the inference guide for details.
- [2025.03.18] Added comprehensive multi-modal support! Agent-R1 now seamlessly integrates with vision-language models (VLMs), enabling agents to process and reason with both text and visual inputs in rich multi-modal environments.
- [2025.03.18] Refactored our codebase to improve maintainability! We've converted `verl` from a static folder to a git submodule and separated our custom code extensions, which makes it easier to update `verl` and understand the project structure. Important: after pulling this update, you'll need to reinitialize your environment by running `git submodule update --init --recursive` and reinstalling `verl` locally from this directory.
- [2025.03.16] Added support for process rewards! You can now assign rewards for each tool call based on its effectiveness. To balance process rewards with outcome rewards, we implemented reward normalization inspired by PRIME.
Agent-R1 is an open-source framework designed to accelerate research and development at the critical intersection of reinforcement learning (RL) and agents. Our framework uses end-to-end reinforcement learning to train agents in specific environments: developers only need to define domain-specific tools and reward functions to extend Agent-R1 to their own use cases, with no complex workflow engineering required. We hope this modest contribution benefits the open-source community by making it easier for researchers and developers to create and explore agents in their own domains, collectively advancing the development of autonomous agents. For more details on the algorithm, see the algorithm doc.
Also check out Awesome-Agent-RL: Our curated collection of papers and resources on unlocking the potential of Agents through Reinforcement Learning.
- Multi-turn Tool Calling: End-to-end reinforcement learning on complete interaction trajectories, allowing agents to learn from sequences of actions
- Multi-tool Coordination: Train agents to effectively coordinate and use multiple tools together to solve complex tasks
- Process Rewards: Assign rewards for each tool call based on its effectiveness, balanced with outcome rewards through normalization (see the sketch after this list)
- Custom Tools and Environments: Compatible with mainstream LLM tool calling formats, making it easy to extend with your own tools and scenarios
- Multiple RL Algorithms: Supports diverse reinforcement learning approaches including PPO, GRPO, and REINFORCE++
- Multi-modal Support: Compatible with vision-language models (VLMs) and multi-modal reinforcement learning
- Expanded Model Support: Integration with more foundation models beyond the currently supported Qwen
- Additional Use Cases: More example implementations across diverse scenarios and domains
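To make the process-reward balancing above concrete, here is a minimal sketch of how per-tool-call rewards might be combined with the final outcome reward. This is an illustration under assumptions, not Agent-R1's actual implementation: the function name `shape_trajectory_rewards`, the `process_weight` parameter, and the simple length normalization are all hypothetical, and the PRIME-inspired normalization in the codebase may differ.

```python
from typing import List


def shape_trajectory_rewards(
    process_rewards: List[float],
    outcome_reward: float,
    process_weight: float = 0.5,
) -> List[float]:
    """Hypothetical sketch: balance per-tool-call (process) rewards with the
    final outcome reward. Returns one shaped reward per step: one entry per
    tool call, with the outcome reward attached to the final step."""
    n = max(len(process_rewards), 1)
    # Normalize by the number of tool calls so long tool-use chains cannot
    # accumulate unbounded process reward relative to the single outcome reward.
    shaped = [process_weight * r / n for r in process_rewards]
    if shaped:
        shaped[-1] += outcome_reward
    else:
        shaped = [outcome_reward]
    return shaped


# Example: three tool calls scored 0.2, 0.9, 0.5 and a correct final answer (1.0).
print(shape_trajectory_rewards([0.2, 0.9, 0.5], outcome_reward=1.0))
```

The point of the sketch is only that per-call scores are scaled so they complement, rather than overwhelm, the outcome reward; the actual normalization used in training is PRIME-inspired as noted in the updates above.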
Agent-R1 provides a flexible architecture for creating custom tools and tool environments to suit various agent applications. Our framework is built on two key abstractions:
- BaseTool: Individual tools that agents can use to interact with external systems
- BaseToolEnv: Tool environments that define the state transition function for agent-tool interactions
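As a quick illustration of the `BaseTool` abstraction, the sketch below defines a toy calculator tool. The real base class ships with Agent-R1 in `agent_r1/tool/`; the stand-in class and the attribute and method names used here (`name`, `description`, `parameters`, `execute`) are assumptions modeled on mainstream LLM tool-calling schemas, so consult the tutorials below for the authoritative API.

```python
# Minimal sketch of a custom tool. The stand-in BaseTool below exists only to
# keep the example self-contained; Agent-R1's actual interface may differ.
from abc import ABC, abstractmethod


class BaseTool(ABC):
    """Stand-in for Agent-R1's BaseTool (assumed interface)."""
    name: str
    description: str
    parameters: dict

    @abstractmethod
    def execute(self, args: dict) -> str:
        ...


class CalculatorTool(BaseTool):
    name = "calculator"
    description = "Evaluate a basic arithmetic expression and return the result."
    # JSON-Schema-style parameter spec, matching common tool-calling formats.
    parameters = {
        "type": "object",
        "properties": {
            "expression": {"type": "string", "description": "e.g. '(3 + 4) * 2'"},
        },
        "required": ["expression"],
    }

    def execute(self, args: dict) -> str:
        # eval() on a stripped-down namespace keeps the sketch short; a real
        # tool should use a proper arithmetic parser instead.
        try:
            return str(eval(args["expression"], {"__builtins__": {}}, {}))
        except Exception as exc:
            return f"calculator error: {exc}"


if __name__ == "__main__":
    print(CalculatorTool().execute({"expression": "(3 + 4) * 2"}))  # -> 14
```

A `BaseToolEnv` then wires tools like this into the agent-tool state transition loop; see the ReTool tutorial below for a full environment example.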
For detailed guidance on extending Agent-R1, refer to our tutorials:
- Customizing Tools for Multi-hop QA: Learn how to create and customize tools for retrieving information across multiple knowledge sources
- Customizing Tool Environment for ReTool: Understand how to implement tool environments that integrate code execution with LLM reasoning
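To give a feel for what a tool environment does, here is a minimal, hypothetical sketch of a state transition for a code-execution setting like the ReTool tutorial above. The gym-style `step` signature, the `<code>...</code>` tool-call format, and the `StepResult` container are all assumptions for illustration; the real interfaces live in `agent_r1/tool/envs/` and may look different.

```python
# Hypothetical sketch of a tool environment: given the model's latest output,
# decide whether a tool was called, "run" it, and return the next observation.
import re
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: str  # text fed back to the agent (e.g. tool output)
    reward: float     # process reward for this tool call, if any
    done: bool        # True when the agent produced a final answer


class CodeExecEnv:
    """Toy environment that reacts to <code>...</code> blocks in the model output."""

    CODE_PATTERN = re.compile(r"<code>(.*?)</code>", re.DOTALL)

    def step(self, model_output: str) -> StepResult:
        match = self.CODE_PATTERN.search(model_output)
        if match is None:
            # No tool call: treat the output as the final answer and stop.
            return StepResult(observation="", reward=0.0, done=True)
        # A real environment would run the code in a sandbox and return stdout;
        # here we only echo a summary to keep the sketch safe and self-contained.
        summary = f"<tool_output>ran {len(match.group(1))} characters of code</tool_output>"
        return StepResult(observation=summary, reward=0.1, done=False)


if __name__ == "__main__":
    env = CodeExecEnv()
    print(env.step("Let me compute it. <code>print(2 + 2)</code>"))
```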
Additional resources are available in the codebase:
- Example tools: `agent_r1/tool/tools/`
- Example environments: `agent_r1/tool/envs/`
- Data preprocessing: `examples/data_preprocess/`
- Reward functions: `verl/utils/reward_score/`
We welcome all forms of feedback! Please raise an issue for bugs, questions, or suggestions. This helps our team address common problems efficiently and builds a more productive community.
Join our community: Connect with other users and our development team in our WeChat group or Discord server.
Student Contributors: Jie Ouyang*, Ruiran Yan*, Yucong Luo*, Zirui Liu, Shuo Yu, Daoyu Wang, Yang Li
Supervisors: Qi Liu, Mingyue Cheng
Affiliation: State Key Laboratory of Cognitive Intelligence, USTC
We extend our gratitude to DeepSeek for providing the DeepSeek-R1 model and inspiring ideas. We are also thankful to the veRL team for their robust infrastructure support. Additionally, we acknowledge the RAGEN team for their groundbreaking discoveries, which significantly influenced our early exploration. Lastly, we deeply appreciate the insightful discussions and contributions from Jie Ouyang, Ruiran Yan, Yucong Luo, Zirui Liu, Shuo Yu and Daoyu Wang.
To cite Agent-R1:
@misc{Agent-R1,
  author = {Jie Ouyang and Ruiran Yan and Yucong Luo and Mingyue Cheng and Qi Liu and Zirui Liu and Shuo Yu and Daoyu Wang},
  title = {Training Powerful LLM Agents with End-to-End Reinforcement Learning},
  year = {2025},
  organization = {GitHub},
  url = {https://github.com/0russwest0/Agent-R1},
}