Feat deep research #216

Merged
merged 16 commits into main from feat-deep-research
Jul 26, 2025

codelion
Owner

Also fixes #97

codelion added 16 commits July 24, 2025 18:17
Introduces `deep_research_plugin.py` and `web_search_plugin.py` to provide advanced research and web search capabilities. Updates `requirements.txt` to include selenium and webdriver-manager dependencies. Enhances plugin tests to cover the new plugins and updates expected plugin lists.
Introduces the Deep Research plugin based on the Test-Time Diffusion Deep Researcher (TTD-DR) algorithm, including core implementation, documentation, and OptILLM plugin interface. Adds new package files for query decomposition, iterative web search, synthesis, completeness evaluation, and structured report generation with citations. Updates .gitignore to exclude deep_research_reports/.
Changed all references from 'A Statistical Framework for Deep Researcher' to 'Deep Researcher with Test-Time Diffusion' and updated the associated arXiv URL in README, research_engine.py, and deep_research_plugin.py for accuracy and consistency.
Updated the research engine to perform web searches for each sub-query separately, preventing result truncation and improving coverage. The README was updated to document this change and provide guidance on search query processing.
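As a rough illustration of the per-sub-query approach described in this commit, the sketch below issues one search per sub-query and merges the results. It is not the code from `research_engine.py`: the names `search_sub_queries` and `fake_search`, the `search_fn` callable, the `url` result key, and the URL-based de-duplication are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical search signature: (query, max_results) -> list of result dicts.
SearchFn = Callable[[str, int], List[Dict[str, str]]]


def search_sub_queries(
    sub_queries: List[str],
    search_fn: SearchFn,
    results_per_query: int = 5,
) -> List[Dict[str, str]]:
    """Run one search per sub-query and merge results, skipping repeated URLs."""
    seen_urls = set()
    merged: List[Dict[str, str]] = []
    for query in sub_queries:
        # Each sub-query gets its own result budget, so no single query
        # can crowd the others out of a shared, truncated result list.
        for result in search_fn(query, results_per_query):
            url = result["url"]
            if url in seen_urls:
                continue
            seen_urls.add(url)
            merged.append(result)
    return merged


def fake_search(query: str, limit: int) -> List[Dict[str, str]]:
    """Stand-in for a real search backend; returns one shared URL plus unique ones."""
    slug = query.replace(" ", "-")
    shared = [{"url": "https://example.com/shared", "title": "Shared page"}]
    unique = [
        {"url": f"https://example.com/{slug}/{i}", "title": f"{query} #{i}"}
        for i in range(limit)
    ]
    return shared + unique


results = search_sub_queries(["tiktok ban", "ai agents"], fake_search, 2)
print(len(results))  # 5: the shared URL is kept once, plus two unique URLs per query
```

Searching each sub-query separately trades extra requests for coverage; whether to de-duplicate by URL, as done here, is a design choice the commit message does not specify.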
Introduces a `clean_reasoning_tags` function to remove reasoning tags (e.g., <think>, <reflection>) from model responses for professional output. Updates DeepResearcher to apply this cleanup at key response stages and documents compatibility and cleanup behavior in the README.
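A minimal sketch of what such a cleanup could look like is shown below. The function name matches the commit message, but the body is an assumption rather than the merged implementation: the tag list contains only the two tags the commit names as examples, and the whitespace normalization is added for illustration.

```python
import re

# Assumed tag list: only the two examples named in the commit message.
REASONING_TAGS = ("think", "reflection")


def clean_reasoning_tags(text: str) -> str:
    """Remove paired reasoning blocks such as <think>...</think> from a response."""
    for tag in REASONING_TAGS:
        # DOTALL lets the block span several lines; the lazy quantifier
        # stops at the first matching closing tag.
        pattern = rf"<{tag}\b[^>]*>.*?</{tag}\s*>"
        text = re.sub(pattern, "", text, flags=re.DOTALL | re.IGNORECASE)
    # Collapse the blank-line runs left behind by removed blocks.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


print(clean_reasoning_tags("<think>plan the answer</think>Final answer."))
# prints: Final answer.
```

This sketch removes only complete, paired blocks; an unclosed opening tag is left in place, which a production version would likely need to handle.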
Added the Test-Time Diffusion Deep Researcher (TTD-DR) algorithm to deep_research/research_engine.py, including draft generation, gap analysis, denoising, self-evolution, and finalization steps. Enhanced extract_search_queries in web_search_plugin.py to allow periods in queries, improving extraction for cases like 'Python 3.12'.
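The period-handling change can be illustrated with a small regex comparison. This is a hypothetical sketch, not the logic in `web_search_plugin.py`: the "search for" trigger phrase, both patterns, and the helper name `extract_queries` are assumptions chosen to show why excluding periods truncates a query like 'Python 3.12'.

```python
import re
from typing import List

# Assumed "before" pattern: stops at the first period, so version numbers are cut.
NAIVE = re.compile(r"search for\s+([^.\n]+)", re.IGNORECASE)

# Assumed "after" pattern: periods are allowed inside the query; the match ends
# only at a sentence-ending period (followed by whitespace) or at end of line.
TOLERANT = re.compile(
    r"search for\s+(.+?)(?=\.\s|\.?$)", re.IGNORECASE | re.MULTILINE
)


def extract_queries(text: str, pattern: "re.Pattern[str]") -> List[str]:
    """Return every captured query in the text, stripped of surrounding spaces."""
    return [match.strip() for match in pattern.findall(text)]


prompt = "Search for Python 3.12 release notes. Then summarize them."
print(extract_queries(prompt, NAIVE))     # ['Python 3']
print(extract_queries(prompt, TOLERANT))  # ['Python 3.12 release notes']
```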
Added a robust cleanup function to remove all research placeholder tags from final reports. Improved gap analysis to prioritize placeholder tags and updated search logic to address high-priority gaps first. Increased default max_iterations and max_sources for more thorough research. Updated final report synthesis to ensure no placeholder tags remain.
Introduces BrowserSessionManager to enable reuse of a single browser session across multiple web searches, improving efficiency and reliability. DeepResearcher now uses a shared browser session for all search operations within a research run, and web_search_plugin's run function supports session reuse via the new manager.
Introduces session_state.py to manage browser sessions for concurrent deep research requests, ensuring thread safety and proper cleanup. Updates DeepResearcher to use unique session IDs and centralized session management, and improves search query extraction logic in web_search_plugin.py for more robust handling of search commands.
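The idea of managing sessions per request under a lock can be sketched as follows. This is an illustrative assumption, not the contents of `session_state.py`: `SessionRegistry`, `new_session_id`, and `FakeBrowser` are hypothetical names, and a real browser session (for example a Selenium driver) is replaced by a stub so the example runs without a browser.

```python
import threading
import uuid
from typing import Callable, Dict


class FakeBrowser:
    """Stub standing in for a real browser session."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class SessionRegistry:
    """Thread-safe map from session ID to a reusable browser session."""

    def __init__(self, factory: Callable[[], FakeBrowser]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._sessions: Dict[str, FakeBrowser] = {}

    def get_or_create(self, session_id: str) -> FakeBrowser:
        # The lock ensures two threads never create two sessions for one ID.
        with self._lock:
            if session_id not in self._sessions:
                self._sessions[session_id] = self._factory()
            return self._sessions[session_id]

    def close(self, session_id: str) -> None:
        # Remove the entry under the lock, then close outside it so a slow
        # shutdown does not block other requests.
        with self._lock:
            session = self._sessions.pop(session_id, None)
        if session is not None:
            session.close()


def new_session_id() -> str:
    """Generate a unique ID for one research run."""
    return uuid.uuid4().hex


registry = SessionRegistry(FakeBrowser)
run_id = new_session_id()
first = registry.get_or_create(run_id)
second = registry.get_or_create(run_id)
print(first is second)  # True: searches within one run share a session
registry.close(run_id)
print(first.closed)     # True: the session is cleaned up when the run ends
```

Keying sessions by a per-run ID keeps concurrent research requests from sharing browser state, while still letting every search within one run reuse the same session.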
Extended timeout and retry logic for Gradio chat and deep research plugins to support long-running operations. Enhanced DeepResearcher prompts for more explicit gap analysis and research needs. Improved browser session recovery in web search plugin to handle invalidated sessions and prevent crashes. Updated default iteration and source limits for deep research to balance speed and coverage.
Introduces a set of sample research reports under optillm/plugins/deep_research/sample_reports. These reports cover topics such as TikTok bans, AI agent landscapes, unbanked market access, KKR's tech transactions, and more, providing detailed analyses and references for each subject.
@codelion codelion merged commit 0297900 into main Jul 26, 2025
3 checks passed
@codelion codelion deleted the feat-deep-research branch July 26, 2025 15:49
Development

Successfully merging this pull request may close these issues.

Add a web search plugin that can be used to ground the responses
1 participant