[Feature] Add confluence connector #234

Merged
merged 10 commits into from
Jul 28, 2025

Conversation

CREDO23
Contributor

@CREDO23 CREDO23 commented Jul 26, 2025

Description

  • Add Confluence connector

Screenshots

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance improvement (non-breaking change which enhances performance)
  • Documentation update
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project
  • My change requires documentation updates
  • I have updated the documentation accordingly
  • My change requires dependency updates
  • I have updated the dependencies accordingly
  • My code builds clean without any errors or warnings
  • All new and existing tests passed

Summary by CodeRabbit

  • New Features

    • Introduced Confluence as a new connector, enabling users to add, configure, and index Confluence pages and comments.
    • Added support for searching and retrieving Confluence content within the application.
    • Provided dedicated interfaces for adding and editing Confluence connector credentials and settings.
    • Included documentation and setup instructions for integrating Confluence.
  • Improvements

    • Added Confluence-specific icons and visual cues for documents and connectors.
    • Enhanced validation to enforce required fields in Confluence connector configurations.
  • Chores

    • Updated schemas and form defaults to support new Confluence (and Jira) connector fields.
    • Extended connector editing logic to handle Confluence and Jira credential fields with validation and state management.

vercel bot commented Jul 26, 2025

@CREDO23 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai bot commented Jul 26, 2025

Walkthrough

This change introduces comprehensive support for a new "Confluence" connector across the backend and frontend. It adds the connector to enums, database migrations, validation, and configuration schemas; implements Confluence API integration and indexing logic; updates relevant API routes and services; and extends the UI with forms and icons for managing and searching Confluence-based knowledge sources.

Changes

| File(s) | Change Summary |
| --- | --- |
| .../alembic/versions/14_add_confluence_connector_enums.py | Alembic migration: Adds CONFLUENCE_CONNECTOR to two PostgreSQL enums. |
| .../app/db.py | Adds CONFLUENCE_CONNECTOR to DocumentType and SearchSourceConnectorType enums. |
| .../app/agents/researcher/nodes.py | Adds Confluence branch to fetch_relevant_documents for searching Confluence data. |
| .../app/agents/researcher/qna_agent/prompts.py | Adds "CONFLUENCE_CONNECTOR" to Q&A citation prompt sources. |
| .../app/connectors/confluence_connector.py | New module: Implements ConfluenceConnector class for Confluence Cloud API integration. |
| .../app/routes/search_source_connectors_routes.py | Adds Confluence indexing support and helper functions for background tasks and error handling. |
| .../app/schemas/search_source_connector.py | Adds config validation logic for CONFLUENCE_CONNECTOR. |
| .../app/services/connector_service.py | Adds search_confluence async method to search Confluence pages/comments. |
| .../app/tasks/connectors_indexing_tasks.py | Adds async index_confluence_pages for indexing Confluence data into the system. |
| .../dashboard/[search_space_id]/connectors/add/confluence-connector/page.tsx | New page: UI for adding a Confluence connector with form, docs, and validation. |
| .../dashboard/[search_space_id]/connectors/add/page.tsx | Adds Confluence connector entry to available connectors list. |
| .../dashboard/[search_space_id]/connectors/[connector_id]/edit/page.tsx | Adds Confluence-specific fields to connector edit form. |
| .../dashboard/[search_space_id]/documents/(manage)/page.tsx | Adds icon mapping for CONFLUENCE_CONNECTOR document type. |
| .../components/editConnector/types.ts | Adds Confluence/Jira config fields to edit connector schema. |
| .../hooks/useConnectorEditPage.ts | Adds default values for Confluence/Jira fields in edit form state. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend
    participant Backend
    participant ConfluenceAPI
    participant Database

    User->>Frontend: Add Confluence connector (form)
    Frontend->>Backend: POST connector config (with credentials)
    Backend->>Database: Store connector config
    User->>Frontend: Trigger indexing
    Frontend->>Backend: POST index request
    Backend->>ConfluenceAPI: Fetch spaces/pages/comments
    ConfluenceAPI-->>Backend: Return Confluence data
    Backend->>Database: Store indexed documents
    Backend-->>Frontend: Indexing result/status
    Frontend-->>User: Show status / results

    User->>Frontend: Search query
    Frontend->>Backend: Search request (with Confluence connector)
    Backend->>Database: Retrieve indexed Confluence docs
    Backend-->>Frontend: Search results
    Frontend-->>User: Display Confluence search results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related issues

  • Add a Confluence Connector #228: Both concern Confluence connector integration; this PR provides the full code implementation addressing the feature request in the issue.

Possibly related PRs

  • Feat/Discord Connector #125: Adds a new connector type (Discord) with similar enum, indexing, search, and UI patterns, closely related in architecture and scope.

Suggested reviewers

  • MODSetter

Poem

In burrows deep, I code with glee,
Confluence pages now join our spree!
Forms and icons, enums too,
Fetching docs both old and new.
With every hop, a search made bright—
More knowledge gathered, day and night.
🐇📚✨

recurseml bot commented Jul 26, 2025

✨ No issues found! Your code is sparkling clean! ✨

⚠️ Only 5 files were analyzed due to processing limits.

@CREDO23 CREDO23 marked this pull request as ready for review July 27, 2025 11:20
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (5)
surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx (1)

136-144: Add CONFLUENCE_CONNECTOR to the Document type definition.

The Document type definition is missing CONFLUENCE_CONNECTOR in the union type for document_type, but it's included in the documentTypeIcons mapping. This type inconsistency could cause TypeScript errors.

Apply this diff to fix the type definition:

  document_type:
    | "EXTENSION"
    | "CRAWLED_URL"
    | "SLACK_CONNECTOR"
    | "NOTION_CONNECTOR"
    | "FILE"
    | "YOUTUBE_VIDEO"
    | "LINEAR_CONNECTOR"
-    | "DISCORD_CONNECTOR";
+    | "DISCORD_CONNECTOR"
+    | "JIRA_CONNECTOR"
+    | "GITHUB_CONNECTOR"
+    | "CONFLUENCE_CONNECTOR";
surfsense_web/components/editConnector/types.ts (1)

1-1: Fix Prettier formatting issues.

The pipeline indicates formatting issues that need to be resolved. Please run prettier --write on this file to fix the code style.

prettier --write surfsense_web/components/editConnector/types.ts
surfsense_web/hooks/useConnectorEditPage.ts (2)

126-188: Missing save logic for Confluence and Jira connectors.

The switch statement handles various connector types but is missing cases for CONFLUENCE_CONNECTOR and JIRA_CONNECTOR. This means the form can collect the configuration data but cannot save it.

Add the missing cases to the switch statement:

            case 'DISCORD_CONNECTOR':
                if (formData.DISCORD_BOT_TOKEN !== originalConfig.DISCORD_BOT_TOKEN) {
                    if (!formData.DISCORD_BOT_TOKEN) { toast.error("Discord Bot Token cannot be empty."); setIsSaving(false); return; }
                    newConfig = { DISCORD_BOT_TOKEN: formData.DISCORD_BOT_TOKEN };
                }
                break;
+           case 'CONFLUENCE_CONNECTOR':
+               const confluenceChanged = 
+                   formData.CONFLUENCE_BASE_URL !== originalConfig.CONFLUENCE_BASE_URL ||
+                   formData.CONFLUENCE_EMAIL !== originalConfig.CONFLUENCE_EMAIL ||
+                   formData.CONFLUENCE_API_TOKEN !== originalConfig.CONFLUENCE_API_TOKEN;
+               if (confluenceChanged) {
+                   if (!formData.CONFLUENCE_BASE_URL || !formData.CONFLUENCE_EMAIL || !formData.CONFLUENCE_API_TOKEN) {
+                       toast.error("All Confluence fields are required."); setIsSaving(false); return;
+                   }
+                   newConfig = {
+                       CONFLUENCE_BASE_URL: formData.CONFLUENCE_BASE_URL,
+                       CONFLUENCE_EMAIL: formData.CONFLUENCE_EMAIL,
+                       CONFLUENCE_API_TOKEN: formData.CONFLUENCE_API_TOKEN
+                   };
+               }
+               break;
+           case 'JIRA_CONNECTOR':
+               const jiraChanged = 
+                   formData.JIRA_BASE_URL !== originalConfig.JIRA_BASE_URL ||
+                   formData.JIRA_EMAIL !== originalConfig.JIRA_EMAIL ||
+                   formData.JIRA_API_TOKEN !== originalConfig.JIRA_API_TOKEN;
+               if (jiraChanged) {
+                   if (!formData.JIRA_BASE_URL || !formData.JIRA_EMAIL || !formData.JIRA_API_TOKEN) {
+                       toast.error("All Jira fields are required."); setIsSaving(false); return;
+                   }
+                   newConfig = {
+                       JIRA_BASE_URL: formData.JIRA_BASE_URL,
+                       JIRA_EMAIL: formData.JIRA_EMAIL,
+                       JIRA_API_TOKEN: formData.JIRA_API_TOKEN
+                   };
+               }
+               break;

217-231: Missing form value updates for Confluence and Jira connectors.

The form value update logic after successful saves is missing cases for CONFLUENCE_CONNECTOR and JIRA_CONNECTOR.

Add the missing form value updates:

                 } else if(connector.connector_type === 'DISCORD_CONNECTOR') {
                    editForm.setValue('DISCORD_BOT_TOKEN', newlySavedConfig.DISCORD_BOT_TOKEN || "");
+                } else if(connector.connector_type === 'CONFLUENCE_CONNECTOR') {
+                   editForm.setValue('CONFLUENCE_BASE_URL', newlySavedConfig.CONFLUENCE_BASE_URL || "");
+                   editForm.setValue('CONFLUENCE_EMAIL', newlySavedConfig.CONFLUENCE_EMAIL || "");
+                   editForm.setValue('CONFLUENCE_API_TOKEN', newlySavedConfig.CONFLUENCE_API_TOKEN || "");
+                } else if(connector.connector_type === 'JIRA_CONNECTOR') {
+                   editForm.setValue('JIRA_BASE_URL', newlySavedConfig.JIRA_BASE_URL || "");
+                   editForm.setValue('JIRA_EMAIL', newlySavedConfig.JIRA_EMAIL || "");
+                   editForm.setValue('JIRA_API_TOKEN', newlySavedConfig.JIRA_API_TOKEN || "");
                 }
surfsense_backend/app/connectors/confluence_connector.py (1)

296-356: Implement date filtering in get_pages_by_date_range

The method currently ignores start_date and end_date, fetching all pages and defeating its purpose. To fix this:

  • File: surfsense_backend/app/connectors/confluence_connector.py
  • Method: get_pages_by_date_range (lines 296–356)

Action items:

  • Replace the "pages" endpoint call with the /content/search endpoint.
  • Build a CQL string using created (or lastmodified) filters:
    cql_parts = ['type=page', f'created >= "{start_date}"', f'created <= "{end_date}"']
    if space_ids:
        space_list = ",".join(f'"{s}"' for s in space_ids)
        cql_parts.append(f"space in ({space_list})")
    cql = " AND ".join(cql_parts)
  • Pass cql, start, and limit as query parameters and loop with pagination:
    - params = {"limit": 100, "body-format": "storage"}
    - # … old pages‐by‐cursor logic …
    + search_params = {"cql": cql, "limit": 50, "start": 0, "expand": "body.storage"}
    + all_pages = []
    + while True:
    +     result = self.make_api_request("content/search", search_params)
    +     pages = result.get("results", [])
    +     all_pages.extend(pages)
    +     if pages and len(pages) == search_params["limit"]:
    +         search_params["start"] += search_params["limit"]
    +     else:
    +         break
  • Update the method docstring to reflect the new behavior.

This ensures you only fetch pages within the requested date range (and optional spaces), avoiding unnecessary data and improving performance.
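
The CQL-building step in the action items above can be isolated into a small pure function, which makes it easy to test separately from the HTTP call. This is a minimal sketch, not the connector's actual code: `build_page_cql` and its parameter names are hypothetical, the dates are assumed to be `YYYY-MM-DD` strings, and the values passed as `spaces` are assumed to be whatever identifiers the CQL `space` field accepts for your instance.

```python
from datetime import date
from typing import Optional, Sequence


def build_page_cql(
    start_date: str,
    end_date: str,
    spaces: Optional[Sequence[str]] = None,  # hypothetical parameter name
) -> str:
    """Build a CQL query for pages created between two dates (inclusive)."""
    # Parse the dates first so malformed input fails here, not inside the query.
    date.fromisoformat(start_date)
    date.fromisoformat(end_date)

    parts = [
        "type=page",
        f'created >= "{start_date}"',
        f'created <= "{end_date}"',
    ]
    if spaces:
        # Quote each space value and join into a CQL "in" list.
        quoted = ",".join(f'"{space}"' for space in spaces)
        parts.append(f"space in ({quoted})")
    return " AND ".join(parts)
```

Keeping the query construction free of I/O means the pagination loop only has to pass the resulting string as the `cql` parameter.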

🧹 Nitpick comments (2)
surfsense_backend/app/tasks/connectors_indexing_tasks.py (1)

2480-2508: Consider consistent error handling for API failures

The error handling for "No pages found" updates last_indexed_at and returns success, but other API errors don't follow the same pattern. This could lead to inconsistent behavior when the API is temporarily unavailable vs. when there's genuinely no data.

Consider treating transient API errors similarly to ensure the connector doesn't get stuck if there's a temporary API issue.

surfsense_backend/app/connectors/confluence_connector.py (1)

163-166: Improve cursor extraction robustness

Extracting cursor from URL string using string manipulation is fragile and could break if the URL format changes.

Consider using URL parsing:

-            if "cursor=" in next_link:
-                cursor = next_link.split("cursor=")[1].split("&")[0]
+            from urllib.parse import urlparse, parse_qs
+            parsed_url = urlparse(next_link)
+            query_params = parse_qs(parsed_url.query)
+            cursor = query_params.get('cursor', [None])[0]
+            if not cursor:
+                break
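
As a standalone illustration of the suggestion above, the cursor extraction could be wrapped in a helper like the following. This is a sketch under assumptions: `extract_cursor` is a hypothetical name, and the link format shown in the usage note is only an example, not a guaranteed Confluence response shape.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_cursor(next_link: str) -> Optional[str]:
    """Return the value of the `cursor` query parameter, or None if absent."""
    # urlparse handles both absolute URLs and relative paths with a query string.
    query = parse_qs(urlparse(next_link).query)
    values = query.get("cursor")
    # parse_qs returns a list per key; the first value is the one we want.
    return values[0] if values else None
```

For example, `extract_cursor("/wiki/api/v2/pages?limit=100&cursor=abc123")` yields `"abc123"`, and a link without a `cursor` parameter yields `None`, which the caller can use to stop paginating. Note that `parse_qs` percent-decodes the value, unlike the string-splitting approach, so the cursor should be passed back through a `params` dict rather than concatenated into a URL by hand.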
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 237e11f and dee54bf.

⛔ Files ignored due to path filters (2)
  • node_modules/.cache/prettier/.prettier-caches/37bd945444dc76999f7aface662ff267baf1dbca.json is excluded by !**/node_modules/**, !**/.cache/**
  • node_modules/.cache/prettier/.prettier-caches/a2ecb2962bf19c1099cfe708e42daa0097f94976.json is excluded by !**/node_modules/**, !**/.cache/**
📒 Files selected for processing (15)
  • surfsense_backend/alembic/versions/14_add_confluence_connector_enums.py (1 hunks)
  • surfsense_backend/app/agents/researcher/nodes.py (1 hunks)
  • surfsense_backend/app/agents/researcher/qna_agent/prompts.py (1 hunks)
  • surfsense_backend/app/connectors/confluence_connector.py (1 hunks)
  • surfsense_backend/app/db.py (2 hunks)
  • surfsense_backend/app/routes/search_source_connectors_routes.py (3 hunks)
  • surfsense_backend/app/schemas/search_source_connector.py (1 hunks)
  • surfsense_backend/app/services/connector_service.py (1 hunks)
  • surfsense_backend/app/tasks/connectors_indexing_tasks.py (2 hunks)
  • surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/edit/page.tsx (1 hunks)
  • surfsense_web/app/dashboard/[search_space_id]/connectors/add/confluence-connector/page.tsx (1 hunks)
  • surfsense_web/app/dashboard/[search_space_id]/connectors/add/page.tsx (2 hunks)
  • surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx (2 hunks)
  • surfsense_web/components/editConnector/types.ts (1 hunks)
  • surfsense_web/hooks/useConnectorEditPage.ts (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
surfsense_backend/app/schemas/search_source_connector.py (1)
surfsense_backend/app/db.py (1)
  • SearchSourceConnectorType (49-59)
surfsense_backend/app/routes/search_source_connectors_routes.py (2)
surfsense_backend/app/tasks/connectors_indexing_tasks.py (1)
  • index_confluence_pages (2335-2700)
surfsense_backend/app/db.py (1)
  • SearchSourceConnectorType (49-59)
surfsense_backend/app/agents/researcher/nodes.py (2)
surfsense_backend/app/services/connector_service.py (1)
  • search_confluence (1075-1170)
surfsense_backend/app/services/streaming_service.py (1)
  • format_terminal_info_delta (28-47)
🪛 GitHub Actions: pre-commit
surfsense_web/components/editConnector/types.ts

[error] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.

surfsense_web/hooks/useConnectorEditPage.ts

[error] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.

surfsense_web/app/dashboard/[search_space_id]/connectors/add/confluence-connector/page.tsx

[error] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Python Backend Quality
🔇 Additional comments (20)
surfsense_backend/app/agents/researcher/qna_agent/prompts.py (1)

19-19: LGTM! Confluence connector integration is consistent.

The addition of CONFLUENCE_CONNECTOR knowledge source follows the established pattern and aligns with the broader system integration of the Confluence connector feature.

surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx (2)

60-60: Good choice of icon for Confluence connector.

The IconBook import is semantically appropriate for documentation and knowledge base content from Confluence.


184-184: LGTM! Icon mapping is consistent.

The CONFLUENCE_CONNECTOR: IconBook mapping is correctly added and follows the established pattern.

surfsense_backend/app/db.py (2)

46-46: LGTM! Enum addition follows established pattern.

The CONFLUENCE_CONNECTOR addition to DocumentType enum is consistent with other connector types and necessary for the Confluence integration.


59-59: LGTM! SearchSourceConnectorType enum extension is consistent.

The CONFLUENCE_CONNECTOR addition follows the established pattern and aligns with the broader system integration.

surfsense_web/components/editConnector/types.ts (1)

35-40: LGTM! Schema extensions are well-structured.

The new fields for Confluence and Jira credentials follow the established naming convention and are appropriately typed as optional strings.

surfsense_web/app/dashboard/[search_space_id]/connectors/add/page.tsx (2)

16-16: LGTM! Icon import is consistent.

The IconBook import aligns with its usage for the Confluence connector and matches the icon choice in other parts of the system.


145-152: Excellent connector entry structure.

The Confluence connector entry is well-structured and follows the established pattern:

  • Appropriately categorized under "Knowledge Bases"
  • Clear, descriptive title and description
  • Consistent icon usage with IconBook
  • Proper status set to "available"
  • ID follows the naming convention

The description effectively communicates the connector's purpose for searching Confluence pages, comments, and documentation.

surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/edit/page.tsx (1)

211-236: LGTM! Consistent implementation following established patterns.

The Confluence connector form fields are implemented correctly, following the same three-field pattern used by the Jira connector (base URL, email, API token). The field names, labels, descriptions, and placeholders are appropriate for Confluence integration.

surfsense_backend/app/schemas/search_source_connector.py (1)

146-164: LGTM! Proper validation following established patterns.

The Confluence connector validation correctly enforces the three required configuration keys and ensures they're non-empty. This matches the validation pattern used for the Jira connector, maintaining consistency across similar connector types.
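
For readers without the diff open, the check described above could look roughly like this. It is a sketch only: the key names are taken from the frontend suggestion earlier in this review, while `validate_confluence_config` and the error messages are hypothetical and may differ from what the schema module actually does.

```python
REQUIRED_KEYS = ("CONFLUENCE_BASE_URL", "CONFLUENCE_EMAIL", "CONFLUENCE_API_TOKEN")


def validate_confluence_config(config: dict) -> dict:
    """Check that all required keys are present and non-empty (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Missing required keys: {', '.join(missing)}")

    # Treat None, empty strings, and whitespace-only strings as empty.
    empty = [
        key
        for key in REQUIRED_KEYS
        if not config[key] or not str(config[key]).strip()
    ]
    if empty:
        raise ValueError(f"Keys must not be empty: {', '.join(empty)}")
    return config
```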

surfsense_backend/alembic/versions/14_add_confluence_connector_enums.py (1)

18-53: LGTM! Well-structured migration with proper safeguards.

The migration correctly adds the CONFLUENCE_CONNECTOR enum value to both required enum types with proper conditional logic to prevent duplicate additions. The empty downgrade function is appropriately documented regarding PostgreSQL's enum limitations.

surfsense_backend/app/agents/researcher/nodes.py (1)

922-944: LGTM! Consistent integration following established connector patterns.

The Confluence connector integration correctly follows the same pattern used by other connectors in the research workflow. The method call, parameter passing, result handling, and streaming feedback are all implemented consistently with existing connectors like Jira.

surfsense_web/app/dashboard/[search_space_id]/connectors/add/confluence-connector/page.tsx (4)

36-60: Well-structured validation schema with good URL checking.

The Zod schema provides comprehensive validation for Confluence connector fields. The URL validation using refine() to check for "atlassian.net" or "confluence" is a good approach to ensure users enter valid Confluence instance URLs.

However, fix the formatting issue flagged by the pipeline by running prettier --write on this file.


84-111: Proper form submission implementation with good error handling.

The form submission logic correctly maps form fields to the expected backend configuration keys and includes appropriate error handling with user-friendly toast notifications.


113-324: Well-designed UI with professional user experience.

The component provides a clean, tabbed interface with proper form fields, loading states, and helpful documentation. The separation of the connection form and setup instructions improves usability.


280-316: Comprehensive and user-friendly documentation.

The documentation tab provides clear, actionable guidance for users setting up the Confluence connector. The explanations of indexed content, setup steps, and required permissions will help users successfully configure their connector.

surfsense_backend/app/routes/search_source_connectors_routes.py (3)

39-39: Proper import addition for Confluence indexing.

The import of index_confluence_pages follows the established pattern and is necessary for the new Confluence connector functionality.


461-474: Consistent implementation of Confluence indexing endpoint.

The addition of CONFLUENCE_CONNECTOR support follows the established pattern used by other connector types, with proper logging and background task scheduling.


905-960: Well-implemented helper functions following established patterns.

The Confluence indexing helper functions properly separate session management from the indexing logic, include appropriate error handling and logging, and follow the established pattern of updating the last_indexed_at timestamp only on successful completion.

surfsense_backend/app/tasks/connectors_indexing_tasks.py (1)

2539-2540: Add null checks for nested dictionary access

Accessing page["body"]["storage"]["value"] without checking if intermediate keys exist could raise KeyError.

Apply defensive programming:

-                if page.get("body") and page["body"].get("storage"):
-                    page_content = page["body"]["storage"].get("value", "")
+                body = page.get("body", {})
+                storage = body.get("storage", {})
+                page_content = storage.get("value", "")

Likely an incorrect or invalid review comment.
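
One detail worth noting when comparing the two variants: the original guarded code already avoids a KeyError, while `dict.get(key, {})` still returns `None` when the key is present with a `None` value. Whether the Confluence API ever returns such a value is not established here; the sketch below simply shows an access pattern that tolerates both cases. `extract_storage_body` is a hypothetical helper name.

```python
def extract_storage_body(page: dict) -> str:
    """Return page["body"]["storage"]["value"], or "" if any level is missing or None."""
    # `or {}` covers both a missing key and a key explicitly set to None.
    body = page.get("body") or {}
    storage = body.get("storage") or {}
    return storage.get("value") or ""
```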

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
surfsense_backend/app/connectors/confluence_connector.py (1)

295-355: Critical issue: Date filtering is not implemented.

The method accepts start_date and end_date parameters but never uses them to filter the results. All pages are returned regardless of the specified date range, which is a significant functional bug.

The Confluence API supports date filtering through query parameters. You need to implement the actual date filtering logic:

 def get_pages_by_date_range(
     self,
     start_date: str,
     end_date: str,
     space_ids: list[str] | None = None,
     include_comments: bool = True,
 ) -> tuple[list[dict[str, Any]], str | None]:
     """
     Fetch pages within a date range, optionally filtered by spaces.
     """
     try:
+        # Convert dates to ISO format for API
+        start_iso = f"{start_date}T00:00:00.000Z"
+        end_iso = f"{end_date}T23:59:59.999Z"
+        
         if space_ids:
             # Fetch pages from specific spaces
             for space_id in space_ids:
-                pages = self.get_pages_in_space(space_id, include_body=True)
+                # Add date filtering to the API call
+                params = {
+                    "limit": 100,
+                    "body-format": "storage",
+                    "created-date-from": start_iso,
+                    "created-date-to": end_iso,
+                }
+                # Implement paginated request with date filtering
                 all_pages.extend(pages)
         else:
             # Add date filtering parameters to the general pages query
             params = {
                 "limit": 100,
                 "body-format": "storage",
+                "created-date-from": start_iso,
+                "created-date-to": end_iso,
             }

Please verify the correct date filtering parameters in the Confluence API documentation.

🧹 Nitpick comments (2)
surfsense_backend/app/connectors/confluence_connector.py (2)

93-124: Consider adding retry logic for improved resilience.

The API request method has solid error handling and timeout configuration. Consider adding exponential backoff retry logic for transient failures to improve reliability in production environments.

 def make_api_request(
     self, endpoint: str, params: dict[str, Any] | None = None
 ) -> dict[str, Any]:
+    """
+    Make a request to the Confluence API with retry logic.
+    """
+    from time import sleep
+    import random
+    
+    max_retries = 3
+    for attempt in range(max_retries):
         try:
             response = requests.get(url, headers=headers, params=params, timeout=30)
             response.raise_for_status()
             return response.json()
         except requests.exceptions.RequestException as e:
+            if attempt == max_retries - 1:
                 raise Exception(f"Confluence API request failed: {e!s}") from e
+            # Exponential backoff with jitter
+            sleep_time = (2 ** attempt) + random.uniform(0, 1)
+            sleep(sleep_time)
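
The retry idea above can also be expressed as a reusable wrapper that is independent of `requests`, which makes the backoff behavior testable without network access. This is a sketch under assumptions: `call_with_retry` and its parameters are hypothetical, and in the connector the `retry_on` tuple would presumably be narrowed to `requests.exceptions.RequestException` rather than the broad default used here.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> T:
    """Call func, retrying on failure with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_retries - 1:
                raise
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise ValueError("max_retries must be at least 1")
```

Passing `sleep` as a parameter is a design choice for testability: a test can record the requested delays instead of actually waiting.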

125-168: LGTM! Solid pagination implementation with a minor suggestion.

The cursor-based pagination is handled correctly. Consider using urllib.parse for more robust URL parameter extraction instead of string splitting.

+from urllib.parse import urlparse, parse_qs

# In the cursor extraction logic:
-            if "cursor=" in next_link:
-                cursor = next_link.split("cursor=")[1].split("&")[0]
+            parsed_url = urlparse(next_link)
+            query_params = parse_qs(parsed_url.query)
+            cursor = query_params.get('cursor', [None])[0]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dee54bf and 14c99cd.

📒 Files selected for processing (3)
  • surfsense_backend/app/agents/researcher/nodes.py (1 hunks)
  • surfsense_backend/app/connectors/confluence_connector.py (1 hunks)
  • surfsense_backend/app/services/connector_service.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • surfsense_backend/app/agents/researcher/nodes.py
  • surfsense_backend/app/services/connector_service.py
🔇 Additional comments (6)
surfsense_backend/app/connectors/confluence_connector.py (6)

1-12: LGTM! Clean module structure and imports.

The module docstring clearly explains the purpose, and the imports are appropriate for the functionality.


17-34: Fix potential AttributeError with None base_url.

The code handles None values correctly, and guarding the trailing slash removal behind a None check is good defensive programming.

The initialization pattern with optional parameters allows for flexible instantiation, which is appropriate for this connector.


36-66: LGTM! Well-designed credential management.

The methods provide both bulk and individual credential setting with clear documentation and consistent implementation.


67-92: LGTM! Secure Basic Auth implementation.

The Basic Auth header construction is implemented correctly with proper base64 encoding and credential validation.
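
As a reference for what such a header looks like, here is a minimal sketch of Basic Auth construction. The function name `basic_auth_header` and the parameter names `email` and `api_token` are assumptions for illustration; the connector's own method may be shaped differently.

```python
import base64


def basic_auth_header(email: str, api_token: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header from an email and API token."""
    if not email or not api_token:
        raise ValueError("Both email and api_token are required")
    # Basic auth is base64("user:password"), sent in the Authorization header
    raw = f"{email}:{api_token}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}
```

Base64 is an encoding, not encryption, so a header like this is only safe over HTTPS.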


169-221: LGTM! Consistent implementation with space fetching.

The method follows the same solid pagination patterns. The same URL parsing improvement suggested for get_all_spaces() applies here as well.


222-294: LGTM! Well-structured comment fetching with good separation of concerns.

Using a helper method to keep the code DRY and handling missing comments gracefully both demonstrate good design practices.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
surfsense_web/hooks/useConnectorEditPage.ts (1)

89-94: Previous review feedback has been addressed.

The form reset logic now correctly includes the new Confluence and Jira fields, resolving the issue raised in the previous review.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 14c99cd and 8786abf.

📒 Files selected for processing (5)
  • surfsense_backend/app/routes/search_source_connectors_routes.py (3 hunks)
  • surfsense_backend/app/schemas/search_source_connector.py (1 hunks)
  • surfsense_backend/app/services/connector_service.py (1 hunks)
  • surfsense_backend/app/tasks/connectors_indexing_tasks.py (2 hunks)
  • surfsense_web/hooks/useConnectorEditPage.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • surfsense_backend/app/schemas/search_source_connector.py
  • surfsense_backend/app/tasks/connectors_indexing_tasks.py
  • surfsense_backend/app/services/connector_service.py
  • surfsense_backend/app/routes/search_source_connectors_routes.py
🧰 Additional context used
🪛 Biome (2.1.2)
surfsense_web/hooks/useConnectorEditPage.ts

[error] 187-187: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 188-188: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 189-189: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 190-190: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 191-193: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Python Backend Quality
🔇 Additional comments (1)
surfsense_web/hooks/useConnectorEditPage.ts (1)

63-68: LGTM! Default values correctly added for new connector types.

The Confluence and Jira fields are properly initialized with empty string defaults, following the established pattern for other connector configurations.

@CREDO23 CREDO23 force-pushed the feature/add-confluence-Connector branch from 8786abf to 8e52a0b Compare July 27, 2025 12:04

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
surfsense_backend/app/tasks/connectors_indexing_tasks.py (1)

2548-2557: Add null checks for comment data access

The code accesses nested dictionary keys without proper null checking, which could cause KeyError or AttributeError if the expected structure is missing.

Apply defensive programming for comment processing:

-                        if comment.get("body") and comment["body"].get("storage"):
-                            comment_body = comment["body"]["storage"].get("value", "")
+                        comment_body_data = comment.get("body", {})
+                        comment_storage = comment_body_data.get("storage", {})
+                        comment_body = comment_storage.get("value", "")

-                        comment_author = comment.get("version", {}).get(
-                            "authorId", "Unknown"
-                        )
-                        comment_date = comment.get("version", {}).get("createdAt", "")
+                        version_data = comment.get("version", {})
+                        comment_author = version_data.get("authorId", "Unknown")
+                        comment_date = version_data.get("createdAt", "")
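
One caveat worth sketching: `dict.get(key, {})` only falls back to `{}` when the key is absent. If the API returns a key with an explicit null value, `get` returns `None` and the chained `.get` raises AttributeError. The variant below, with the hypothetical helper name `extract_comment_fields`, uses `or {}` to cover both cases. Whether Confluence actually returns null for these fields is an assumption, not something confirmed in this review.

```python
from typing import Any


def extract_comment_fields(comment: dict[str, Any]) -> tuple[str, str, str]:
    """Return (body, author, date), tolerating missing or null nested values."""
    # `x.get(key) or {}` also covers keys that are present but set to None,
    # which `x.get(key, {})` would pass through as None
    storage = (comment.get("body") or {}).get("storage") or {}
    version = comment.get("version") or {}
    body = storage.get("value") or ""
    author = version.get("authorId") or "Unknown"
    date = version.get("createdAt") or ""
    return body, author, date
```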
🧹 Nitpick comments (1)
surfsense_backend/app/connectors/confluence_connector.py (1)

143-166: Refactor pagination logic to reduce code duplication

The pagination logic is repeated across multiple methods. Consider extracting it into a helper method to follow the DRY principle.

def _paginate_api_request(self, endpoint: str, params: dict[str, Any] | None = None) -> list[dict[str, Any]]:
    """
    Helper method to handle paginated API requests.
    
    Args:
        endpoint: API endpoint
        params: Initial query parameters
        
    Returns:
        List of all results from paginated requests
    """
    # Copy so the caller's dict is not mutated when the cursor is added
    params = dict(params or {})
    
    all_results = []
    cursor = None
    
    while True:
        if cursor:
            params["cursor"] = cursor
            
        result = self.make_api_request(endpoint, params)
        
        if not isinstance(result, dict) or "results" not in result:
            raise Exception("Invalid response from Confluence API")
            
        all_results.extend(result["results"])
        
        # Check for next page
        links = result.get("_links", {})
        if "next" not in links:
            break
            
        # Extract cursor more safely
        import urllib.parse
        next_url = links["next"]
        parsed = urllib.parse.urlparse(next_url)
        query_params = urllib.parse.parse_qs(parsed.query)
        cursor = query_params.get("cursor", [None])[0]
        
        if not cursor:
            break
            
    return all_results

Also applies to: 197-218, 269-291, 330-349

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8786abf and 8e52a0b.

📒 Files selected for processing (7)
  • surfsense_backend/app/agents/researcher/nodes.py (1 hunks)
  • surfsense_backend/app/connectors/confluence_connector.py (1 hunks)
  • surfsense_backend/app/routes/search_source_connectors_routes.py (3 hunks)
  • surfsense_backend/app/schemas/search_source_connector.py (1 hunks)
  • surfsense_backend/app/services/connector_service.py (1 hunks)
  • surfsense_backend/app/tasks/connectors_indexing_tasks.py (2 hunks)
  • surfsense_web/hooks/useConnectorEditPage.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • surfsense_backend/app/schemas/search_source_connector.py
  • surfsense_backend/app/agents/researcher/nodes.py
  • surfsense_backend/app/routes/search_source_connectors_routes.py
  • surfsense_backend/app/services/connector_service.py
  • surfsense_web/hooks/useConnectorEditPage.ts
🔇 Additional comments (2)
surfsense_backend/app/tasks/connectors_indexing_tasks.py (2)

11-11: LGTM!

The import is correctly placed in alphabetical order with other connector imports.


2335-2358: LGTM!

The function signature and documentation are well-structured and follow the same pattern as other indexing functions in the file.

@MODSetter
Owner

@CREDO23 Comments on the Confluence pages are not getting picked up. Try to fix that in a new PR. Other than that, nice work.

@MODSetter MODSetter merged commit fbb1263 into MODSetter:main Jul 28, 2025
5 of 7 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Aug 8, 2025
15 tasks