Host counts can be off by +1 (higher than actual) #14

@handecelikkanat

cc-crawl-statistics sometimes reports a host count that is one higher than the actual number. The behavior is sporadic: it affects some domains and not others.

Example:

In domains-top-500.csv for CC-MAIN-2025-30:

| Domain | Actual host count | Host count reported by cc-crawl-statistics |
|---|---|---|
| wikipedia.org | 671 | 672 (buggy report) |
| pinterest.com | 50 | 50 (correctly reported) |
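
To find other affected domains, a check along the following lines might help. This is only a minimal sketch: `find_off_by_one` is a hypothetical helper, not part of cc-crawl-statistics, and it assumes the actual and reported host counts have already been loaded into plain dictionaries keyed by domain. The sample values are the two rows from the example above.

```python
def find_off_by_one(actual, reported):
    """Return the domains whose reported host count is exactly actual + 1.

    Both arguments are assumed to be dicts mapping domain -> host count
    (hypothetical inputs, loaded however is convenient).
    """
    return sorted(
        domain
        for domain, count in reported.items()
        if domain in actual and count == actual[domain] + 1
    )


# Values taken from the two example rows for CC-MAIN-2025-30.
actual = {"wikipedia.org": 671, "pinterest.com": 50}
reported = {"wikipedia.org": 672, "pinterest.com": 50}

print(find_off_by_one(actual, reported))  # prints ['wikipedia.org']
```

Running such a check over all rows of domains-top-500.csv would show how often the +1 discrepancy occurs, which may help narrow down what the affected domains have in common.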
