
Fix CSV parsing to use proper csv.reader() instead of naive string splitting #104

Merged: SkBlaz merged 2 commits into main from copilot/fix-103 on Aug 19, 2025

Copilot AI commented Aug 18, 2025

The parse_csv_raw() function parsed CSV headers with naive string splitting, which ignores CSV quoting rules. As a result, headers containing quoted fields with embedded commas or escaped (doubled) quotes were split into the wrong columns.

Problem

The previous implementation used:

column_names = header.strip().split(col_delimiter)

This fails for CSV files like:

f1,"field with, comma",f3,"field with ""quotes"""

The naive splitting incorrectly parses this as 5 columns instead of 4:

['f1', '"field with', ' comma"', 'f3', '"field with ""quotes"""']
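The miscount is easy to reproduce in isolation. The snippet below runs both approaches on the header from the issue; the variable names are illustrative only.

```python
import csv

# Header from the issue: one quoted field contains the delimiter,
# another contains doubled ("escaped") quotes.
header = 'f1,"field with, comma",f3,"field with ""quotes"""\n'

# Naive approach: splits on every comma, including those inside quotes.
naive = header.strip().split(",")

# csv.reader applies CSV quoting rules. It accepts any iterable of lines,
# so a one-element list is enough to parse a single header line.
parsed = next(csv.reader([header.strip()]))

print(len(naive), naive)
print(len(parsed), parsed)
```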

Solution

Updated the function to use Python's standard csv module:

column_names = list(csv.reader([header.strip()]))[0]

This applies standard CSV quoting rules (quoted fields, delimiters inside quotes, doubled quotes) and parses the same header correctly as:

['f1', 'field with, comma', 'f3', 'field with "quotes"']
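As quoted in this description, the new line relies on csv.reader's default comma delimiter, while the line it replaced split on col_delimiter. If parse_csv_raw() is meant to support other delimiters, csv.reader accepts a delimiter keyword. A minimal sketch, assuming a hypothetical helper named split_header() that is not part of outrank:

```python
import csv
from typing import List


def split_header(header: str, col_delimiter: str = ",") -> List[str]:
    """Split a single CSV header line into column names.

    Hypothetical helper for illustration only; it is not part of outrank.
    Quoting follows the csv module's default dialect: fields may be wrapped
    in double quotes, and a literal quote inside a field is written as two
    double quotes.
    """
    # Drop only the line terminator: strip() would also remove leading or
    # trailing tabs, which are real (empty) columns when col_delimiter is a tab.
    line = header.rstrip("\r\n")
    # csv.reader accepts any iterable of lines, so wrap the one header line in
    # a list; next() returns its first (and only) parsed row.
    return next(csv.reader([line], delimiter=col_delimiter), [])


if __name__ == "__main__":
    print(split_header('f1,"field with, comma",f3,"field with ""quotes"""\n'))
    print(split_header('a\t"b\tc"\td\n', col_delimiter="\t"))
    print(split_header('x;"y;z"\n', col_delimiter=";"))
```

Passing col_delimiter keeps a configurable delimiter working, and rstrip("\r\n") trims only the line ending. Whether parse_csv_raw() needs either refinement depends on code not shown in this PR description.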

Changes

  • Fixed parse_csv_raw() function in outrank/core_utils.py (line 396)
  • Added comprehensive test case test_parse_csv_with_quoted_fields() in tests/data_io_test.py
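A test in the spirit of test_parse_csv_with_quoted_fields() can write the problematic header to a temporary file and check the parsed column names. Because parse_csv_raw()'s signature is not shown here, this sketch tests a hypothetical read_header() function instead. It hands the open file to csv.reader (opened with newline="", as the csv documentation recommends), which also copes with a quoted header field that spans lines; reading one line with readline() and then parsing it cannot reassemble such a field.

```python
import csv
import os
import tempfile
import unittest
from typing import List


def read_header(path: str, col_delimiter: str = ",") -> List[str]:
    """Return the column names from the first record of a CSV file.

    Hypothetical stand-in for the header logic in parse_csv_raw().
    """
    # newline="" lets the csv module handle line endings itself, including
    # newlines embedded inside quoted fields.
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter=col_delimiter), [])


class QuotedHeaderTest(unittest.TestCase):
    def _write(self, text: str) -> str:
        # delete=False so the file can be reopened by name on all platforms;
        # addCleanup removes it after the test.
        tmp = tempfile.NamedTemporaryFile(
            "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
        )
        with tmp:
            tmp.write(text)
        self.addCleanup(os.remove, tmp.name)
        return tmp.name

    def test_quoted_fields(self):
        path = self._write('f1,"field with, comma",f3,"field with ""quotes"""\n1,2,3,4\n')
        self.assertEqual(
            read_header(path),
            ["f1", "field with, comma", "f3", 'field with "quotes"'],
        )

    def test_quoted_newline_in_header(self):
        path = self._write('a,"multi\nline",c\n1,2,3\n')
        self.assertEqual(read_header(path), ["a", "multi\nline", "c"])
```

Saved as a test module, this runs with python -m unittest; pytest also collects unittest.TestCase classes.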

The fix is minimal: only one line of parse_csv_raw() changed. The csv module was already imported and used elsewhere in the codebase (in parse_ob_csv_line()), so header parsing now follows the same CSV rules as that function.

All existing tests pass and the new test case verifies that complex CSV files with quoted fields are now parsed correctly.

Fixes #103.



Copilot AI changed the title from "[WIP] Proper csv reader" to "Fix CSV parsing to use proper csv.reader() instead of naive string splitting" Aug 18, 2025
Copilot AI requested a review from SkBlaz August 18, 2025 17:41
Copilot finished work on behalf of SkBlaz August 18, 2025 17:41
SkBlaz marked this pull request as ready for review August 19, 2025 06:20
SkBlaz merged commit 5642653 into main Aug 19, 2025
8 checks passed
SkBlaz deleted the copilot/fix-103 branch August 19, 2025 06:20
This pull request closes issue #103: Proper csv reader