Python CSV Reader

This challenge focuses on building a robust CSV (Comma-Separated Values) reader in Python. CSV is a common format for storing tabular data, and parsing it correctly is a fundamental skill for data processing and analysis. You will write a function that takes a CSV string as input and returns a structured representation of the data.

Problem Description

Your task is to implement a Python function, read_csv(csv_string), that parses a given string containing CSV data. The function should handle standard CSV conventions, including commas as delimiters, double quotes for enclosing fields that may contain commas or newlines, and doubling double quotes within a quoted field to represent a literal double quote.

Key Requirements:

  1. Input: The function will accept a single argument: csv_string, a multiline string representing the CSV data.
  2. Output: The function should return a list of lists, where each inner list represents a row and contains the fields of that row as strings.
  3. Delimiter: Fields are separated by commas (,).
  4. Quoting: Fields containing commas, newlines, or double quotes must be enclosed in double quotes (").
  5. Escaped Quotes: A literal double quote within a quoted field should be represented by two consecutive double quotes ("").
  6. Header Row (Optional): The first row of the CSV may be a header row. Your parser should treat it the same way as any other data row.
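The quoting rules in requirements 4 and 5 are easiest to see from the writer's side. The sketch below is only an illustration: `quote_field` is a hypothetical helper that is not part of the challenge, and it assumes the comma delimiter and `\n` row separator described above.

```python
def quote_field(field):
    """Return the CSV text for one field (hypothetical helper, illustration only)."""
    # Requirement 4: quote the field if it contains a comma, newline, or double quote.
    needs_quotes = any(ch in field for ch in (',', '\n', '"'))
    if not needs_quotes:
        return field
    # Requirement 5: double every literal double quote, then wrap the field in quotes.
    return '"' + field.replace('"', '""') + '"'


print(quote_field('New York'))
print(quote_field('Powerful, with 16GB RAM'))
print(quote_field('Ergonomic "wireless" mouse'))
```

Your reader has to undo exactly this transformation: strip the enclosing quotes and collapse each doubled quote back into one.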

Expected Behavior:

  • The parser should correctly identify fields separated by commas.
  • It should correctly handle fields that are enclosed in double quotes, even if they contain internal commas or newlines.
  • It should correctly interpret escaped double quotes ("") within quoted fields.
  • Empty fields should be represented as empty strings ('').
  • Rows are separated by newline characters (\n). A newline inside a quoted field is part of that field, not a row separator.

Edge Cases to Consider:

  • CSV with only a header row.
  • CSV with empty rows.
  • CSV with fields containing leading/trailing whitespace (for this challenge, preserve it rather than trimming it).
  • CSV with very long lines or many fields.
  • CSV with no quoted fields.
  • CSV with only quoted fields.
  • CSV ending with a newline.
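The challenge does not state the expected output for every edge case above (for example, what an empty row should become). One possible reference point, offered only as a sketch, is to see how Python's standard-library `csv` module treats the same inputs. `reference_rows` and the sample inputs below are made up for illustration, and the module is used here purely for comparison, not as the solution.

```python
import csv
import io


def reference_rows(csv_string):
    """Parse with the standard-library csv module, for comparison only."""
    # newline='' leaves line endings untouched so quoted newlines survive intact.
    return list(csv.reader(io.StringIO(csv_string, newline='')))


# Hypothetical inputs, one per edge case listed above.
edge_cases = {
    'header only': 'Name,Age',
    'empty row': 'a,b\n\nc,d',
    'whitespace': ' a , b ',
    'no quoted fields': 'x,y,z',
    'only quoted fields': '"x","y"',
    'trailing newline': 'a,b\n',
}

for label, text in edge_cases.items():
    print(label, '->', reference_rows(text))
```

With its default settings the `csv` module returns an empty list for a blank line, adds no extra row for a trailing newline, and preserves surrounding whitespace. It also accepts line endings other than `\n`, which this challenge does not mention, so treat its output as a point of comparison rather than the definition of correct behavior here.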

Examples

Example 1:

Input:
'''Name,Age,City
Alice,30,"New York"
Bob,25,London'''

Output:
[['Name', 'Age', 'City'], ['Alice', '30', 'New York'], ['Bob', '25', 'London']]

Explanation:
This is a standard CSV with a header. Each row is correctly parsed into its constituent fields.

Example 2:

Input:
'''Item,Description,Price
"Laptop","Powerful, with 16GB RAM",1200.50
"Mouse","Ergonomic ""wireless"" mouse",25.99'''

Output:
[['Item', 'Description', 'Price'], ['Laptop', 'Powerful, with 16GB RAM', '1200.50'], ['Mouse', 'Ergonomic "wireless" mouse', '25.99']]

Explanation:
The 'Description' field for "Laptop" contains a comma, so it is quoted. The 'Description' field for "Mouse" contains two escaped double quotes (`""`) around the word wireless, each of which is parsed as a single literal double quote.

Example 3:

Input:
'''ID,Status
1,"Pending, requires review"
2,Completed
3,""""'''

Output:
[['ID', 'Status'], ['1', 'Pending, requires review'], ['2', 'Completed'], ['3', '"']]

Explanation:
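This example shows a quoted field containing a comma and a quoted field (`""""`) whose only content is one escaped quote: the outer pair of quotes encloses the field, and the inner `""` is parsed as a single literal double quote. By contrast, an empty field would be parsed as `''`.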
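To make the `""""` case concrete, here is a minimal sketch that decodes one field token that has already been isolated from its row. `unquote_field` is a hypothetical helper, not something the challenge requires, and it assumes the token is well formed. Finding where each field starts and ends is the harder part and is left to the parser itself.

```python
def unquote_field(token):
    """Decode a single, already-isolated CSV field token (illustrative helper)."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        # Drop the enclosing quotes, then collapse each doubled quote into one.
        return token[1:-1].replace('""', '"')
    # Unquoted tokens are returned unchanged.
    return token


print(repr(unquote_field('""""')))
print(repr(unquote_field('""')))
print(repr(unquote_field('"Ergonomic ""wireless"" mouse"')))
print(repr(unquote_field('Completed')))
```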

Constraints

  • The input csv_string will be a string.
  • The csv_string will consist of characters representable in Python strings.
  • The maximum number of rows in the CSV is 10,000.
  • The maximum number of fields per row is 100.
  • The maximum length of any single field is 1000 characters.
  • The function should aim for reasonable performance, avoiding excessively slow parsing for typical inputs within the given constraints.

Notes

  • You are not expected to use Python's standard-library csv module for this challenge. The goal is to implement the parsing logic yourself.
  • Consider how to handle newline characters within quoted fields. They should be treated as part of the field's content, not as row separators.
  • Think about the state of your parser as it iterates through the characters of the csv_string. You'll likely need to track whether you are currently inside a quoted field.
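The state-tracking idea in the notes above can be sketched as a single pass over the characters with one flag for "inside a quoted field". This is one possible approach, not an official solution. It assumes well-formed input, and it makes two choices the challenge leaves open: a blank line becomes an empty list, and a trailing newline does not produce an extra row. The names `in_quotes` and `row_started` are illustrative.

```python
def read_csv(csv_string):
    """Parse CSV text into a list of rows, each a list of string fields."""
    rows, row, field = [], [], []
    in_quotes = False    # True while inside a double-quoted field
    row_started = False  # True once any character of the current row is seen
    i, n = 0, len(csv_string)
    while i < n:
        ch = csv_string[i]
        if in_quotes:
            if ch != '"':
                field.append(ch)        # commas and newlines are literal here
            elif i + 1 < n and csv_string[i + 1] == '"':
                field.append('"')       # doubled quote -> one literal quote
                i += 1                  # skip the second quote of the pair
            else:
                in_quotes = False       # closing quote
        elif ch == '\n':
            if row_started:
                row.append(''.join(field))
            rows.append(row)            # assumption: a blank line yields []
            row, field, row_started = [], [], False
        else:
            row_started = True
            if ch == '"':
                in_quotes = True        # opening quote
            elif ch == ',':
                row.append(''.join(field))
                field = []
            else:
                field.append(ch)
        i += 1
    if row_started:                     # flush a final row with no trailing newline
        row.append(''.join(field))
        rows.append(row)
    return rows


print(read_csv('ID,Status\n1,"Pending, requires review"\n2,Completed\n3,""""'))
```

Collecting characters in a list and joining them once per field avoids repeated string concatenation, which helps keep the pass linear in the input length within the stated constraints.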