
Streaming CSV Parser in JavaScript

Building a streaming parser is crucial for handling large datasets that don't fit into memory. This challenge asks you to create a JavaScript function that parses a CSV (Comma Separated Values) stream incrementally, yielding parsed rows as they become available, rather than waiting for the entire file to be loaded. This is particularly useful for processing large log files or data feeds.

Problem Description

You need to implement a createCsvStreamParser function that takes a string as input, representing a CSV stream. The function should return an iterator that yields JavaScript objects, where each object represents a row in the CSV. The iterator should process the CSV stream incrementally, yielding a row as soon as it's fully parsed.

Key Requirements:

  • Incremental Parsing: The parser should process the input string chunk by chunk, yielding rows as they are complete.
  • CSV Format: The CSV should use commas (,) as delimiters. Assume the first row is the header row.
  • Quoted Fields: Fields can be enclosed in double quotes ("). Quotes within a field must be properly escaped (doubled quotes: "").
  • Line Breaks: Rows are separated by newline characters (\n).
  • Error Handling: The parser should handle the end of the stream gracefully. The end of the input terminates the final row (see Example 1), and a trailing newline leaves an empty final line that must not be yielded (see Example 3).
  • Header Row: The first row should be treated as the header row, and the keys of the returned objects should correspond to the header values.

Expected Behavior:

The createCsvStreamParser function should return an iterator. Each iteration of the iterator should yield a JavaScript object representing a parsed CSV row. The keys of the object should be the header values from the first row of the CSV.
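To make the expected interface concrete, here is a minimal sketch written as a generator. It splits on newlines and deliberately ignores quoting, so it is a starting point rather than a full solution; the body of `createCsvStreamParser` here is illustrative only.

```javascript
// A minimal sketch of the expected interface: split on newlines, use the
// first row as headers, trim whitespace, and skip empty lines (including
// the empty trailing line left by a final "\n").
// Quoted fields are NOT handled in this version.
function* createCsvStreamParser(input) {
  const lines = input.split('\n');
  if (lines[0] === '') return; // empty input: nothing to yield
  const headers = lines[0].split(',').map((h) => h.trim());
  for (let i = 1; i < lines.length; i++) {
    if (lines[i] === '') continue; // empty (or trailing) line: not yielded
    const values = lines[i].split(',').map((v) => v.trim());
    const row = {};
    headers.forEach((h, j) => { row[h] = values[j]; });
    yield row;
  }
}

// Consuming the iterator with for...of:
for (const row of createCsvStreamParser('header1,header2\nvalue1,value2\n')) {
  console.log(row); // { header1: 'value1', header2: 'value2' }
}
```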

Edge Cases to Consider:

  • Empty input string.
  • CSV with only a header row.
  • CSV with empty fields.
  • CSV with escaped double quotes within fields.
  • Incomplete lines at the end of the stream.
  • Lines with only commas.
  • Lines with leading/trailing whitespace.

Examples

Example 1:

Input: `header1,header2\nvalue1,value2\nvalue3,value4`
Output:
{ header1: 'value1', header2: 'value2' }
{ header1: 'value3', header2: 'value4' }
Explanation: The input is a simple CSV with two rows. The first row is the header, and the subsequent rows are data rows. The output is an iterator yielding two objects, each representing a row.

Example 2:

Input: `header1,header2\n"value1,with,comma",value2\nvalue3,"value4,with,escaped ""quotes"""`
Output:
{ header1: 'value1,with,comma', header2: 'value2' }
{ header1: 'value3', header2: 'value4,with,escaped "quotes"' }
Explanation: This example demonstrates handling commas within quoted fields and escaped double quotes.
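One way to handle these quoting rules is a small per-field state machine. The `splitCsvLine` helper below is a hypothetical sketch; note that trimming every field is a simplification that would also strip leading/trailing whitespace inside quoted fields.

```javascript
// Split one CSV line into fields, tracking whether we are inside a
// double-quoted field. Inside quotes, "" is an escaped quote and commas
// are literal; outside quotes, a comma ends the current field.
function splitCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false;                          // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;  // opening quote
    } else if (ch === ',') {
      fields.push(field.trim()); // note: also trims quoted content
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field.trim());
  return fields;
}

console.log(splitCsvLine('"value1,with,comma",value2'));
// [ 'value1,with,comma', 'value2' ]
```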

Example 3:

Input: `header1,header2\nvalue1,value2\n`
Output:
{ header1: 'value1', header2: 'value2' }
Explanation: This example shows how the parser handles a trailing newline at the end of the stream. The final newline leaves an empty last line, which is not yielded.

Constraints

  • The input CSV string can be up to 10MB in size.
  • The number of columns in the CSV is limited to 100.
  • The length of each field (including quoted fields) is limited to 255 characters.
  • The parser should be reasonably efficient, processing a typical CSV row in under 10ms.
  • Input will always be a string.

Notes

  • Consider using a state machine approach to track the parsing state (e.g., inside a quoted field, outside a quoted field, parsing a header row, parsing a data row).
  • You can use String.prototype.split() to split the CSV string into lines; this is safe here because the challenge does not require supporting newlines inside quoted fields. A character-level state machine also works and generalizes if it ever does.
  • Pay close attention to the handling of quoted fields and escaped double quotes.
  • The iterator should be implemented using a generator function (using the function* syntax).
  • Whitespace around commas should be trimmed.
  • The header row should be treated as case-sensitive.
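Putting the notes above together, one possible shape for the whole parser is a single character-by-character pass with an in-quotes flag, written as a generator. This is a sketch under the assumptions stated in the examples (end of input terminates the final row; an all-empty row is skipped), not a reference solution.

```javascript
// Character-by-character streaming CSV parser sketch. A boolean tracks
// quoted state; a newline outside quotes ends a row, which is yielded
// immediately. The first completed row becomes the headers.
function* createCsvStreamParser(input) {
  let headers = null;
  let fields = [];
  let field = '';
  let inQuotes = false;

  const endField = () => { fields.push(field.trim()); field = ''; };
  const endRow = function* () {
    endField();
    if (headers === null) {
      headers = fields;                                   // header row
    } else if (!(fields.length === 1 && fields[0] === '')) { // skip empty rows
      const row = {};
      headers.forEach((h, j) => { row[h] = fields[j]; });
      yield row;
    }
    fields = [];
  };

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (inQuotes) {
      if (ch === '"') {
        if (input[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false;                           // closing quote
      } else field += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ',') endField();
    else if (ch === '\n') yield* endRow();
    else field += ch;
  }
  // End of input terminates the final row (see Example 1); after a
  // trailing "\n" there is nothing pending, so nothing extra is yielded.
  if (field !== '' || fields.length > 0) yield* endRow();
}
```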