Hone logo
Hone
Problems

Building a Python Parser Combinator Library

Parser combinators are a powerful technique for building parsers by composing smaller, simpler parsers. This challenge asks you to implement a basic parser combinator library in Python. This is a fundamental building block for language processing, data parsing, and creating domain-specific languages.

Problem Description

You need to create a Python library that allows users to define parsers by combining primitive parsers. A parser should take an input string and return a result (or indicate failure). The core idea is to build complex parsers from simple ones.

Key Requirements:

  • Parser Class: Define a base Parser class.
  • parse(self, input_string) method: Each parser instance must have a parse method. This method should return a tuple (result, remaining_string) on success, where result is the parsed value and remaining_string is the portion of the input string that was not consumed. On failure, it should raise a ParseError (which you'll also need to define).
  • Primitive Parsers: Implement several basic parsers:
    • Literal(string): Matches a specific literal string.
    • Regex(pattern): Matches a regular expression pattern.
    • AnyChar(): Matches any single character.
  • Combinator Parsers: Implement combinators that combine existing parsers:
    • Sequence(*parsers): Matches a sequence of parsers. Returns a list of results.
    • Choice(*parsers): Matches the first parser that succeeds. Returns the result of the successful parser.
    • Many(parser): Matches the parser zero or more times. Returns a list of results.
    • Optional(parser): Matches the parser zero or one time. If it matches, return the result; otherwise, return None.
  • ParseError Exception: Define a custom exception ParseError to be raised on parsing failure. It should ideally include information about the input and what was expected.

Expected Behavior:

  • Successful parsing should yield a meaningful result and the remaining unparsed input.
  • Failed parsing should consistently raise ParseError.
  • Combinators should correctly compose the behavior of their constituent parsers.

Edge Cases to Consider:

  • Empty input strings.
  • Parsers that consume no input (e.g., Many on an empty sequence).
  • Overlapping matches for Choice.
  • Empty sequences of parsers for Sequence and Choice.

Examples

Example 1: Literal and Sequence

from parser_combinator import Literal, Sequence, ParseError

# Assume Literal and Sequence are implemented correctly

text = "hello world"
literal_parser = Literal("hello")
sequence_parser = Sequence(Literal("hello"), Literal(" "), Literal("world"))

# Test Literal
result, remaining = literal_parser.parse(text)
print(f"Literal result: {result}, Remaining: '{remaining}'")
# Expected Output: Literal result: hello, Remaining: ' world'

# Test Sequence
result_seq, remaining_seq = sequence_parser.parse(text)
print(f"Sequence result: {result_seq}, Remaining: '{remaining_seq}'")
# Expected Output: Sequence result: ['hello', ' ', 'world'], Remaining: ''

# Test Sequence failure
try:
    sequence_parser.parse("hello python")
except ParseError as e:
    print(f"Sequence failure: {e}")
# Expected Output: Sequence failure: Failed to parse at ...

Example 2: Choice and AnyChar

from parser_combinator import Choice, AnyChar, ParseError, Literal

# Assume Choice, AnyChar, Literal are implemented correctly

text = "abc"
any_char_parser = AnyChar()
choice_parser = Choice(Literal("a"), Literal("b"))

# Test AnyChar
result_ac, remaining_ac = any_char_parser.parse(text)
print(f"AnyChar result: {result_ac}, Remaining: '{remaining_ac}'")
# Expected Output: AnyChar result: a, Remaining: 'bc'

# Test Choice
result_choice, remaining_choice = choice_parser.parse(text)
print(f"Choice result: {result_choice}, Remaining: '{remaining_choice}'")
# Expected Output: Choice result: a, Remaining: 'bc'

# Test Choice failure
try:
    choice_parser.parse("xyz")
except ParseError as e:
    print(f"Choice failure: {e}")
# Expected Output: Choice failure: Failed to parse at ...

Example 3: Many and Optional

from parser_combinator import Many, Optional, Literal, ParseError

# Assume Many, Optional, Literal are implemented correctly

text = "aaabbc"
many_a_parser = Many(Literal("a"))
optional_b_parser = Optional(Literal("b"))
combined_parser = Sequence(many_a_parser, optional_b_parser, Literal("c"))

# Test Many
result_many, remaining_many = many_a_parser.parse(text)
print(f"Many result: {result_many}, Remaining: '{remaining_many}'")
# Expected Output: Many result: ['a', 'a', 'a'], Remaining: 'bbc'

# Test Optional success
result_opt_s, remaining_opt_s = optional_b_parser.parse("bc")
print(f"Optional (success) result: {result_opt_s}, Remaining: '{remaining_opt_s}'")
# Expected Output: Optional (success) result: b, Remaining: 'c'

# Test Optional failure
result_opt_f, remaining_opt_f = optional_b_parser.parse("c")
print(f"Optional (failure) result: {result_opt_f}, Remaining: '{remaining_opt_f}'")
# Expected Output: Optional (failure) result: None, Remaining: 'c'

# Test Combined
result_combined, remaining_combined = combined_parser.parse(text)
print(f"Combined result: {result_combined}, Remaining: '{remaining_combined}'")
# Expected Output: Combined result: [['a', 'a', 'a'], 'b', 'c'], Remaining: ''

# Test Combined with missing optional
text_no_b = "aac"
result_combined_no_b, remaining_combined_no_b = combined_parser.parse(text_no_b)
print(f"Combined (no optional) result: {result_combined_no_b}, Remaining: '{remaining_combined_no_b}'")
# Expected Output: Combined (no optional) result: [['a', 'a'], None, 'c'], Remaining: ''

# Test Combined with failure
try:
    combined_parser.parse("aaad")
except ParseError as e:
    print(f"Combined failure: {e}")
# Expected Output: Combined failure: Failed to parse at ...

Constraints

  • The implementation should be in pure Python, without relying on external parsing libraries.
  • The parse method should have a time complexity that is generally proportional to the length of the input string and the complexity of the parser itself. Avoid exponential complexity where possible.
  • The input strings will consist of ASCII characters.

Notes

  • Consider how to handle backtracking. Most combinator libraries implicitly handle backtracking when Choice or Optional are used.
  • Think about how to represent the state of the parsing process (current position in the input string).
  • The ParseError should ideally provide context about where the parse failed and potentially what was expected. You might want to pass the current input string and the index of failure to the ParseError constructor.
  • Consider making parsers callable directly (e.g., parser(input_string)). This can be achieved by implementing the __call__ method on your Parser class.
Loading editor...
python