Building a Python Parser Combinator Library
Parser combinators are a powerful technique for building parsers by composing smaller, simpler parsers. This challenge asks you to implement a basic parser combinator library in Python. This is a fundamental building block for language processing, data parsing, and creating domain-specific languages.
Problem Description
You need to create a Python library that allows users to define parsers by combining primitive parsers. A parser should take an input string and return a result (or indicate failure). The core idea is to build complex parsers from simple ones.
Key Requirements:
ParserClass: Define a baseParserclass.parse(self, input_string)method: Each parser instance must have aparsemethod. This method should return a tuple(result, remaining_string)on success, whereresultis the parsed value andremaining_stringis the portion of the input string that was not consumed. On failure, it should raise aParseError(which you'll also need to define).- Primitive Parsers: Implement several basic parsers:
Literal(string): Matches a specific literal string.Regex(pattern): Matches a regular expression pattern.AnyChar(): Matches any single character.
- Combinator Parsers: Implement combinators that combine existing parsers:
Sequence(*parsers): Matches a sequence of parsers. Returns a list of results.Choice(*parsers): Matches the first parser that succeeds. Returns the result of the successful parser.Many(parser): Matches theparserzero or more times. Returns a list of results.Optional(parser): Matches theparserzero or one time. If it matches, return the result; otherwise, returnNone.
ParseErrorException: Define a custom exceptionParseErrorto be raised on parsing failure. It should ideally include information about the input and what was expected.
Expected Behavior:
- Successful parsing should yield a meaningful result and the remaining unparsed input.
- Failed parsing should consistently raise
ParseError. - Combinators should correctly compose the behavior of their constituent parsers.
Edge Cases to Consider:
- Empty input strings.
- Parsers that consume no input (e.g.,
Manyon an empty sequence). - Overlapping matches for
Choice. - Empty sequences of parsers for
SequenceandChoice.
Examples
Example 1: Literal and Sequence
from parser_combinator import Literal, Sequence, ParseError
# Assume Literal and Sequence are implemented correctly
text = "hello world"
literal_parser = Literal("hello")
sequence_parser = Sequence(Literal("hello"), Literal(" "), Literal("world"))
# Test Literal
result, remaining = literal_parser.parse(text)
print(f"Literal result: {result}, Remaining: '{remaining}'")
# Expected Output: Literal result: hello, Remaining: ' world'
# Test Sequence
result_seq, remaining_seq = sequence_parser.parse(text)
print(f"Sequence result: {result_seq}, Remaining: '{remaining_seq}'")
# Expected Output: Sequence result: ['hello', ' ', 'world'], Remaining: ''
# Test Sequence failure
try:
sequence_parser.parse("hello python")
except ParseError as e:
print(f"Sequence failure: {e}")
# Expected Output: Sequence failure: Failed to parse at ...
Example 2: Choice and AnyChar
from parser_combinator import Choice, AnyChar, ParseError, Literal
# Assume Choice, AnyChar, Literal are implemented correctly
text = "abc"
any_char_parser = AnyChar()
choice_parser = Choice(Literal("a"), Literal("b"))
# Test AnyChar
result_ac, remaining_ac = any_char_parser.parse(text)
print(f"AnyChar result: {result_ac}, Remaining: '{remaining_ac}'")
# Expected Output: AnyChar result: a, Remaining: 'bc'
# Test Choice
result_choice, remaining_choice = choice_parser.parse(text)
print(f"Choice result: {result_choice}, Remaining: '{remaining_choice}'")
# Expected Output: Choice result: a, Remaining: 'bc'
# Test Choice failure
try:
choice_parser.parse("xyz")
except ParseError as e:
print(f"Choice failure: {e}")
# Expected Output: Choice failure: Failed to parse at ...
Example 3: Many and Optional
from parser_combinator import Many, Optional, Literal, ParseError
# Assume Many, Optional, Literal are implemented correctly
text = "aaabbc"
many_a_parser = Many(Literal("a"))
optional_b_parser = Optional(Literal("b"))
combined_parser = Sequence(many_a_parser, optional_b_parser, Literal("c"))
# Test Many
result_many, remaining_many = many_a_parser.parse(text)
print(f"Many result: {result_many}, Remaining: '{remaining_many}'")
# Expected Output: Many result: ['a', 'a', 'a'], Remaining: 'bbc'
# Test Optional success
result_opt_s, remaining_opt_s = optional_b_parser.parse("bc")
print(f"Optional (success) result: {result_opt_s}, Remaining: '{remaining_opt_s}'")
# Expected Output: Optional (success) result: b, Remaining: 'c'
# Test Optional failure
result_opt_f, remaining_opt_f = optional_b_parser.parse("c")
print(f"Optional (failure) result: {result_opt_f}, Remaining: '{remaining_opt_f}'")
# Expected Output: Optional (failure) result: None, Remaining: 'c'
# Test Combined
result_combined, remaining_combined = combined_parser.parse(text)
print(f"Combined result: {result_combined}, Remaining: '{remaining_combined}'")
# Expected Output: Combined result: [['a', 'a', 'a'], 'b', 'c'], Remaining: ''
# Test Combined with missing optional
text_no_b = "aac"
result_combined_no_b, remaining_combined_no_b = combined_parser.parse(text_no_b)
print(f"Combined (no optional) result: {result_combined_no_b}, Remaining: '{remaining_combined_no_b}'")
# Expected Output: Combined (no optional) result: [['a', 'a'], None, 'c'], Remaining: ''
# Test Combined with failure
try:
combined_parser.parse("aaad")
except ParseError as e:
print(f"Combined failure: {e}")
# Expected Output: Combined failure: Failed to parse at ...
Constraints
- The implementation should be in pure Python, without relying on external parsing libraries.
- The
parsemethod should have a time complexity that is generally proportional to the length of the input string and the complexity of the parser itself. Avoid exponential complexity where possible. - The input strings will consist of ASCII characters.
Notes
- Consider how to handle backtracking. Most combinator libraries implicitly handle backtracking when
ChoiceorOptionalare used. - Think about how to represent the state of the parsing process (current position in the input string).
- The
ParseErrorshould ideally provide context about where the parse failed and potentially what was expected. You might want to pass the current input string and the index of failure to theParseErrorconstructor. - Consider making parsers callable directly (e.g.,
parser(input_string)). This can be achieved by implementing the__call__method on yourParserclass.