Implementing a Packrat Parser in Python
Packrat parsing is a powerful technique for parsing context-free grammars, particularly those with ambiguous or left-recursive rules, without resorting to traditional backtracking. It achieves this by memoizing the results of parsing subexpressions, allowing it to efficiently explore multiple possibilities and recover from failures. This challenge asks you to implement a basic packrat parser in Python to demonstrate this technique.
Problem Description
You are tasked with implementing a packrat parser for a simple arithmetic expression grammar. The grammar consists of the following rules:
expression : term ((PLUS | MINUS) term)*term : factor ((MUL | DIV) factor)*factor : NUMBER | LPAREN expression RPAREN
Where:
PLUS,MINUS,MUL,DIV,LPAREN,RPARENare terminal symbols representing the respective operators and parentheses.NUMBERis a terminal symbol representing a numerical value (integer).
The parser should take a string representing the arithmetic expression as input and return the result of evaluating the expression. The parser should handle operator precedence correctly (multiplication and division before addition and subtraction). The packrat parsing strategy requires memoizing the results of parsing subexpressions to avoid redundant computations.
Key Requirements:
- Implement the
PackratParserclass with aparsemethod. - The
parsemethod should accept a string representing the expression and return the evaluated result (an integer). - The parser should use a memoization table (dictionary) to store the results of parsing subexpressions. The key for the memoization table should be a tuple
(start_index, rule_index), wherestart_indexis the starting index of the subexpression andrule_indexis the index of the rule being applied. - The parser should handle invalid expressions gracefully (e.g., by raising an exception or returning a specific error value).
- The parser should correctly handle operator precedence.
Expected Behavior:
The parser should correctly evaluate arithmetic expressions according to the defined grammar and operator precedence. It should leverage memoization to improve performance, especially for expressions with overlapping subexpressions.
Edge Cases to Consider:
- Empty input string.
- Expressions with only a single number.
- Expressions with nested parentheses.
- Expressions with multiple operators of the same precedence (e.g.,
2 + 3 * 4). - Invalid characters in the input string.
- Division by zero.
Examples
Example 1:
Input: "1 + 2 * 3"
Output: 7
Explanation: The expression is evaluated as 1 + (2 * 3) = 1 + 6 = 7.
Example 2:
Input: "(1 + 2) * 3"
Output: 9
Explanation: The expression is evaluated as ((1 + 2) * 3) = (3 * 3) = 9.
Example 3:
Input: "10 / (2 + 3)"
Output: 2
Explanation: The expression is evaluated as 10 / (2 + 3) = 10 / 5 = 2.
Example 4:
Input: "5"
Output: 5
Explanation: A single number is simply returned.
Constraints
- The input string will consist of digits, operators (
+,-,*,/), parentheses ((,)), and whitespace. - The input string will represent a valid arithmetic expression according to the grammar, or an expression that can be reasonably parsed.
- The parser should be able to handle integers in the range of -1000 to 1000.
- The parser should avoid infinite recursion or stack overflow errors, even for complex expressions. Memoization is key to achieving this.
- The parser should handle division by zero by raising a
ZeroDivisionError.
Notes
- You can represent terminal symbols as strings (e.g., "PLUS", "NUMBER").
- Consider using a recursive descent approach for parsing, but with memoization.
- The memoization table should be a dictionary where the key is a tuple
(start_index, rule_index)and the value is the result of parsing that subexpression. - Think carefully about how to handle backtracking and failure in the presence of memoization. The packrat parser doesn't explicitly backtrack; it reuses previously computed results.
- Error handling is important. Consider raising exceptions for invalid input or division by zero.
- Whitespace should be ignored.
- The
rule_indexis used to distinguish between different parsing rules that might start at the same index. This is crucial for memoization.