Simple Expression Simplifier: Optimization Passes in Rust
This challenge focuses on implementing basic optimization passes for a simple expression evaluator written in Rust. The goal is to take a string representation of a mathematical expression and apply a series of transformations to simplify it, demonstrating fundamental compiler optimization techniques. This exercise will help you understand how compilers analyze and transform code to improve performance or reduce size.
Problem Description
You are tasked with creating a module that can simplify mathematical expressions represented as strings. The expressions will consist of integers, addition (+), subtraction (-), multiplication (*), and division (/). Your module should implement two optimization passes:
- Constant Folding: Evaluate any sub-expressions where all operands are constants (integers). Replace the sub-expression with its result. For example, "2 + 3 * 4" should become "14".
- Common Subexpression Elimination (CSE): Identify and eliminate redundant calculations. If the same sub-expression appears multiple times, calculate it once and reuse the result. For simplicity, assume that variable names are not present; only constant values are considered.
The input will be a string representing the expression. The output should be a simplified string representation of the same expression. The order of operations (PEMDAS/BODMAS) should be respected during simplification. You are not required to parse the expression into an Abstract Syntax Tree (AST) for this challenge; you can work directly with the string representation, but you must respect operator precedence.
Key Requirements:
- Implement both Constant Folding and CSE optimization passes.
- The simplification should respect operator precedence (multiplication and division before addition and subtraction).
- The output string should be a valid mathematical expression.
- Handle potential division by zero gracefully (return the original expression if division by zero is detected).
Expected Behavior:
The module should take a string as input and return a simplified string. The simplification should be deterministic; the same input should always produce the same output.
Edge Cases to Consider:
- Empty input string.
- Expressions with only a single number.
- Expressions with multiple consecutive operators (e.g., "2 + + 3"). These should be treated as invalid and the original expression returned.
- Division by zero.
- Expressions with leading or trailing whitespace.
- Expressions with parentheses (not required for this challenge, but consider how your solution would handle them if they were present).
Examples
Example 1:
Input: "2 + 3 * 4"
Output: "14"
Explanation: Constant folding: 3 * 4 = 12, then 2 + 12 = 14.
Example 2:
Input: "2 * (3 + 4) * 5"
Output: "70"
Explanation: Constant folding: 3 + 4 = 7, then 2 * 7 * 5 = 70.
Example 3:
Input: "2 * 3 + 2 * 3"
Output: "12"
Explanation: CSE: 2 * 3 is calculated once and reused.
Example 4:
Input: "10 / 0"
Output: "10 / 0"
Explanation: Division by zero is detected, and the original expression is returned.
Example 5:
Input: " 2 + 3 "
Output: "5"
Explanation: Leading and trailing whitespace are ignored.
Constraints
- The input string will contain only integers, '+', '-', '*', '/', and whitespace.
- Integers will be non-negative and will not exceed 1000.
- The length of the input string will not exceed 256 characters.
- The simplification process should be reasonably efficient. While a full AST is not required, avoid excessively complex string manipulation that could lead to poor performance. A reasonable target is to complete simplification within 100ms for typical expressions.
Notes
- You can use regular expressions or other string manipulation techniques to parse and simplify the expression.
- Consider breaking down the problem into smaller, manageable functions for each optimization pass.
- Testing is crucial. Create a comprehensive set of test cases to ensure your solution handles various scenarios correctly.
- While parentheses are not required, think about how your approach could be extended to handle them in the future. A recursive approach might be beneficial for that.
- Error handling is limited to division by zero. Invalid expressions (e.g., "2 + * 3") should simply return the original expression.