Robust Data Processing with Error Recovery
This challenge focuses on building a Python function that processes a list of potentially malformed data entries. The goal is to gracefully handle errors during processing, recover from them, and still produce as much valid output as possible. This is a crucial skill for real-world applications dealing with messy or unreliable data sources.
Problem Description
You need to implement a Python function called process_data that takes a list of strings as input. Each string represents a data entry that should ideally be parsed into a specific format (e.g., a tuple of integers). However, some entries might be malformed and cause errors during parsing.
Your function should:
- Attempt to parse each string: For each string in the input list, try to convert it into a tuple of two integers. The expected format is "integer1,integer2".
- Handle parsing errors: If a string cannot be parsed correctly (e.g., it's not in the expected format, contains non-numeric characters, or is missing a comma), catch the relevant exceptions.
- Log errors: When an error occurs, log the problematic input string and the type of error encountered.
- Return valid data: Collect all successfully parsed tuples of integers.
- Return error information: Provide a list of all encountered errors, detailing the input that caused the error and the exception type.
Expected Behavior:
The process_data function should return two values:
- A list of successfully parsed tuples (e.g., [(1, 2), (3, 4)]).
- A list of dictionaries, where each dictionary represents an error and contains keys like "input" (the original string) and "error_type" (the exception class name).
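The shape of one error entry can be sketched as follows. The names ErrorRecord and make_error are hypothetical helpers introduced only for illustration; the problem statement requires just the "input" and "error_type" keys.

```python
from typing import TypedDict


class ErrorRecord(TypedDict):
    """Hypothetical type describing one error entry."""
    input: str
    error_type: str


def make_error(raw: str, exc: Exception) -> ErrorRecord:
    # type(exc).__name__ gives the exception class name, e.g. "ValueError".
    return {"input": raw, "error_type": type(exc).__name__}


try:
    int("abc")
except ValueError as exc:
    record = make_error("abc", exc)
    print(record)
```

Storing the class name rather than the exception object keeps the error list easy to compare and serialize.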
Edge Cases to Consider:
- Empty input list.
- Strings with extra whitespace.
- Strings with only one number.
- Strings with non-numeric characters.
- Strings with floating-point numbers.
- Strings with more than two numbers.
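Before writing the full function, it can help to see how the standard building blocks behave on these edge cases. The snippet below is only an exploratory sketch; int_error is a hypothetical helper, not part of the required interface.

```python
from typing import Optional


def int_error(text: str) -> Optional[str]:
    """Return the exception class name raised by int(text), or None."""
    try:
        int(text)
    except ValueError as exc:
        return type(exc).__name__
    return None


# int() ignores surrounding whitespace, so " 40 " converts cleanly.
print(int(" 40 "))

# Empty fields, non-numeric text, and floats are all rejected by int().
for sample in ("", "abc", "1.5"):
    print(repr(sample), int_error(sample))

# A missing or extra field changes the number of parts after splitting,
# so unpacking into exactly two names raises ValueError as well.
for sample in ("7", "1,2,3"):
    try:
        left, right = sample.split(",")
    except ValueError as exc:
        print(repr(sample), type(exc).__name__)
```

An empty input list needs no special handling: a loop over it simply never runs, and both result lists stay empty.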
Examples
Example 1:
Input: ["1,2", "3,4", "5,6"]
Output: ([(1, 2), (3, 4), (5, 6)], [])
Explanation: All input strings are valid and successfully parsed into tuples of integers. No errors are encountered.
Example 2:
Input: ["10,20", "invalid_entry", "30, 40", "50,abc"]
Output: ([(10, 20), (30, 40)], [{'input': 'invalid_entry', 'error_type': 'ValueError'}, {'input': '50,abc', 'error_type': 'ValueError'}])
Explanation: "10,20" and "30, 40" are parsed successfully. "invalid_entry" and "50,abc" raise ValueError and are recorded as errors. Note that the space in "30, 40" is harmless: after splitting on the comma, int() ignores surrounding whitespace, and calling .strip() on each part makes this explicit.
Example 3:
Input: ["1,", ",2", "1,2,3", "1.5,2.5"]
Output: ([], [{'input': '1,', 'error_type': 'ValueError'}, {'input': ',2', 'error_type': 'ValueError'}, {'input': '1,2,3', 'error_type': 'ValueError'}, {'input': '1.5,2.5', 'error_type': 'ValueError'}])
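Explanation: Every entry in this example is malformed: "1," and ",2" have an empty field, "1,2,3" has three fields, and "1.5,2.5" contains floating-point numbers. Each results in a ValueError during parsing, so the list of parsed tuples is empty and all four inputs appear in the error list.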
Constraints
- The input will be a list of strings.
- Each string, if valid, will contain two integers separated by a comma. Leading/trailing whitespace around numbers or the comma should be handled.
- The maximum length of any input string will not exceed 100 characters.
- The number of entries in the input list will not exceed 1000.
Notes
- You'll likely need to use try-except blocks to handle potential ValueError or other exceptions during string splitting and integer conversion.
- Consider using .strip() to handle leading/trailing whitespace.
- The focus is on robust error handling and recovery, not on optimizing for extreme performance (though efficient code is always good practice).
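Putting the notes together, one possible approach is sketched below. It is a minimal sketch rather than a reference solution, and it assumes that logging through the standard logging module at WARNING level is acceptable and that every entry is a string, as the constraints state.

```python
import logging

logger = logging.getLogger(__name__)


def process_data(entries):
    """Parse "int,int" strings and return (parsed_pairs, error_records)."""
    parsed = []
    errors = []
    for entry in entries:
        try:
            # Unpacking raises ValueError unless there are exactly two parts.
            left, right = entry.split(",")
            # int() raises ValueError for empty, non-numeric, or float text.
            pair = (int(left.strip()), int(right.strip()))
        except ValueError as exc:
            error_type = type(exc).__name__
            # Assumption: a warning-level log message is sufficient here.
            logger.warning("Skipping %r: %s", entry, error_type)
            errors.append({"input": entry, "error_type": error_type})
            continue  # Recover and move on to the next entry.
        parsed.append(pair)
    return parsed, errors


valid, failed = process_data(["10,20", "invalid_entry", "30, 40", "50,abc"])
print(valid)
print(failed)
```

Catching only ValueError keeps unexpected bugs visible. If the input could contain non-string items, an AttributeError from .split() would propagate, so a broader except clause would be needed in that case.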