Python Compliance Checker
In many real-world applications, it's crucial to ensure that data adheres to specific rules and regulations. This challenge focuses on building a flexible system in Python to perform various compliance checks on a given dataset. This is a fundamental skill for data validation, security, and regulatory adherence.
Problem Description
Your task is to implement a Python class, ComplianceChecker, that can be configured with a set of compliance rules. The ComplianceChecker should be able to process a dictionary representing a record and determine if it violates any of the configured rules.
Key Requirements:
- Rule Definition: Rules should be defined as functions that accept a record (dictionary) and return True if the rule is satisfied, and False otherwise.
- Checker Class: Create a ComplianceChecker class.
  - The constructor should accept an iterable of rule functions.
  - It should have a method check(record: dict) -> list[str] that takes a record and returns a list of strings. Each string names a rule that was violated. If no rules are violated, the list should be empty.
- Rule Violation Reporting: The check method should clearly indicate which rules failed. For simplicity, you can assume each rule function has a __name__ attribute that can be used as its identifier (e.g., rule_is_positive, rule_max_length).
- Flexibility: The system should be able to handle different types of checks (e.g., value range, string length, format validation, presence of keys).
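The Flexibility requirement can be illustrated with one rule per kind of check. The following is a minimal sketch. The rule names, the "code" key, its format, and the numeric bounds are hypothetical and do not come from the challenge. Each rule here treats a missing key as a violation. That is a design choice: Example 2 below uses the opposite convention.

```python
import re


# Value range (hypothetical bounds): "value" must be a number from 0 to 100 inclusive.
def value_in_range(record):
    value = record.get("value")
    return isinstance(value, (int, float)) and 0 <= value <= 100


# String length: "name" must be a string of at most 10 characters.
def name_max_length(record):
    name = record.get("name")
    return isinstance(name, str) and len(name) <= 10


# Format validation (hypothetical "code" key): three uppercase letters, then three digits.
def code_format(record):
    code = record.get("code")
    return isinstance(code, str) and re.fullmatch(r"[A-Z]{3}\d{3}", code) is not None


# Presence of keys: every required key must exist (iterating a dict yields its keys).
def has_id_and_name(record):
    return {"id", "name"}.issubset(record)


record = {"id": 7, "name": "Widget", "value": 150, "code": "ABC123"}
rules = [value_in_range, name_max_length, code_format, has_id_and_name]
print([rule.__name__ for rule in rules if not rule(record)])  # ['value_in_range']
```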
Expected Behavior:
When check(record) is called:
- It iterates through all the configured rule functions.
- For each rule, it calls the rule function with the record.
- If a rule function returns False, the name of that rule function is added to a list of violations.
- Finally, it returns the list of violated rule names.
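The steps above can be sketched as a small class. This is one minimal way to meet the requirements, not a reference solution. The Rule type alias is an illustrative assumption. The sketch treats any falsy return value as a violation, which is slightly broader than testing for False exactly. It also copies the rules into a tuple, so a one-shot iterable such as a generator can still be checked against many records.

```python
from typing import Callable, Iterable

# Illustrative alias: a rule takes a record and returns True when the record complies.
Rule = Callable[[dict], bool]


class ComplianceChecker:
    def __init__(self, rules: Iterable[Rule]):
        # Materialize the iterable so every call to check() sees all the rules.
        self._rules = tuple(rules)

    def check(self, record: dict) -> list[str]:
        # Run every rule in the configured order and collect the names of the failures.
        return [rule.__name__ for rule in self._rules if not rule(record)]


def is_positive(record):
    return record.get("value", 0) > 0


def has_required_fields(record):
    return {"id", "name", "value"}.issubset(record.keys())


checker = ComplianceChecker([is_positive, has_required_fields])
print(checker.check({"id": 2, "name": "Item B", "value": -5}))  # ['is_positive']
print(ComplianceChecker([]).check({"data": 123}))  # []
```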
Edge Cases:
- An empty record ({}).
- A record with missing keys that some rules expect.
- A record with keys having unexpected data types.
- The checker being initialized with no rules.
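Most of these edge cases come down to how an individual rule behaves when a key is missing or holds an unexpected type. The sketch below contrasts a comparison in the style of Example 1 with a defensive variant. is_positive_number is a hypothetical name, and the sketch assumes that a missing or non-numeric value should count as a violation rather than crash the check.

```python
def is_positive(record):
    # Same comparison as Example 1: fine for numbers, but "10" > 0 raises TypeError.
    return record.get("value", 0) > 0


def is_positive_number(record):
    # Hypothetical defensive rule: a missing or non-numeric value is a violation.
    value = record.get("value")
    # bool is a subclass of int, so exclude it explicitly (True would otherwise act as 1).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value > 0


# Prints False for every record except {'value': 3.5}.
for record in ({}, {"id": 1}, {"value": "10"}, {"value": True}, {"value": 3.5}):
    print(record, is_positive_number(record))

try:
    is_positive({"value": "10"})
except TypeError as exc:
    print("is_positive raised TypeError:", exc)
```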
Examples
Example 1:
```python
# Assumes ComplianceChecker is implemented as described above.
def is_positive(record):
    return record.get("value", 0) > 0

def has_required_fields(record):
    required = {"id", "name", "value"}
    return required.issubset(record.keys())

checker = ComplianceChecker([is_positive, has_required_fields])

record_valid = {"id": 1, "name": "Item A", "value": 10}
violations_valid = checker.check(record_valid)
print(violations_valid)

record_invalid_value = {"id": 2, "name": "Item B", "value": -5}
violations_invalid_value = checker.check(record_invalid_value)
print(violations_invalid_value)
```
Output:
[]
['is_positive']
Explanation:
- record_valid satisfies both is_positive (10 > 0) and has_required_fields (id, name, and value are all present), so an empty list of violations is returned.
- record_invalid_value fails is_positive because its value is -5, which is not greater than 0. It passes has_required_fields, because that rule only checks that id, name, and value are present, not what they contain. The only violation reported is therefore is_positive.
Example 2:
```python
import re

def max_string_length(record):
    if "name" in record and isinstance(record["name"], str):
        return len(record["name"]) <= 10
    return True  # Rule passes if name is not a string or not present

def email_format_valid(record):
    email = record.get("email", "")
    if not email:
        return True  # Empty email is not a violation of format
    return re.match(r"[^@]+@[^@]+\.[^@]+", email) is not None

checker_config_2 = ComplianceChecker([max_string_length, email_format_valid])

record_email_invalid = {"id": 3, "name": "SuperLongName", "email": "invalid-email"}
violations_email_invalid = checker_config_2.check(record_email_invalid)
print(violations_email_invalid)

record_email_valid = {"id": 4, "name": "ShortName", "email": "test@example.com"}
violations_email_valid = checker_config_2.check(record_email_valid)
print(violations_email_valid)
```
Output:
['max_string_length', 'email_format_valid']
[]
Explanation:
- record_email_invalid: "SuperLongName" is 13 characters long, more than the limit of 10, so it fails max_string_length. "invalid-email" contains no @, so it does not match the email pattern and fails email_format_valid.
- record_email_valid: "ShortName" is 9 characters long, so it passes max_string_length. "test@example.com" matches the pattern, so it passes email_format_valid.
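Two details of email_format_valid are worth knowing. First, re.match anchors only at the start of the string, so a value with a valid-looking prefix, such as "a@b.c@d", passes. Second, a non-string value such as 123 makes re.match raise TypeError. The sketch below keeps the same pattern but uses re.fullmatch and a type check. email_format_strict is a hypothetical name, and neither version is a complete email validator.

```python
import re

PATTERN = r"[^@]+@[^@]+\.[^@]+"  # the pattern from Example 2


def email_format_valid(record):
    # Example 2's rule: re.match only requires a matching prefix.
    email = record.get("email", "")
    if not email:
        return True
    return re.match(PATTERN, email) is not None


def email_format_strict(record):
    # Hypothetical stricter variant of the same check.
    email = record.get("email")
    if email is None or email == "":
        return True  # absent or empty email is still not a format violation
    if not isinstance(email, str):
        return False  # report wrong types instead of raising TypeError
    # fullmatch requires the entire string to match the pattern.
    return re.fullmatch(PATTERN, email) is not None


print(email_format_valid({"email": "a@b.c@d"}))   # True: the prefix "a@b.c" matches
print(email_format_strict({"email": "a@b.c@d"}))  # False
print(email_format_strict({"email": 123}))        # False
```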
Example 3 (Edge Case - Missing Keys):
```python
def requires_age(record):
    return "age" in record

def age_is_integer(record):
    return isinstance(record.get("age"), int)

checker_config_3 = ComplianceChecker([requires_age, age_is_integer])

record_missing_age = {"id": 5, "name": "User"}
violations_missing_age = checker_config_3.check(record_missing_age)
print(violations_missing_age)

record_age_wrong_type = {"id": 6, "name": "User 2", "age": "twenty"}
violations_age_wrong_type = checker_config_3.check(record_age_wrong_type)
print(violations_age_wrong_type)

# A checker configured with no rules at all.
record_no_rules = ComplianceChecker([])
record_any = {"data": 123}
violations_no_rules = record_no_rules.check(record_any)
print(violations_no_rules)
```
Output:
['requires_age', 'age_is_integer']
['age_is_integer']
[]
Explanation:
- record_missing_age fails requires_age because the "age" key is absent. It also fails age_is_integer: record.get("age") returns None for a missing key, and isinstance(None, int) is False. Each rule is evaluated independently, so both violations are reported.
- record_age_wrong_type passes requires_age (the "age" key is present) but fails age_is_integer because its value is the string "twenty", not an integer.
- record_no_rules is a checker configured with an empty list of rules, so its check method always returns an empty list: there are no rules to violate.
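Because age_is_integer returns False whenever the "age" key is missing, a record without an age is flagged by both rules. If the intent is to report a missing key only once, through requires_age, the type rule can pass when the key is absent. That is the same convention max_string_length uses in Example 2. The sketch below shows this approach. age_is_integer_if_present and the violations helper are hypothetical names, and the helper's one-line loop stands in for ComplianceChecker.check.

```python
def requires_age(record):
    return "age" in record


def age_is_integer_if_present(record):
    # Hypothetical variant: only checks the type when "age" exists,
    # leaving the missing-key failure to requires_age.
    return "age" not in record or isinstance(record["age"], int)


rules = [requires_age, age_is_integer_if_present]


def violations(record):
    # Same logic as ComplianceChecker.check: names of the rules that return False.
    return [rule.__name__ for rule in rules if not rule(record)]


print(violations({"id": 5, "name": "User"}))                      # ['requires_age']
print(violations({"id": 6, "name": "User 2", "age": "twenty"}))   # ['age_is_integer_if_present']
```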
Constraints
- The number of rule functions passed to ComplianceChecker is between 0 and 100.
- Each rule function is a valid Python function that accepts one argument: a dictionary.
- The input record to the check method is always a dictionary.
- Checking a single record against up to 100 rules should take a reasonable amount of time, ideally on the order of milliseconds per record. Complex rules might take longer, but the overhead of the checker itself should be minimal.
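One way to see whether the checker's own overhead stays within the millisecond target is to time a check against the maximum of 100 rules with the standard timeit module. The sketch below makes the following assumptions: make_min_rule is a hypothetical factory that produces 100 trivial rules, each given a distinct __name__, and the check function is the same loop a ComplianceChecker would run. Timings depend on the machine, so none are quoted here.

```python
import timeit


def make_min_rule(minimum):
    # Hypothetical factory: a rule that passes when record["value"] >= minimum.
    def rule(record):
        value = record.get("value")
        return isinstance(value, (int, float)) and value >= minimum

    # Give each generated rule a distinct, readable name for violation reports.
    rule.__name__ = f"value_at_least_{minimum}"
    return rule


rules = [make_min_rule(m) for m in range(100)]  # the stated maximum of 100 rules
record = {"value": 42}


def check(record):
    return [rule.__name__ for rule in rules if not rule(record)]


runs = 1000
total_s = timeit.timeit(lambda: check(record), number=runs)
per_check_ms = total_s / runs * 1000

print(len(check(record)))  # 57: the rules with minimums 43 through 99 fail
print(f"{per_check_ms:.4f} ms per check")
```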
Notes
- Consider how to handle exceptions raised inside rule functions. For this challenge, assume rules are well-behaved and do not raise uncaught exceptions; in a real-world system you might want to log or otherwise handle them.
- Think about the order in which rules are applied. The ComplianceChecker does not enforce any ordering or dependencies between rules, and every rule is evaluated on its own. A rule function can still implicitly rely on keys whose presence is checked by another rule; in that case, a single missing key may be reported by more than one rule.
- For functions defined with def, the __name__ attribute is a reliable way to get a string identifier for reporting. A lambda, by contrast, reports its name as "<lambda>".
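The exception and naming notes can be combined into a slightly more defensive checker. This is a hypothetical extension beyond the challenge, which assumes well-behaved rules. SafeComplianceChecker, rule_name, and the choice to report a raising rule as a violation are all assumptions. getattr falls back to repr() for callables such as functools.partial objects, which have no __name__ attribute.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def rule_name(rule):
    # functools.partial objects have no __name__; fall back to their repr().
    return getattr(rule, "__name__", repr(rule))


class SafeComplianceChecker:
    """Hypothetical variant: a rule that raises is logged and reported as a violation."""

    def __init__(self, rules):
        self._rules = tuple(rules)

    def check(self, record: dict) -> list[str]:
        violations = []
        for rule in self._rules:
            try:
                passed = rule(record)
            except Exception:
                # Log the traceback and treat the crash as a failed rule.
                logger.exception("rule %s raised on record %r", rule_name(rule), record)
                passed = False
            if not passed:
                violations.append(rule_name(rule))
        return violations


def min_value(record, minimum):
    return record["value"] >= minimum  # raises KeyError when "value" is missing


at_least_10 = functools.partial(min_value, minimum=10)
checker = SafeComplianceChecker([at_least_10, lambda r: "name" in r])
print(checker.check({"id": 1}))  # a functools.partial(...) repr, then '<lambda>'
```

Whether a crashing rule should count as a violation or be reported separately, for example in a second list of errors, depends on the application.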