Data Compliance Checker

Organizations often need to ensure their data adheres to specific rules and regulations (e.g., GDPR, HIPAA). This challenge asks you to implement a Python-based compliance checker that validates data against a set of predefined rules. The goal is to create a reusable and extensible system for identifying and reporting compliance issues within a dataset.

Problem Description

You are tasked with building a ComplianceChecker class in Python. The class takes a dataset (a list of dictionaries, one per record) and a list of compliance rules as input. Each compliance rule is a dictionary containing:

  • 'field': The name of the field in the dataset to check.
  • 'rule': A function that takes the value of the field as input and returns True if the value complies with the rule, and False otherwise.
  • 'error_message': A string describing the compliance error if the rule is violated.

The ComplianceChecker class should have a check_dataset method that iterates through the dataset and applies each rule to each record. The method should return a list of dictionaries, where each dictionary represents a compliance violation and contains:

  • 'record_index': The index of the record in the dataset where the violation occurred.
  • 'field': The name of the field that violated the rule.
  • 'value': The value of the field that violated the rule.
  • 'error_message': The error message associated with the rule.
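
The interface described above can be sketched as follows. This is a minimal sketch, not a reference solution. The problem does not say whether the dataset and rules are passed to the constructor or to check_dataset, so the sketch assumes the constructor takes the rules and check_dataset takes the dataset. Reporting violations in record order, then rule order, is also an assumption.

```python
class ComplianceChecker:
    """Applies a list of rule dictionaries to every record in a dataset."""

    def __init__(self, rules):
        # Assumption: rules are supplied once, at construction time.
        self.rules = list(rules)

    def check_dataset(self, dataset):
        """Return one violation dictionary per failed (record, rule) pair."""
        violations = []
        for index, record in enumerate(dataset):
            for rule in self.rules:
                value = record[rule['field']]
                # A rule function returns True when the value complies.
                if not rule['rule'](value):
                    violations.append({
                        'record_index': index,
                        'field': rule['field'],
                        'value': value,
                        'error_message': rule['error_message'],
                    })
        return violations
```

Keeping the rules on the instance lets one checker be reused across several datasets. Passing both arguments to check_dataset would work equally well.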

Examples

Example 1:

Input:
dataset = [
    {'name': 'Alice', 'age': 30, 'email': 'alice@example.com'},
    {'name': 'Bob', 'age': 15, 'email': 'bob@example.com'},
    {'name': 'Charlie', 'age': 45, 'email': 'charlie@example.com'}
]
rules = [
    {'field': 'age', 'rule': lambda age: age >= 18, 'error_message': 'Age must be 18 or older.'},
    {'field': 'email', 'rule': lambda email: '@' in email, 'error_message': 'Email must contain an "@" symbol.'}
]

Output:
[
    {'record_index': 1, 'field': 'age', 'value': 15, 'error_message': 'Age must be 18 or older.'}
]

Explanation: The second record (Bob, index 1) fails the age rule because 15 is less than 18. His email contains an "@", so it passes the email rule. The first record (Alice) and the third record (Charlie) pass both rules.

Example 2:

Input:
dataset = [
    {'product_id': '123', 'price': 10.0},
    {'product_id': '456', 'price': -5.0},
    {'product_id': '789', 'price': 20.5}
]
rules = [
    {'field': 'price', 'rule': lambda price: price > 0, 'error_message': 'Price must be positive.'}
]

Output:
[
    {'record_index': 1, 'field': 'price', 'value': -5.0, 'error_message': 'Price must be positive.'}
]

Explanation: Only the second record (product_id 456) has a negative price, triggering the rule violation.

Example 3: (Empty Dataset)

Input:
dataset = []
rules = [
    {'field': 'age', 'rule': lambda age: age >= 18, 'error_message': 'Age must be 18 or older.'}
]

Output:
[]

Explanation: An empty dataset will result in no compliance violations.

Constraints

  • The dataset will be a list of dictionaries. Each dictionary represents a record.
  • The rules will be a list of dictionaries, as described above.
  • The rule function should accept a single argument (the value of the field) and return a boolean.
  • The error_message should be a non-empty string.
  • The dataset can contain up to 1000 records.
  • Each record can have up to 20 fields.
  • The check_dataset method should return a list of dictionaries, even if no violations are found (in which case the list will be empty).
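
The constraints on rule dictionaries can be checked up front before any record is examined. The sketch below is one possible way to do that. The helper name validate_rules and the choice to raise ValueError are assumptions for illustration and are not required by the problem.

```python
def validate_rules(rules):
    """Raise ValueError if any rule dictionary breaks the stated constraints."""
    required_keys = {'field', 'rule', 'error_message'}
    for position, rule in enumerate(rules):
        missing = required_keys - set(rule)
        if missing:
            raise ValueError(f"Rule {position} is missing keys: {sorted(missing)}")
        # The rule must be callable so it can be applied to a field value.
        if not callable(rule['rule']):
            raise ValueError(f"Rule {position}: 'rule' must be callable")
        message = rule['error_message']
        # The error message must be a non-empty string.
        if not isinstance(message, str) or not message:
            raise ValueError(f"Rule {position}: 'error_message' must be a non-empty string")
```

The function returns None when every rule is well formed, so it can be called at the start of check_dataset or in the constructor without changing the return value.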

Notes

  • Consider using list comprehensions or generator expressions for concise code.
  • The rule function can be any valid Python function that accepts a single argument and returns a boolean.
  • Error messages should be clear and informative, helping users understand why a record failed compliance.
  • Think about how to make the ComplianceChecker class easy to extend with new rules. If you want to go further, consider a more generic way of defining rules, such as helper functions that build rule dictionaries from parameters.
  • Assume that the field specified in the rule exists in every record of the dataset. No need to handle KeyError exceptions.
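
Two of the notes above, list comprehensions and extensibility, can be illustrated together. This is a sketch under stated assumptions: make_rule, minimum, and find_violations are hypothetical helper names, not part of the problem statement.

```python
def make_rule(field, predicate, error_message):
    """Build a rule dictionary in the format the checker expects."""
    return {'field': field, 'rule': predicate, 'error_message': error_message}


def minimum(field, lower_bound):
    """Hypothetical factory: the field's value must be >= lower_bound."""
    return make_rule(
        field,
        lambda value: value >= lower_bound,
        f'{field} must be at least {lower_bound}.',
    )


def find_violations(dataset, rules):
    """List-comprehension form of the record-by-rule check."""
    return [
        {
            'record_index': index,
            'field': rule['field'],
            'value': record[rule['field']],
            'error_message': rule['error_message'],
        }
        for index, record in enumerate(dataset)
        for rule in rules
        # Keep only the (record, rule) pairs where the rule returns False.
        if not rule['rule'](record[rule['field']])
    ]
```

With factories like minimum, a new rule becomes a one-line call such as minimum('age', 18), and the checking logic does not need to change.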