Hone logo
Hone
Problems

Data Anonymization with Python

Data privacy is increasingly important. This challenge asks you to implement a Python function that anonymizes a list of dictionaries, replacing sensitive data fields with placeholder values. This is useful for sharing datasets for analysis or testing while protecting personally identifiable information (PII).

Problem Description

You are tasked with creating a function called anonymize_data that takes a list of dictionaries as input. Each dictionary represents a data record, and some fields within these records are considered sensitive and need to be anonymized. The function should replace the values of specified sensitive fields with a predefined placeholder value.

What needs to be achieved:

  • The function should iterate through a list of dictionaries.
  • For each dictionary, it should identify and replace the values of specified sensitive fields.
  • The original dictionaries in the input list should be modified in place.

Key Requirements:

  • The function must accept two arguments:
    • data: A list of dictionaries.
    • sensitive_fields: A list of strings representing the names of the fields to be anonymized.
  • The function must accept a third argument:
    • placeholder: A string representing the value to replace sensitive fields with. Defaults to "REDACTED".
  • The function should handle cases where a sensitive field is not present in a dictionary. In such cases, the dictionary should be left unchanged for that field.
  • The function should not modify fields that are not in the sensitive_fields list.

Expected Behavior:

The function should modify the input list of dictionaries directly, replacing the values of the specified sensitive fields with the placeholder value. The function should return the modified list of dictionaries.

Edge Cases to Consider:

  • Empty input list (data is an empty list).
  • Empty sensitive_fields list.
  • Dictionaries with missing keys (some dictionaries may not contain all the sensitive fields).
  • sensitive_fields list containing keys that don't exist in any of the dictionaries.
  • placeholder being an empty string.

Examples

Example 1:

Input: data = [{'name': 'Alice', 'age': 30, 'city': 'New York'}, {'name': 'Bob', 'age': 25, 'city': 'Los Angeles'}]
sensitive_fields = ['name', 'city']
placeholder = "REDACTED"
Output: [{'name': 'REDACTED', 'age': 30, 'city': 'REDACTED'}, {'name': 'REDACTED', 'age': 25, 'city': 'REDACTED'}]
Explanation: The 'name' and 'city' fields in both dictionaries are replaced with "REDACTED".

Example 2:

Input: data = [{'name': 'Alice', 'age': 30, 'city': 'New York'}, {'name': 'Bob', 'age': 25}]
sensitive_fields = ['name', 'city']
placeholder = "ANONYMIZED"
Output: [{'name': 'ANONYMIZED', 'age': 30, 'city': 'ANONYMIZED'}, {'name': 'ANONYMIZED', 'age': 25}]
Explanation: The 'name' and 'city' fields are replaced with "ANONYMIZED". The second dictionary doesn't have a 'city' field, so it's left unchanged for that field.

Example 3:

Input: data = [{'name': 'Alice', 'age': 30, 'city': 'New York'}, {'name': 'Bob', 'age': 25, 'city': 'Los Angeles'}]
sensitive_fields = []
placeholder = "REDACTED"
Output: [{'name': 'Alice', 'age': 30, 'city': 'New York'}, {'name': 'Bob', 'age': 25, 'city': 'Los Angeles'}]
Explanation: Since `sensitive_fields` is empty, no fields are modified.

Constraints

  • The input data will be a list of dictionaries. Each dictionary will contain strings and integers.
  • The sensitive_fields list will contain strings.
  • The placeholder will be a string.
  • The length of the data list can be up to 1000.
  • The number of keys in each dictionary can be up to 10.
  • The function should complete within 1 second for the given constraints.

Notes

  • Consider using a loop to iterate through the list of dictionaries.
  • Use the in operator to check if a key exists in a dictionary before attempting to modify it.
  • The function should modify the dictionaries in place to avoid creating unnecessary copies.
  • Think about how to handle the default value for the placeholder argument.
  • This is a good opportunity to practice dictionary manipulation and list comprehension (though a simple loop is perfectly acceptable).
Loading editor...
python