Identify Employees with Incomplete Records
This challenge focuses on data cleaning and manipulation. You will be tasked with identifying employees who have incomplete information in a given dataset. This is a common scenario in data management where ensuring data integrity is crucial for accurate reporting and analysis.
Problem Description
You are provided with a dataset of employee records. Each record contains information about an employee, including their ID, name, and potentially other details. Your goal is to identify and return the IDs of all employees for whom at least one piece of required information is missing.
Key Requirements:
- You will be given a collection of employee records.
- Each employee record will have a unique identifier (e.g.,
employee_id). - Certain fields within an employee record are considered mandatory (e.g.,
email,department). - A record is considered incomplete if any of these mandatory fields are empty, null, or not present.
- The output should be a list of
employee_ids for all employees with incomplete records.
Expected Behavior:
- The function should iterate through all employee records.
- For each record, it should check if any of the designated mandatory fields are missing.
- If a record is found to be incomplete, its
employee_idshould be added to the result list. - The order of
employee_ids in the output list does not matter.
Edge Cases:
- An empty input dataset.
- A dataset where all employees have complete records.
- A dataset where all employees have incomplete records.
- Records with missing non-mandatory fields should not be flagged as incomplete.
Examples
Example 1:
Input:
employee_records = [
{ "employee_id": 101, "name": "Alice", "email": "alice@example.com", "department": "Engineering" },
{ "employee_id": 102, "name": "Bob", "email": "", "department": "Marketing" },
{ "employee_id": 103, "name": "Charlie", "email": "charlie@example.com", "department": "Sales" }
]
mandatory_fields = ["email", "department"]
Output: [102]
Explanation: Employee with employee_id 102 has a missing email address.
Example 2:
Input:
employee_records = [
{ "employee_id": 201, "name": "David", "email": "david@example.com", "department": "HR", "phone": "123-456-7890" },
{ "employee_id": 202, "name": "Eve", "email": "eve@example.com", "department": "Finance" },
{ "employee_id": 203, "name": "Frank", "email": null, "department": "IT" }
]
mandatory_fields = ["email"]
Output: [203]
Explanation: Employee with employee_id 203 has a missing email address (represented as null). The phone number for employee 201 is not a mandatory field and its presence/absence doesn't affect incompleteness.
Example 3:
Input:
employee_records = [
{ "employee_id": 301, "name": "Grace", "department": "Support" },
{ "employee_id": 302, "name": "Heidi", "email": "heidi@example.com" }
]
mandatory_fields = ["email", "department"]
Output: [301, 302]
Explanation: Employee 301 is missing an email, and employee 302 is missing a department.
Constraints
- The number of employee records will be between 0 and 10,000.
employee_idwill be a unique integer.- Mandatory fields will be specified as a list of strings.
- The input
employee_recordswill be a list of dictionaries (or equivalent key-value structures). - The
namefield will always be present. - Missing values can be represented by empty strings (
""),null(or equivalent), or by the key not being present in the dictionary. - The solution should aim for an efficient time complexity, ideally linear with respect to the number of employee records and the number of mandatory fields.
Notes
- Consider how to handle different representations of "missing" information (e.g., empty string, null value, or key absence).
- The
mandatory_fieldslist will always contain valid field names that could be present in an employee record. - This challenge is designed to test your ability to iterate through collections, perform conditional checks, and manage data structures.