Replace Employee ID With The Unique Identifier
Many systems track employees using unique identifiers. Sometimes, these identifiers are not guaranteed to be consistent or may need to be mapped to a standardized, globally unique format. This challenge requires you to build a system that takes a list of employee records, each with a potentially inconsistent employee ID, and replaces these IDs with a consistently generated, unique identifier. This is crucial for data consolidation, preventing duplicate entries, and ensuring reliable lookups across different systems.
Problem Description
Your task is to process a list of employee records. Each record contains an employee ID and other employee details. You need to identify duplicate employee entries based on their employee ID and assign a new, unique identifier to each distinct employee. The original employee IDs might be inconsistent (e.g., different casing, leading/trailing spaces, or even slightly different formats for the same employee).
What Needs to be Achieved:
- Process a collection of employee records.
- Normalize employee IDs to ensure consistent comparison.
- Identify distinct employees based on their normalized employee IDs.
- Replace the original employee ID in each record with a newly generated, unique identifier.
- Ensure that all records belonging to the same original employee receive the same new unique identifier.
Key Requirements:
- Normalization: Before comparison, employee IDs must be normalized. A reasonable normalization would involve converting to lowercase and trimming whitespace.
- Uniqueness: The generated unique identifiers must be distinct for each unique employee.
- Consistency: All records associated with the same original employee (after normalization) must be assigned the same new unique identifier.
- Output Format: The output should be a list of the same employee records, but with the original
employee_idfield replaced by a new field,unique_id.
Expected Behavior:
The function should accept a list of employee objects (or dictionaries). It should return a new list of employee objects where each object has its original employee_id replaced by a unique_id.
Edge Cases:
- Empty input list of employee records.
- Employee records with missing or null
employee_idfields. - Employee IDs that are identical after normalization but might have different original casing or spacing.
- Very large datasets, implying a need for an efficient solution.
Examples
Example 1:
Input: [
{"employee_id": "EMP001", "name": "Alice", "department": "Sales"},
{"employee_id": " emp001 ", "name": "Alice", "department": "Marketing"},
{"employee_id": "EMP002", "name": "Bob", "department": "Engineering"}
]
Output: [
{"unique_id": "UID-1", "name": "Alice", "department": "Sales"},
{"unique_id": "UID-1", "name": "Alice", "department": "Marketing"},
{"unique_id": "UID-2", "name": "Bob", "department": "Engineering"}
]
Explanation:
"EMP001" and " emp001 " normalize to "emp001". Both records belong to the same employee and are assigned "UID-1".
"EMP002" normalizes to "emp002" and is assigned a new unique ID, "UID-2".
Example 2:
Input: [
{"employee_id": "123", "name": "Charlie"},
{"employee_id": "456", "name": "David"},
{"employee_id": "123", "name": "Charlie"}
]
Output: [
{"unique_id": "UID-1", "name": "Charlie"},
{"unique_id": "UID-2", "name": "David"},
{"unique_id": "UID-1", "name": "Charlie"}
]
Explanation:
The first and third records have the same employee ID "123", so they get the same unique ID "UID-1".
The second record with ID "456" gets a different unique ID, "UID-2".
Example 3:
Input: []
Output: []
Explanation:
An empty input list results in an empty output list.
Constraints
- The number of employee records can range from 0 to 1,000,000.
- Employee IDs are strings. They can contain alphanumeric characters, spaces, and other symbols.
- The generated
unique_idshould follow the pattern "UID-N", where N is a sequential integer starting from 1. - The solution should aim for an efficient time complexity, preferably O(N) or O(N log N), where N is the number of employee records.
Notes
- Consider how you will generate the sequential unique identifiers.
- A hash map or dictionary will be very useful for keeping track of seen employee IDs and their assigned unique identifiers.
- Remember to preserve all original fields of the employee records, only replacing the
employee_idwithunique_id. - The order of records in the output list does not need to match the input order, but all original records must be present.