Hone logo
Hone
Problems

Replace Employee ID With The Unique Identifier

Many systems track employees using unique identifiers. Sometimes, these identifiers are not guaranteed to be consistent or may need to be mapped to a standardized, globally unique format. This challenge requires you to build a system that takes a list of employee records, each with a potentially inconsistent employee ID, and replaces these IDs with a consistently generated, unique identifier. This is crucial for data consolidation, preventing duplicate entries, and ensuring reliable lookups across different systems.

Problem Description

Your task is to process a list of employee records. Each record contains an employee ID and other employee details. You need to identify duplicate employee entries based on their employee ID and assign a new, unique identifier to each distinct employee. The original employee IDs might be inconsistent (e.g., different casing, leading/trailing spaces, or even slightly different formats for the same employee).

What Needs to be Achieved:

  • Process a collection of employee records.
  • Normalize employee IDs to ensure consistent comparison.
  • Identify distinct employees based on their normalized employee IDs.
  • Replace the original employee ID in each record with a newly generated, unique identifier.
  • Ensure that all records belonging to the same original employee receive the same new unique identifier.

Key Requirements:

  • Normalization: Before comparison, employee IDs must be normalized. A reasonable normalization would involve converting to lowercase and trimming whitespace.
  • Uniqueness: The generated unique identifiers must be distinct for each unique employee.
  • Consistency: All records associated with the same original employee (after normalization) must be assigned the same new unique identifier.
  • Output Format: The output should be a list of the same employee records, but with the original employee_id field replaced by a new field, unique_id.

Expected Behavior:

The function should accept a list of employee objects (or dictionaries). It should return a new list of employee objects where each object has its original employee_id replaced by a unique_id.

Edge Cases:

  • Empty input list of employee records.
  • Employee records with missing or null employee_id fields.
  • Employee IDs that are identical after normalization but might have different original casing or spacing.
  • Very large datasets, implying a need for an efficient solution.

Examples

Example 1:

Input: [
  {"employee_id": "EMP001", "name": "Alice", "department": "Sales"},
  {"employee_id": " emp001 ", "name": "Alice", "department": "Marketing"},
  {"employee_id": "EMP002", "name": "Bob", "department": "Engineering"}
]

Output: [
  {"unique_id": "UID-1", "name": "Alice", "department": "Sales"},
  {"unique_id": "UID-1", "name": "Alice", "department": "Marketing"},
  {"unique_id": "UID-2", "name": "Bob", "department": "Engineering"}
]

Explanation:
"EMP001" and " emp001 " normalize to "emp001". Both records belong to the same employee and are assigned "UID-1".
"EMP002" normalizes to "emp002" and is assigned a new unique ID, "UID-2".

Example 2:

Input: [
  {"employee_id": "123", "name": "Charlie"},
  {"employee_id": "456", "name": "David"},
  {"employee_id": "123", "name": "Charlie"}
]

Output: [
  {"unique_id": "UID-1", "name": "Charlie"},
  {"unique_id": "UID-2", "name": "David"},
  {"unique_id": "UID-1", "name": "Charlie"}
]

Explanation:
The first and third records have the same employee ID "123", so they get the same unique ID "UID-1".
The second record with ID "456" gets a different unique ID, "UID-2".

Example 3:

Input: []

Output: []

Explanation:
An empty input list results in an empty output list.

Constraints

  • The number of employee records can range from 0 to 1,000,000.
  • Employee IDs are strings. They can contain alphanumeric characters, spaces, and other symbols.
  • The generated unique_id should follow the pattern "UID-N", where N is a sequential integer starting from 1.
  • The solution should aim for an efficient time complexity, preferably O(N) or O(N log N), where N is the number of employee records.

Notes

  • Consider how you will generate the sequential unique identifiers.
  • A hash map or dictionary will be very useful for keeping track of seen employee IDs and their assigned unique identifiers.
  • Remember to preserve all original fields of the employee records, only replacing the employee_id with unique_id.
  • The order of records in the output list does not need to match the input order, but all original records must be present.
Loading editor...
plaintext