Hone logo
Hone
Problems

Unique Orders and Customers Per Month

This challenge focuses on analyzing order data to understand monthly sales trends. You'll need to determine the number of unique orders and unique customers for each month. This information is crucial for businesses to track customer engagement and order volume over time, aiding in strategic planning and marketing efforts.

Problem Description

You are given a dataset containing customer orders, each with a customer ID, an order ID, and the date the order was placed. Your task is to process this data and calculate, for each distinct month present in the dataset:

  1. The total number of unique orders.
  2. The total number of unique customers who placed orders.

Key Requirements:

  • The output should be grouped by month.
  • A "month" should be defined by the year and the month number (e.g., 2023-01 for January 2023).
  • For each month, count only distinct order_id values for unique orders.
  • For each month, count only distinct customer_id values for unique customers.

Expected Behavior:

The output should be a collection (e.g., a list of objects, a dictionary, a table) where each entry represents a month and contains the calculated counts for unique orders and unique customers for that month.

Edge Cases:

  • Empty dataset: If no orders are provided, the output should be an empty collection.
  • Orders spanning multiple years: The grouping must correctly distinguish between months in different years.
  • Months with no orders: If a month in the overall date range has no orders, it should not appear in the output.

Examples

Example 1:

Input:

[
  {"customer_id": "C1", "order_id": "O1", "order_date": "2023-01-10"},
  {"customer_id": "C2", "order_id": "O2", "order_date": "2023-01-15"},
  {"customer_id": "C1", "order_id": "O3", "order_date": "2023-01-20"},
  {"customer_id": "C3", "order_id": "O4", "order_date": "2023-02-05"},
  {"customer_id": "C2", "order_id": "O5", "order_date": "2023-02-12"}
]

Output:

[
  {"month": "2023-01", "unique_orders": 3, "unique_customers": 2},
  {"month": "2023-02", "unique_orders": 2, "unique_customers": 2}
]

Explanation:

  • 2023-01: Orders O1, O2, O3 are unique. Customers C1, C2 are unique.
  • 2023-02: Orders O4, O5 are unique. Customers C3, C2 are unique.

Example 2:

Input:

[
  {"customer_id": "A", "order_id": "X1", "order_date": "2022-12-01"},
  {"customer_id": "B", "order_id": "X2", "order_date": "2022-12-15"},
  {"customer_id": "A", "order_id": "X3", "order_date": "2022-12-20"},
  {"customer_id": "A", "order_id": "X1", "order_date": "2022-12-25"} // Duplicate order_id in the same month for same customer
]

Output:

[
  {"month": "2022-12", "unique_orders": 2, "unique_customers": 2}
]

Explanation:

  • 2022-12: The unique orders are X1 and X2 (even though X1 appears twice, it's still one unique order). The unique customers are A and B.

Example 3: (Edge case - multiple years)

Input:

[
  {"customer_id": "Cust1", "order_id": "OrdA", "order_date": "2023-11-01"},
  {"customer_id": "Cust2", "order_id": "OrdB", "order_date": "2023-11-10"},
  {"customer_id": "Cust1", "order_id": "OrdC", "order_date": "2024-01-05"},
  {"customer_id": "Cust3", "order_id": "OrdD", "order_date": "2024-01-15"},
  {"customer_id": "Cust2", "order_id": "OrdE", "order_date": "2024-01-20"}
]

Output:

[
  {"month": "2023-11", "unique_orders": 2, "unique_customers": 2},
  {"month": "2024-01", "unique_orders": 3, "unique_customers": 3}
]

Explanation:

  • 2023-11: Unique orders: OrdA, OrdB. Unique customers: Cust1, Cust2.
  • 2024-01: Unique orders: OrdC, OrdD, OrdE. Unique customers: Cust1, Cust3, Cust2.

Constraints

  • The input is a list of dictionaries (or equivalent data structure). Each dictionary represents an order and will contain at least customer_id, order_id, and order_date as strings.
  • customer_id and order_id will be non-empty strings.
  • order_date will be in the format "YYYY-MM-DD".
  • The total number of orders will be between 0 and 100,000.
  • The number of unique customer_ids and order_ids will not exceed 50,000 each.
  • The solution should aim for an efficient time complexity, ideally close to O(N) where N is the number of orders, to handle large datasets.

Notes

  • You will need to parse the order_date to extract the year and month.
  • Consider using data structures like sets or hash maps (dictionaries) to efficiently track unique orders and customers per month.
  • The order of the output entries (months) does not strictly matter, but chronological order is generally preferred.
Loading editor...
plaintext