Unique Orders and Customers Per Month
This challenge focuses on analyzing order data to understand monthly sales trends. You'll need to determine the number of unique orders and unique customers for each month. This information is crucial for businesses to track customer engagement and order volume over time, aiding in strategic planning and marketing efforts.
Problem Description
You are given a dataset containing customer orders, each with a customer ID, an order ID, and the date the order was placed. Your task is to process this data and calculate, for each distinct month present in the dataset:
- The total number of unique orders.
- The total number of unique customers who placed orders.
Key Requirements:
- The output should be grouped by month.
- A "month" should be defined by the year and the month number (e.g., 2023-01 for January 2023).
- For each month, count only distinct
order_idvalues for unique orders. - For each month, count only distinct
customer_idvalues for unique customers.
Expected Behavior:
The output should be a collection (e.g., a list of objects, a dictionary, a table) where each entry represents a month and contains the calculated counts for unique orders and unique customers for that month.
Edge Cases:
- Empty dataset: If no orders are provided, the output should be an empty collection.
- Orders spanning multiple years: The grouping must correctly distinguish between months in different years.
- Months with no orders: If a month in the overall date range has no orders, it should not appear in the output.
Examples
Example 1:
Input:
[
{"customer_id": "C1", "order_id": "O1", "order_date": "2023-01-10"},
{"customer_id": "C2", "order_id": "O2", "order_date": "2023-01-15"},
{"customer_id": "C1", "order_id": "O3", "order_date": "2023-01-20"},
{"customer_id": "C3", "order_id": "O4", "order_date": "2023-02-05"},
{"customer_id": "C2", "order_id": "O5", "order_date": "2023-02-12"}
]
Output:
[
{"month": "2023-01", "unique_orders": 3, "unique_customers": 2},
{"month": "2023-02", "unique_orders": 2, "unique_customers": 2}
]
Explanation:
- 2023-01: Orders O1, O2, O3 are unique. Customers C1, C2 are unique.
- 2023-02: Orders O4, O5 are unique. Customers C3, C2 are unique.
Example 2:
Input:
[
{"customer_id": "A", "order_id": "X1", "order_date": "2022-12-01"},
{"customer_id": "B", "order_id": "X2", "order_date": "2022-12-15"},
{"customer_id": "A", "order_id": "X3", "order_date": "2022-12-20"},
{"customer_id": "A", "order_id": "X1", "order_date": "2022-12-25"} // Duplicate order_id in the same month for same customer
]
Output:
[
{"month": "2022-12", "unique_orders": 2, "unique_customers": 2}
]
Explanation:
- 2022-12: The unique orders are X1 and X2 (even though X1 appears twice, it's still one unique order). The unique customers are A and B.
Example 3: (Edge case - multiple years)
Input:
[
{"customer_id": "Cust1", "order_id": "OrdA", "order_date": "2023-11-01"},
{"customer_id": "Cust2", "order_id": "OrdB", "order_date": "2023-11-10"},
{"customer_id": "Cust1", "order_id": "OrdC", "order_date": "2024-01-05"},
{"customer_id": "Cust3", "order_id": "OrdD", "order_date": "2024-01-15"},
{"customer_id": "Cust2", "order_id": "OrdE", "order_date": "2024-01-20"}
]
Output:
[
{"month": "2023-11", "unique_orders": 2, "unique_customers": 2},
{"month": "2024-01", "unique_orders": 3, "unique_customers": 3}
]
Explanation:
- 2023-11: Unique orders: OrdA, OrdB. Unique customers: Cust1, Cust2.
- 2024-01: Unique orders: OrdC, OrdD, OrdE. Unique customers: Cust1, Cust3, Cust2.
Constraints
- The input is a list of dictionaries (or equivalent data structure). Each dictionary represents an order and will contain at least
customer_id,order_id, andorder_dateas strings. customer_idandorder_idwill be non-empty strings.order_datewill be in the format "YYYY-MM-DD".- The total number of orders will be between 0 and 100,000.
- The number of unique
customer_ids andorder_ids will not exceed 50,000 each. - The solution should aim for an efficient time complexity, ideally close to O(N) where N is the number of orders, to handle large datasets.
Notes
- You will need to parse the
order_dateto extract the year and month. - Consider using data structures like sets or hash maps (dictionaries) to efficiently track unique orders and customers per month.
- The order of the output entries (months) does not strictly matter, but chronological order is generally preferred.