Customer Order Segmentation
This challenge focuses on segmenting customers based on their purchasing behavior. You'll need to identify groups of customers who meet specific criteria related to the number of orders they have placed. This is a common task in business analytics, allowing for targeted marketing campaigns or service level adjustments.
Problem Description
Your task is to write a SQL query that retrieves a list of customer IDs and their corresponding total number of orders, but only for those customers who have placed more than a specified minimum number of orders.
Key Requirements:
- Group by Customer: You need to aggregate order data for each individual customer.
- Count Orders: For each customer, you must count the total number of orders they have placed.
- Filter Groups: You must filter these aggregated results to include only those customers whose order count exceeds a given threshold.
Expected Behavior:
The output should be a table with two columns: customer_id and order_count. Each row should represent a customer who meets the filtering criteria.
Edge Cases:
- No Customers Meet Criteria: If no customers have placed more orders than the specified threshold, the output table should be empty.
- Customers with Zero Orders: While not explicitly in the input schema, if a customer could exist without any orders, they should not appear in the result as their
order_countwould be 0, and thus less than or equal to any positive threshold.
Examples
Example 1:
Input:
orders table:
| order_id | customer_id | order_date |
|---|---|---|
| 101 | C1001 | 2023-01-15 |
| 102 | C1002 | 2023-01-16 |
| 103 | C1001 | 2023-02-20 |
| 104 | C1003 | 2023-03-10 |
| 105 | C1002 | 2023-03-11 |
| 106 | C1001 | 2023-04-05 |
| 107 | C1004 | 2023-04-10 |
minimum_orders_threshold = 1
Output:
| customer_id | order_count |
|---|---|
| C1001 | 3 |
| C1002 | 2 |
| C1003 | 1 |
| C1004 | 1 |
Explanation:
Customers are grouped by customer_id.
- C1001 has 3 orders.
- C1002 has 2 orders.
- C1003 has 1 order.
- C1004 has 1 order.
The
minimum_orders_thresholdis 1. Customers with anorder_countstrictly greater than 1 are filtered. Therefore, C1001 (3 orders) and C1002 (2 orders) are included. C1003 and C1004 have exactly 1 order, which is not greater than the threshold.
Example 2:
Input:
orders table:
| order_id | customer_id | order_date |
|---|---|---|
| 101 | C1001 | 2023-01-15 |
| 102 | C1002 | 2023-01-16 |
| 103 | C1001 | 2023-02-20 |
| 104 | C1003 | 2023-03-10 |
| 105 | C1002 | 2023-03-11 |
| 106 | C1001 | 2023-04-05 |
| 107 | C1004 | 2023-04-10 |
minimum_orders_threshold = 2
Output:
| customer_id | order_count |
|---|---|
| C1001 | 3 |
Explanation: Similar to Example 1, we group and count orders per customer.
- C1001 has 3 orders.
- C1002 has 2 orders.
- C1003 has 1 order.
- C1004 has 1 order.
The
minimum_orders_thresholdis 2. Only customers with anorder_countstrictly greater than 2 are included. Therefore, only C1001 (3 orders) is selected.
Example 3: (Edge Case: No customers meet criteria)
Input:
orders table:
| order_id | customer_id | order_date |
|---|---|---|
| 101 | C1001 | 2023-01-15 |
| 102 | C1002 | 2023-01-16 |
minimum_orders_threshold = 2
Output:
| customer_id | order_count |
|---|
Explanation:
- C1001 has 1 order.
- C1002 has 1 order.
The
minimum_orders_thresholdis 2. No customer has more than 2 orders. Thus, the result is an empty set.
Constraints
- The
orderstable will contain at least one row. customer_idwill be a string.order_idwill be an integer.- The
minimum_orders_thresholdwill be a non-negative integer. - The query should be efficient and execute within typical database time limits for moderate-sized datasets.
Notes
This challenge is designed to test your understanding of aggregate functions (like COUNT) and how to filter results after aggregation. Think about which SQL clause is used to filter groups based on aggregated values. Pseudocode for the orders table schema might look like this:
TABLE orders (
order_id INT,
customer_id VARCHAR,
order_date DATE
);