Finding Customers with Above-Average Order Values
This challenge requires you to implement a correlated subquery to identify customers who have placed orders with a total value higher than the average order value across all customers. Correlated subqueries are powerful for comparing values within related rows, making them ideal for this type of analytical task.
Problem Description
You are given two tables: Customers and Orders.
- The
Customerstable contains information about each customer, with a uniquecustomer_id. - The
Orderstable contains information about individual orders, including theorder_id, thecustomer_idwho placed the order, and theorder_value.
Your task is to write a SQL query that returns the customer_id and customer_name of all customers who have at least one order whose order_value is strictly greater than the overall average order_value of all orders in the Orders table.
Key Requirements:
- Use a correlated subquery to determine the average order value.
- The comparison should be for each customer's orders against the global average.
- The output should list each qualifying customer only once.
Expected Behavior: The query should scan through each customer and their orders. For each customer, it should calculate or compare their order values against a pre-calculated (or subqueried) overall average order value. If any of a customer's orders exceed this average, that customer should be included in the result.
Edge Cases:
- What if a customer has no orders? They should not be included in the result.
- What if there are no orders in the
Orderstable? The query should ideally handle this gracefully (e.g., return an empty set or a specific error depending on SQL dialect behavior, but for this challenge, assume a non-emptyOrderstable). - What if all
order_valueare the same? No customer should be returned as the condition is "strictly greater than".
Examples
Example 1:
Input Tables:
Customers
| customer_id | customer_name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 3 | Charlie |
Orders
| order_id | customer_id | order_value |
|---|---|---|
| 101 | 1 | 150.00 |
| 102 | 1 | 200.00 |
| 103 | 2 | 100.00 |
| 104 | 3 | 300.00 |
| 105 | 3 | 250.00 |
Overall Average Order Value = (150 + 200 + 100 + 300 + 250) / 5 = 1000 / 5 = 200.00
Output:
| customer_id | customer_name |
|---|---|
| 3 | Charlie |
Explanation:
- Alice has orders with values 150 and 200. Neither is strictly greater than 200.
- Bob has an order with value 100. This is not strictly greater than 200.
- Charlie has orders with values 300 and 250. Both are strictly greater than 200. Therefore, Charlie is included.
Example 2:
Input Tables:
Customers
| customer_id | customer_name |
|---|---|
| 10 | David |
| 11 | Eve |
| 12 | Frank |
Orders
| order_id | customer_id | order_value |
|---|---|---|
| 201 | 10 | 50.00 |
| 202 | 11 | 50.00 |
| 203 | 12 | 50.00 |
Overall Average Order Value = (50 + 50 + 50) / 3 = 150 / 3 = 50.00
Output: (Empty Set)
Explanation: The average order value is 50.00. No customer has an order value strictly greater than 50.00.
Example 3: (Complex Scenario)
Input Tables:
Customers
| customer_id | customer_name |
|---|---|
| 5 | Grace |
| 6 | Heidi |
| 7 | Ivan |
Orders
| order_id | customer_id | order_value |
|---|---|---|
| 301 | 5 | 100.00 |
| 302 | 5 | 400.00 |
| 303 | 6 | 150.00 |
| 304 | 7 | 180.00 |
| 305 | 7 | 220.00 |
| 306 | 7 | 100.00 |
Overall Average Order Value = (100 + 400 + 150 + 180 + 220 + 100) / 6 = 1150 / 6 = 191.66...
Output:
| customer_id | customer_name |
|---|---|
| 5 | Grace |
| 7 | Ivan |
Explanation:
- Grace has orders 100 and 400. Order 400 is > 191.66...
- Heidi has order 150. This is not > 191.66...
- Ivan has orders 180, 220, and 100. Order 220 is > 191.66...
Constraints
customer_idinCustomerstable is unique.order_idinOrderstable is unique.customer_idinOrderstable is a foreign key referencingCustomers.customer_id.order_valuewill be a non-negative decimal number.- The number of customers is between 1 and 1,000.
- The number of orders is between 1 and 10,000.
- The query should ideally execute efficiently, avoiding excessive redundant calculations.
Notes
- A correlated subquery is a subquery (a query nested inside another query) that depends on the outer query for its values. It is executed once for each row processed by the outer query.
- Consider how to calculate the overall average order value. This is a crucial part of the correlated subquery's logic.
- You will likely need to join the
CustomersandOrderstables, but the core logic for filtering customers will involve a correlated subquery. - The pseudocode for the query might look conceptually like this:
(Note: The pseudocode above usesSELECT C.customer_id, C.customer_name FROM Customers C WHERE EXISTS ( SELECT 1 FROM Orders O WHERE O.customer_id = C.customer_id -- Correlation AND O.order_value > (SELECT AVG(order_value) FROM Orders) -- Subquery to get global average );EXISTS, but other approaches likeINor directly comparing with a subquery in theWHEREclause are also possible. The key is the correlated comparison.) - Think about what the inner query needs to return to satisfy the outer query's condition.