Hone logo
Hone
Problems

Finding Customers with Above-Average Order Values

This challenge requires you to implement a correlated subquery to identify customers who have placed orders with a total value higher than the average order value across all customers. Correlated subqueries are powerful for comparing values within related rows, making them ideal for this type of analytical task.

Problem Description

You are given two tables: Customers and Orders.

  • The Customers table contains information about each customer, with a unique customer_id.
  • The Orders table contains information about individual orders, including the order_id, the customer_id who placed the order, and the order_value.

Your task is to write a SQL query that returns the customer_id and customer_name of all customers who have at least one order whose order_value is strictly greater than the overall average order_value of all orders in the Orders table.

Key Requirements:

  • Use a correlated subquery to determine the average order value.
  • The comparison should be for each customer's orders against the global average.
  • The output should list each qualifying customer only once.

Expected Behavior: The query should scan through each customer and their orders. For each customer, it should calculate or compare their order values against a pre-calculated (or subqueried) overall average order value. If any of a customer's orders exceed this average, that customer should be included in the result.

Edge Cases:

  • What if a customer has no orders? They should not be included in the result.
  • What if there are no orders in the Orders table? The query should ideally handle this gracefully (e.g., return an empty set or a specific error depending on SQL dialect behavior, but for this challenge, assume a non-empty Orders table).
  • What if all order_value are the same? No customer should be returned as the condition is "strictly greater than".

Examples

Example 1:

Input Tables:

Customers

customer_idcustomer_name
1Alice
2Bob
3Charlie

Orders

order_idcustomer_idorder_value
1011150.00
1021200.00
1032100.00
1043300.00
1053250.00

Overall Average Order Value = (150 + 200 + 100 + 300 + 250) / 5 = 1000 / 5 = 200.00

Output:

customer_idcustomer_name
3Charlie

Explanation:

  • Alice has orders with values 150 and 200. Neither is strictly greater than 200.
  • Bob has an order with value 100. This is not strictly greater than 200.
  • Charlie has orders with values 300 and 250. Both are strictly greater than 200. Therefore, Charlie is included.

Example 2:

Input Tables:

Customers

customer_idcustomer_name
10David
11Eve
12Frank

Orders

order_idcustomer_idorder_value
2011050.00
2021150.00
2031250.00

Overall Average Order Value = (50 + 50 + 50) / 3 = 150 / 3 = 50.00

Output: (Empty Set)

Explanation: The average order value is 50.00. No customer has an order value strictly greater than 50.00.

Example 3: (Complex Scenario)

Input Tables:

Customers

customer_idcustomer_name
5Grace
6Heidi
7Ivan

Orders

order_idcustomer_idorder_value
3015100.00
3025400.00
3036150.00
3047180.00
3057220.00
3067100.00

Overall Average Order Value = (100 + 400 + 150 + 180 + 220 + 100) / 6 = 1150 / 6 = 191.66...

Output:

customer_idcustomer_name
5Grace
7Ivan

Explanation:

  • Grace has orders 100 and 400. Order 400 is > 191.66...
  • Heidi has order 150. This is not > 191.66...
  • Ivan has orders 180, 220, and 100. Order 220 is > 191.66...

Constraints

  • customer_id in Customers table is unique.
  • order_id in Orders table is unique.
  • customer_id in Orders table is a foreign key referencing Customers.customer_id.
  • order_value will be a non-negative decimal number.
  • The number of customers is between 1 and 1,000.
  • The number of orders is between 1 and 10,000.
  • The query should ideally execute efficiently, avoiding excessive redundant calculations.

Notes

  • A correlated subquery is a subquery (a query nested inside another query) that depends on the outer query for its values. It is executed once for each row processed by the outer query.
  • Consider how to calculate the overall average order value. This is a crucial part of the correlated subquery's logic.
  • You will likely need to join the Customers and Orders tables, but the core logic for filtering customers will involve a correlated subquery.
  • The pseudocode for the query might look conceptually like this:
    SELECT C.customer_id, C.customer_name
    FROM Customers C
    WHERE EXISTS (
        SELECT 1
        FROM Orders O
        WHERE O.customer_id = C.customer_id  -- Correlation
        AND O.order_value > (SELECT AVG(order_value) FROM Orders) -- Subquery to get global average
    );
    
    (Note: The pseudocode above uses EXISTS, but other approaches like IN or directly comparing with a subquery in the WHERE clause are also possible. The key is the correlated comparison.)
  • Think about what the inner query needs to return to satisfy the outer query's condition.
Loading editor...
plaintext