Hone logo
Hone
Problems

Transforming Customer Order Data for Targeted Marketing

This challenge focuses on implementing complex data transformations using SQL to prepare customer order data for targeted marketing campaigns. You'll need to combine, aggregate, and manipulate data from multiple tables to create a derived table containing customer segments based on their purchasing behavior. This is a common task in data warehousing and business intelligence.

Problem Description

You are given a database with three tables: Customers, Orders, and Products. Your task is to write a SQL query that creates a new table called CustomerSegments containing aggregated customer data suitable for marketing analysis. The CustomerSegments table should include the following columns:

  • customer_id: The ID of the customer.
  • total_spent: The total amount spent by the customer across all orders.
  • order_count: The total number of orders placed by the customer.
  • most_recent_order_date: The date of the customer's most recent order.
  • average_order_value: The average value of the customer's orders.
  • product_category_preference: The product category the customer has purchased the most items from.

The query must join the three tables, aggregate the data per customer, and determine the product category preference based on the number of items purchased from each category. If a customer has purchased an equal number of items from multiple categories, return the category that appears first alphabetically.

Key Requirements:

  • Handle cases where a customer has no orders (total_spent and order_count should be 0, most_recent_order_date should be NULL, average_order_value should be NULL).
  • Correctly calculate the average order value.
  • Accurately determine the product category preference.
  • The query should be efficient and performant.

Expected Behavior:

The query should return a table named CustomerSegments with the specified columns and data types. The data in the table should accurately reflect the aggregated customer information based on the provided input data.

Edge Cases to Consider:

  • Customers with no orders.
  • Customers who have placed orders with zero value.
  • Customers who have purchased the same number of items from multiple categories.
  • Empty tables (Customers, Orders, or Products).

Examples

Example 1:

Customers:
customer_id | name
------------|-------
1           | Alice
2           | Bob

Orders:
order_id | customer_id | order_date | total_amount
---------|-------------|------------|--------------
1        | 1           | 2023-01-15 | 100.00
2        | 1           | 2023-02-20 | 150.00
3        | 2           | 2023-03-10 | 50.00

Products:
product_id | product_name | category
-----------|--------------|----------
1          | Laptop       | Electronics
2          | Mouse        | Electronics
3          | T-Shirt      | Clothing
4          | Jeans        | Clothing
Output:
customer_id | total_spent | order_count | most_recent_order_date | average_order_value | product_category_preference
------------|-------------|-------------|------------------------|---------------------|-----------------------------
1           | 250.00      | 2           | 2023-02-20             | 125.00              | Electronics
2           | 50.00       | 1           | 2023-03-10             | 50.00               | Clothing

Explanation: Alice spent a total of $250 across 2 orders, with her most recent order on 2023-02-20. Her average order value is $125. She purchased two electronics items and one clothing item, so her preference is Electronics. Bob spent $50 across 1 order, with his most recent order on 2023-03-10. His average order value is $50. He purchased two clothing items, so his preference is Clothing.

Example 2:

Customers:
customer_id | name
------------|-------
1           | Alice

Orders:
order_id | customer_id | order_date | total_amount
---------|-------------|------------|--------------
1        | 1           | 2023-01-15 | 100.00

Products:
product_id | product_name | category
-----------|--------------|----------
1          | Laptop       | Electronics
Output:
customer_id | total_spent | order_count | most_recent_order_date | average_order_value | product_category_preference
------------|-------------|-------------|------------------------|---------------------|-----------------------------
1           | 100.00      | 1           | 2023-01-15             | 100.00              | Electronics

Explanation: Alice spent $100 across 1 order, with her most recent order on 2023-01-15. Her average order value is $100. She purchased one electronics item, so her preference is Electronics.

Constraints

  • The database schema is fixed as described above.
  • The number of rows in each table can be up to 10,000.
  • order_date is of type DATE.
  • total_amount is of type DECIMAL.
  • The query should complete within 10 seconds.
  • The SQL dialect should be compatible with PostgreSQL.

Notes

  • Consider using window functions or common table expressions (CTEs) to simplify the query and improve readability.
  • Pay close attention to handling NULL values, especially when calculating averages.
  • The product category preference should be determined based on the number of items purchased, not the total value of purchases within a category.
  • The CustomerSegments table should be created and populated with the results of the query. You do not need to drop the table if it already exists; the CREATE TABLE IF NOT EXISTS syntax is acceptable.
  • Focus on clarity and efficiency in your SQL code. Well-formatted and commented code is preferred.
  • Think about how to handle ties in product category preference. The problem statement specifies to return the category that appears first alphabetically in case of a tie.
Loading editor...
plaintext