
Customer Segmentation and Purchase Behavior Analysis

This challenge uses advanced SQL features to analyze customer purchase data and segment customers by their spending habits. Understanding customer segments lets a business tailor marketing campaigns, personalize product recommendations, and improve overall customer satisfaction. You are given a table of customer transactions and asked to write several analytical queries using window functions, common table expressions (CTEs), and date functions.

Problem Description

You are given a table named transactions containing customer transaction data. The table has the following columns:

  • customer_id (INTEGER): Unique identifier for each customer.
  • transaction_date (DATE): Date of the transaction.
  • product_category (VARCHAR): Category of the product purchased.
  • purchase_amount (DECIMAL): Amount spent on the transaction.

Your task is to write SQL queries to answer the following questions:

  1. Top Spending Categories: Determine the top 3 product categories with the highest total purchase amount across all transactions.
  2. Customer Lifetime Value (CLTV) - Simple Version: Calculate a simplified CLTV for each customer. CLTV is defined as the total purchase amount for each customer.
  3. Monthly Spending Trend: For each customer, calculate their total spending for each month. Display the customer_id, transaction_year, transaction_month, and total_monthly_spending.
  4. High-Value Customers: Identify customers whose total spending in the last 6 months exceeds the average total spending per customer over the same period.
  5. Category Spending Comparison: For each customer, determine the difference in spending between their highest and lowest spending product categories. If a customer has only one category, the difference should be 0.
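
Question 5 is a two-level aggregation: total spending per customer and category first, then the gap between the largest and smallest of those totals per customer. A minimal sketch follows, wrapped in Python's built-in sqlite3 module so it can be run as-is. The sample rows are invented for illustration, and the names category_totals, category_spending, and spending_difference are assumptions rather than part of the problem.

```python
import sqlite3

# Hypothetical sample rows, invented purely for illustration.
rows = [
    (101, "2023-01-05", "Electronics", 700),
    (101, "2023-02-10", "Electronics", 300),
    (101, "2023-02-12", "Clothing", 200),
    (102, "2023-01-20", "Toys", 150),
    (103, "2023-03-01", "HomeGoods", 400),
    (103, "2023-03-05", "Clothing", 400),
    (103, "2023-03-09", "Clothing", 100),
    (103, "2023-04-11", "Electronics", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "customer_id INTEGER, transaction_date DATE, "
    "product_category VARCHAR, purchase_amount DECIMAL)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)

CATEGORY_GAP_SQL = """
WITH category_totals AS (
    -- Step 1: one row per (customer, category) with the summed spending.
    SELECT customer_id,
           product_category,
           SUM(purchase_amount) AS category_spending
    FROM transactions
    GROUP BY customer_id, product_category
)
-- Step 2: gap between the best and worst category for each customer.
SELECT customer_id,
       MAX(category_spending) - MIN(category_spending) AS spending_difference
FROM category_totals
GROUP BY customer_id
ORDER BY customer_id
"""

category_gaps = conn.execute(CATEGORY_GAP_SQL).fetchall()
for customer_id, difference in category_gaps:
    print(customer_id, difference)
```

A customer with a single category needs no special case here: the maximum and minimum are the same row, so the difference is 0.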

Examples

Example 1:

Input: transactions table with various product categories and purchase amounts.
Output:
product_category | total_purchase_amount
------------------+-----------------------
Electronics       | 120000.00
Clothing          | 90000.00
HomeGoods         | 80000.00
Explanation: The query calculates the total purchase amount for each product category and returns the top 3.
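
One way to get this result is sketched below, again through Python's sqlite3 module. The sample rows are invented, and the CTE names category_totals and ranked and the spend_rank alias are assumptions. The sketch ranks categories with RANK(), which needs SQLite 3.25 or later and returns more than three rows if categories tie at the cutoff; ORDER BY with a row limit is a simpler alternative when ties do not matter.

```python
import sqlite3

# Hypothetical sample rows, invented purely for illustration.
rows = [
    (101, "2023-01-05", "Electronics", 700),
    (102, "2023-01-09", "Clothing", 300),
    (101, "2023-02-11", "HomeGoods", 250),
    (103, "2023-02-14", "Electronics", 500),
    (102, "2023-03-02", "Toys", 100),
    (103, "2023-03-20", "Clothing", 150),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "customer_id INTEGER, transaction_date DATE, "
    "product_category VARCHAR, purchase_amount DECIMAL)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)

TOP_CATEGORIES_SQL = """
WITH category_totals AS (
    -- Total purchase amount per category.
    SELECT product_category,
           SUM(purchase_amount) AS total_purchase_amount
    FROM transactions
    GROUP BY product_category
),
ranked AS (
    -- Rank categories from highest to lowest total.
    SELECT product_category,
           total_purchase_amount,
           RANK() OVER (ORDER BY total_purchase_amount DESC) AS spend_rank
    FROM category_totals
)
SELECT product_category, total_purchase_amount
FROM ranked
WHERE spend_rank <= 3
ORDER BY total_purchase_amount DESC
"""

top_categories = conn.execute(TOP_CATEGORIES_SQL).fetchall()
for category, total in top_categories:
    print(f"{category:<12} {total:.2f}")
```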

Example 2:

Input: transactions table with customer_id, transaction_date, and purchase_amount.
Output:
customer_id | cltv
------------+-------
101         | 5000.00
102         | 7500.00
103         | 3000.00
Explanation: The query calculates the total purchase amount (CLTV) for each customer.
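
The simplified CLTV is a single GROUP BY. A minimal sketch with invented sample rows, run through Python's sqlite3 module:

```python
import sqlite3

# Hypothetical sample rows, invented purely for illustration.
rows = [
    (101, "2023-01-05", "Electronics", 3000),
    (101, "2023-04-18", "Clothing", 2000),
    (102, "2023-02-07", "HomeGoods", 7500),
    (103, "2023-03-12", "Clothing", 1000),
    (103, "2023-05-30", "Electronics", 2000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "customer_id INTEGER, transaction_date DATE, "
    "product_category VARCHAR, purchase_amount DECIMAL)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)

CLTV_SQL = """
SELECT customer_id,
       SUM(purchase_amount) AS cltv  -- lifetime total per customer
FROM transactions
GROUP BY customer_id
ORDER BY customer_id
"""

cltv_rows = conn.execute(CLTV_SQL).fetchall()
for customer_id, cltv in cltv_rows:
    print(f"{customer_id} {cltv:.2f}")
```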

Example 3:

Input: transactions table with customer_id, transaction_date, and purchase_amount.
Output:
customer_id | transaction_year | transaction_month | total_monthly_spending
------------+------------------+-------------------+------------------------
101         | 2023             | 1                 | 1500.00
101         | 2023             | 2                 | 2000.00
102         | 2023             | 1                 | 2500.00
...
Explanation: The query calculates the total spending for each customer for each month.
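
A sketch of the monthly breakdown follows, with invented rows chosen so the totals match the first three rows shown above. It uses SQLite's strftime to pull out the year and month because it runs through Python's sqlite3 module; that function is SQLite-specific, and in standard SQL the equivalent would be EXTRACT(YEAR FROM transaction_date) and EXTRACT(MONTH FROM transaction_date).

```python
import sqlite3

# Hypothetical sample rows, invented purely for illustration.
rows = [
    (101, "2023-01-05", "Electronics", 1000),
    (101, "2023-01-20", "Clothing", 500),
    (101, "2023-02-03", "HomeGoods", 2000),
    (102, "2023-01-15", "Electronics", 2500),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "customer_id INTEGER, transaction_date DATE, "
    "product_category VARCHAR, purchase_amount DECIMAL)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)

MONTHLY_SQL = """
SELECT customer_id,
       -- strftime returns text such as '2023' and '01'; cast to integers.
       CAST(strftime('%Y', transaction_date) AS INTEGER) AS transaction_year,
       CAST(strftime('%m', transaction_date) AS INTEGER) AS transaction_month,
       SUM(purchase_amount) AS total_monthly_spending
FROM transactions
GROUP BY customer_id, transaction_year, transaction_month
ORDER BY customer_id, transaction_year, transaction_month
"""

monthly = conn.execute(MONTHLY_SQL).fetchall()
for customer_id, year, month, total in monthly:
    print(f"{customer_id} {year} {month} {total:.2f}")
```

Grouping by year and month together matters: grouping by month alone would merge January 2023 with January of any other year.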

Constraints

  • The transactions table will contain at least 1000 rows.
  • transaction_date will always be a valid date.
  • purchase_amount will be a non-negative decimal number.
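  • Queries should be efficient and avoid unnecessary full table scans, for example by filtering on transaction_date before aggregating and by not re-reading the table for each intermediate result. Performance should be reasonable for a table of 1000+ rows.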
  • Assume the database system supports standard SQL features like window functions, CTEs, and date functions.

Notes

  • Consider using CTEs to break down complex queries into smaller, more manageable parts.

  • Window functions are particularly useful for calculating running totals, rankings, and other aggregate values within partitions of the data.

  • Date functions will be necessary for extracting year and month from the transaction_date.

  • For the "High-Value Customers" query, you'll need each customer's total spending over the last 6 months first; then average those totals and keep the customers whose own total exceeds that average.

  • For the "Category Spending Comparison" query, you'll need each customer's total spending per category first, and then the maximum and minimum of those category totals for each customer. Consider using a CTE, window functions, or subqueries to achieve this.

  • The specific SQL dialect (e.g., PostgreSQL, MySQL, SQL Server) is not specified, so aim for standard SQL that is widely compatible.

  • Pseudocode for the High-Value Customers query:

    1. Calculate each customer's total spending in the last 6 months, then take the average of those totals across all customers.
    2. For each customer:
        a. Calculate their total spending in the last 6 months.
        b. If their total spending is greater than the average spending calculated in step 1, consider them a high-value customer.
    3. Return the list of high-value customer IDs.
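
The pseudocode above can be sketched as a single query with one CTE, run here through Python's sqlite3 module. Several things are assumptions: the sample rows are invented, the window is measured back from a reference date passed in as the hypothetical parameter as_of (a live query would typically use the current date), customers with no transactions in the window are left out of the average, and date(..., '-6 months') is SQLite-specific date arithmetic.

```python
import sqlite3

# Hypothetical sample rows, invented purely for illustration.
rows = [
    (101, "2023-03-01", "Electronics", 5000),  # outside the 6-month window
    (101, "2023-09-10", "Clothing", 300),
    (102, "2023-08-05", "Electronics", 900),
    (102, "2023-11-20", "HomeGoods", 600),
    (103, "2023-10-02", "Toys", 300),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "customer_id INTEGER, transaction_date DATE, "
    "product_category VARCHAR, purchase_amount DECIMAL)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)

HIGH_VALUE_SQL = """
WITH recent_totals AS (
    -- Step 2a: each customer's total spending inside the window.
    SELECT customer_id,
           SUM(purchase_amount) AS total_spending
    FROM transactions
    WHERE transaction_date >= date(:as_of, '-6 months')
      AND transaction_date <= :as_of
    GROUP BY customer_id
)
-- Steps 1 and 2b: compare each total with the average of all totals.
SELECT customer_id, total_spending
FROM recent_totals
WHERE total_spending > (SELECT AVG(total_spending) FROM recent_totals)
ORDER BY customer_id
"""

# Reference date is an assumption made so the example is reproducible.
high_value = conn.execute(HIGH_VALUE_SQL, {"as_of": "2023-12-15"}).fetchall()
for customer_id, total in high_value:
    print(f"{customer_id} {total:.2f}")
```

With these rows the in-window totals are 300, 1500, and 300, so the average is 700 and only customer 102 qualifies. Customer 101's large March purchase falls outside the window and does not count.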
    