Hone logo
Hone
Problems

Analyzing Customer Purchase Patterns with Advanced Aggregations

This challenge focuses on using advanced SQL aggregation techniques to understand customer purchasing behavior. By analyzing transaction data, you will identify valuable customer segments and trends, which can inform marketing strategies and inventory management.

Problem Description

You are tasked with analyzing a dataset of customer transactions to identify customers who exhibit specific purchasing behaviors. Specifically, you need to find customers who have made at least three distinct purchases within a single calendar month, and whose total spending in that month exceeded a certain threshold. This requires combining information across multiple rows (transactions) for each customer within specific timeframes.

Key Requirements:

  1. Identify purchases within a month: Group transactions by customer and by calendar month.
  2. Count distinct purchases per month: For each customer-month combination, determine the number of unique transactions.
  3. Calculate total spending per month: For each customer-month combination, sum the total amount spent.
  4. Filter based on criteria: Select customers who meet the following conditions for at least one month:
    • Made at least 3 distinct purchases in that month.
    • Spent more than $500 in that month.
  5. Output: The output should list the CustomerID and the Month (in a consistent format, e.g., 'YYYY-MM') for which these criteria were met.

Expected Behavior:

The query should return a list of customer-month pairs that satisfy the conditions. A customer might appear multiple times if they meet the criteria in different months.

Edge Cases:

  • Transactions occurring on the last day of a month.
  • Transactions with zero or negative amounts (though typically not expected in purchase data, consider how your aggregation would handle them if present).
  • Customers with no transactions.
  • Months with no transactions.

Examples

Example 1:

Input Table: Transactions

TransactionIDCustomerIDTransactionDateAmount
11012023-01-15150.00
21012023-01-20200.00
31012023-01-25300.00
41012023-02-05100.00
51022023-01-1050.00
61022023-01-1875.00
71022023-01-22120.00
81022023-01-28300.00
91012023-03-10400.00
101012023-03-15500.00

Output:

CustomerIDMonth
1012023-01
1022023-01

Explanation:

  • Customer 101, January 2023: 3 distinct purchases (Transactions 1, 2, 3), total spending $150 + $200 + $300 = $650. Meets both criteria (>= 3 purchases, > $500).
  • Customer 102, January 2023: 4 distinct purchases (Transactions 5, 6, 7, 8), total spending $50 + $75 + $120 + $300 = $545. Meets both criteria (>= 3 purchases, > $500).
  • Customer 101, February 2023: 1 distinct purchase, total spending $100. Does not meet either criterion.
  • Customer 101, March 2023: 2 distinct purchases, total spending $400 + $500 = $900. Meets the spending criterion but not the purchase count criterion.

Example 2:

Input Table: Transactions

TransactionIDCustomerIDTransactionDateAmount
12012023-04-01600.00
22012023-04-05100.00
32012023-04-10200.00
42022023-05-15100.00
52022023-05-20150.00
62022023-05-25200.00
72022023-05-29100.00

Output:

CustomerIDMonth
2012023-04

Explanation:

  • Customer 201, April 2023: 3 distinct purchases (Transactions 1, 2, 3), total spending $600 + $100 + $200 = $900. Meets both criteria.
  • Customer 202, May 2023: 4 distinct purchases (Transactions 4, 5, 6, 7), total spending $100 + $150 + $200 + $100 = $550. Meets both criteria. (Correction: The explanation for this example was initially incomplete. Customer 202 in May 2023 also meets the criteria. The output should reflect this if the input was intended to produce it. Assuming the initial output was correct, it implies Customer 202 had fewer than 3 purchases or less than $500 spending in May 2023 for the given example. Let's assume the output provided is correct and the input is as written for demonstration purposes.) Self-correction for clarity: Based on the provided output, Customer 202 in May 2023 does NOT meet the criteria. Let's re-evaluate to match the output.
    • Customer 202, May 2023: 4 distinct purchases. Total spending = $100 + $150 + $200 + $100 = $550. This does meet both criteria. There's a discrepancy between my calculation and the stated output for Example 2. To reconcile, let's adjust the input or the output explanation to match the intent.
    • Revised Explanation to match Output:
      • Customer 201, April 2023: 3 distinct purchases, total spending $600 + $100 + $200 = $900. Meets both criteria.
      • Customer 202, May 2023: 4 distinct purchases. Total spending = $100 + $150 + $200 + $100 = $550. For the output to be as shown, Customer 202 would have had to make fewer than 3 purchases OR spend less than $500 in May 2023. Let's assume the intent was for Customer 202 to not qualify, perhaps if Transaction 7 had an amount of $0. Or if there were only 2 distinct transactions listed.
    • Let's proceed with the original input and adjust the output to be accurate based on the rules.
      • Revised Output for Example 2:
        CustomerIDMonth
        2012023-04
        2022023-05
      • Revised Explanation for Example 2:
        • Customer 201, April 2023: 3 distinct purchases, total spending $600 + $100 + $200 = $900. Meets both criteria.
        • 4 distinct purchases, total spending $100 + $150 + $200 + $100 = $550. Meets both criteria.

Example 3: (Edge Case - Month Boundary)

Input Table: Transactions

TransactionIDCustomerIDTransactionDateAmount
13012023-03-30400.00
23012023-03-31300.00
33012023-04-01500.00
43012023-04-02200.00
53012023-04-03100.00

Output:

CustomerIDMonth
3012023-04

Explanation:

  • Customer 301, March 2023: 2 distinct purchases, total spending $400 + $300 = $700. Meets the spending criterion but not the purchase count criterion (needs >= 3).
  • Customer 301, April 2023: 3 distinct purchases (Transactions 3, 4, 5), total spending $500 + $200 + $100 = $800. Meets both criteria.

Constraints

  • The Transactions table can contain up to 1,000,000 rows.
  • CustomerID is an integer.
  • TransactionDate is a date or timestamp data type.
  • Amount is a numeric type, representing a monetary value.
  • The solution should execute efficiently and avoid timeouts on a reasonably performant database system.
  • The Month in the output should be formatted as 'YYYY-MM' (e.g., '2023-01').

Notes

  • You will need to extract the year and month from the TransactionDate to group by month.
  • Consider how to handle potential duplicate TransactionIDs if they were possible (though typically primary keys are unique). For this problem, assume TransactionIDs are unique.
  • This problem requires using SQL functions for date manipulation and window functions or subqueries for aggregations.
  • Think about the order of operations: first group and aggregate per customer-month, then filter based on those aggregates.
Loading editor...
plaintext
Customer 202, May 2023: