Hone logo
Hone
Problems

Customer Lifetime Value Segmentation

This challenge focuses on using advanced SQL analytical functions to segment customers based on their lifetime value. Understanding customer lifetime value (CLTV) is crucial for businesses to optimize marketing spend, personalize customer experiences, and identify high-value customer segments. You will need to calculate CLTV and then rank customers within their respective segments to understand their relative contribution.

Problem Description

Your task is to write a SQL query that calculates the Customer Lifetime Value (CLTV) for each customer and then assigns a rank to each customer within their CLTV segment. CLTV can be approximated here as the total revenue generated by a customer across all their orders.

What needs to be achieved:

  1. Calculate the total revenue for each customer.
  2. Segment customers into distinct CLTV tiers (e.g., 'Low', 'Medium', 'High').
  3. Rank customers within each CLTV tier based on their total revenue (highest revenue gets rank 1).

Key Requirements:

  • Data Sources: Assume you have two tables:
    • Customers: Contains customer information (customer_id, customer_name, etc.).
    • Orders: Contains order details (order_id, customer_id, order_date, order_total).
  • CLTV Calculation: CLTV for a customer is the sum of order_total for all their orders.
  • Segmentation Logic: You need to define a reasonable way to segment customers into tiers. A common approach is to use quantiles (e.g., tertiles for three segments). If the number of customers is not perfectly divisible by the number of segments, handle the distribution appropriately.
  • Ranking: Within each segment, customers should be ranked in descending order of their CLTV. The customer with the highest CLTV in a segment gets rank 1.

Expected Behavior:

The final output should be a table containing:

  • customer_id
  • customer_name (if available in Customers table)
  • total_revenue (calculated CLTV)
  • cltv_segment (e.g., 'Low', 'Medium', 'High')
  • segment_rank (rank within the cltv_segment)

Edge Cases:

  • Customers with no orders should ideally not appear or have a total revenue of 0.
  • Ensure the segmentation logic handles cases where many customers might have the same CLTV.
  • Consider the scenario where there are very few customers, making distinct segmentation challenging.

Examples

Example 1:

Input Tables:

Customers:

customer_idcustomer_name
1Alice
2Bob
3Charlie
4David
5Eve

Orders:

order_idcustomer_idorder_dateorder_total
10112023-01-15150.00
10222023-02-2075.50
10312023-03-10200.00
10432023-04-05300.00
10542023-05-1250.00
10622023-06-18120.00
10752023-07-2290.00
10832023-08-01250.00
10912023-09-1950.00
11042023-10-25100.00

(Assume segmentation into 3 tiers: Low, Medium, High using tertiles)

Output:

customer_idcustomer_nametotal_revenuecltv_segmentsegment_rank
3Charlie550.00High1
1Alice400.00High2
2Bob195.50Medium1
5Eve90.00Low1
4David150.00Low2

Explanation:

  • CLTV Calculation:
    • Alice: 150 + 200 + 50 = 400
    • Bob: 75.50 + 120 = 195.50
    • Charlie: 300 + 250 = 550
    • David: 50 + 100 = 150
    • Eve: 90
  • Segmentation (Tertiles):
    • Sorted CLTVs: 90, 150, 195.50, 400, 550
    • The 3 segments (approximately) would be:
      • Low: 90, 150
      • Medium: 195.50
      • High: 400, 550
    • Assigning segment names:
      • Low: Eve (90), David (150)
      • Medium: Bob (195.50)
      • High: Charlie (550), Alice (400)
  • Ranking:
    • High Segment: Charlie (550, Rank 1), Alice (400, Rank 2)
    • Medium Segment: Bob (195.50, Rank 1)
    • Low Segment: Eve (90, Rank 1), David (150, Rank 2) (Note: The exact cutoffs for tertiles might vary slightly based on the SQL function used and how ties are handled. The example shows a plausible distribution.)

Example 2: Customer with no orders

Input Tables:

Customers:

customer_idcustomer_name
1Alice
2Bob
6Frank

Orders:

order_idcustomer_idorder_dateorder_total
10112023-01-15150.00
10222023-02-2075.50

(Assume segmentation into 2 tiers: Low, High using bisection)

Output:

customer_idcustomer_nametotal_revenuecltv_segmentsegment_rank
1Alice150.00High1
2Bob75.50Low1
6Frank0.00Low2

Explanation:

  • Frank has no orders, so total_revenue is 0.
  • When segmenting, Frank falls into the 'Low' segment, and within that segment, he is ranked after Bob who has higher revenue.

Constraints

  • The Customers table can contain up to 1,000,000 records.
  • The Orders table can contain up to 100,000,000 records.
  • order_total will be a non-negative decimal number.
  • customer_id is a unique identifier in both tables.
  • The query should execute efficiently, ideally within a few minutes on a reasonably provisioned database. Avoid full table scans on very large tables if possible through indexing.

Notes

  • Consider using window functions like SUM() OVER(), NTILE(), ROW_NUMBER(), RANK(), or DENSE_RANK() to solve this problem efficiently.
  • You will need to decide how many segments to create (e.g., 3 for tertiles, 4 for quartiles). The challenge implies a flexible number, but aiming for 3 (Low, Medium, High) is a good starting point.
  • Be mindful of how your chosen segmentation function (NTILE or similar) handles cases where the total number of customers is not perfectly divisible by the number of segments.
  • Ensure you handle customers with zero revenue correctly when both segmenting and ranking.
Loading editor...
plaintext