Customer Lifetime Value Segmentation
This challenge focuses on using advanced SQL analytical functions to segment customers based on their lifetime value. Understanding customer lifetime value (CLTV) is crucial for businesses to optimize marketing spend, personalize customer experiences, and identify high-value customer segments. You will need to calculate CLTV and then rank customers within their respective segments to understand their relative contribution.
Problem Description
Your task is to write a SQL query that calculates the Customer Lifetime Value (CLTV) for each customer and then assigns a rank to each customer within their CLTV segment. CLTV can be approximated here as the total revenue generated by a customer across all their orders.
What needs to be achieved:
- Calculate the total revenue for each customer.
- Segment customers into distinct CLTV tiers (e.g., 'Low', 'Medium', 'High').
- Rank customers within each CLTV tier based on their total revenue (highest revenue gets rank 1).
Key Requirements:
- Data Sources: Assume you have two tables:
Customers: Contains customer information (customer_id,customer_name, etc.).Orders: Contains order details (order_id,customer_id,order_date,order_total).
- CLTV Calculation: CLTV for a customer is the sum of
order_totalfor all their orders. - Segmentation Logic: You need to define a reasonable way to segment customers into tiers. A common approach is to use quantiles (e.g., tertiles for three segments). If the number of customers is not perfectly divisible by the number of segments, handle the distribution appropriately.
- Ranking: Within each segment, customers should be ranked in descending order of their CLTV. The customer with the highest CLTV in a segment gets rank 1.
Expected Behavior:
The final output should be a table containing:
customer_idcustomer_name(if available inCustomerstable)total_revenue(calculated CLTV)cltv_segment(e.g., 'Low', 'Medium', 'High')segment_rank(rank within thecltv_segment)
Edge Cases:
- Customers with no orders should ideally not appear or have a total revenue of 0.
- Ensure the segmentation logic handles cases where many customers might have the same CLTV.
- Consider the scenario where there are very few customers, making distinct segmentation challenging.
Examples
Example 1:
Input Tables:
Customers:
| customer_id | customer_name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 3 | Charlie |
| 4 | David |
| 5 | Eve |
Orders:
| order_id | customer_id | order_date | order_total |
|---|---|---|---|
| 101 | 1 | 2023-01-15 | 150.00 |
| 102 | 2 | 2023-02-20 | 75.50 |
| 103 | 1 | 2023-03-10 | 200.00 |
| 104 | 3 | 2023-04-05 | 300.00 |
| 105 | 4 | 2023-05-12 | 50.00 |
| 106 | 2 | 2023-06-18 | 120.00 |
| 107 | 5 | 2023-07-22 | 90.00 |
| 108 | 3 | 2023-08-01 | 250.00 |
| 109 | 1 | 2023-09-19 | 50.00 |
| 110 | 4 | 2023-10-25 | 100.00 |
(Assume segmentation into 3 tiers: Low, Medium, High using tertiles)
Output:
| customer_id | customer_name | total_revenue | cltv_segment | segment_rank |
|---|---|---|---|---|
| 3 | Charlie | 550.00 | High | 1 |
| 1 | Alice | 400.00 | High | 2 |
| 2 | Bob | 195.50 | Medium | 1 |
| 5 | Eve | 90.00 | Low | 1 |
| 4 | David | 150.00 | Low | 2 |
Explanation:
- CLTV Calculation:
- Alice: 150 + 200 + 50 = 400
- Bob: 75.50 + 120 = 195.50
- Charlie: 300 + 250 = 550
- David: 50 + 100 = 150
- Eve: 90
- Segmentation (Tertiles):
- Sorted CLTVs: 90, 150, 195.50, 400, 550
- The 3 segments (approximately) would be:
- Low: 90, 150
- Medium: 195.50
- High: 400, 550
- Assigning segment names:
- Low: Eve (90), David (150)
- Medium: Bob (195.50)
- High: Charlie (550), Alice (400)
- Ranking:
- High Segment: Charlie (550, Rank 1), Alice (400, Rank 2)
- Medium Segment: Bob (195.50, Rank 1)
- Low Segment: Eve (90, Rank 1), David (150, Rank 2) (Note: The exact cutoffs for tertiles might vary slightly based on the SQL function used and how ties are handled. The example shows a plausible distribution.)
Example 2: Customer with no orders
Input Tables:
Customers:
| customer_id | customer_name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 6 | Frank |
Orders:
| order_id | customer_id | order_date | order_total |
|---|---|---|---|
| 101 | 1 | 2023-01-15 | 150.00 |
| 102 | 2 | 2023-02-20 | 75.50 |
(Assume segmentation into 2 tiers: Low, High using bisection)
Output:
| customer_id | customer_name | total_revenue | cltv_segment | segment_rank |
|---|---|---|---|---|
| 1 | Alice | 150.00 | High | 1 |
| 2 | Bob | 75.50 | Low | 1 |
| 6 | Frank | 0.00 | Low | 2 |
Explanation:
- Frank has no orders, so
total_revenueis 0. - When segmenting, Frank falls into the 'Low' segment, and within that segment, he is ranked after Bob who has higher revenue.
Constraints
- The
Customerstable can contain up to 1,000,000 records. - The
Orderstable can contain up to 100,000,000 records. order_totalwill be a non-negative decimal number.customer_idis a unique identifier in both tables.- The query should execute efficiently, ideally within a few minutes on a reasonably provisioned database. Avoid full table scans on very large tables if possible through indexing.
Notes
- Consider using window functions like
SUM() OVER(),NTILE(),ROW_NUMBER(),RANK(), orDENSE_RANK()to solve this problem efficiently. - You will need to decide how many segments to create (e.g., 3 for tertiles, 4 for quartiles). The challenge implies a flexible number, but aiming for 3 (Low, Medium, High) is a good starting point.
- Be mindful of how your chosen segmentation function (
NTILEor similar) handles cases where the total number of customers is not perfectly divisible by the number of segments. - Ensure you handle customers with zero revenue correctly when both segmenting and ranking.