Customer Purchase Funnel Analysis
This challenge requires you to analyze customer behavior through a purchase funnel using advanced SQL features. You will identify how many customers progress from one stage of the funnel to the next, enabling businesses to pinpoint drop-off points and optimize their marketing and sales strategies.
Problem Description
You are tasked with analyzing customer interactions with an e-commerce platform to understand their journey through a defined purchase funnel. The funnel consists of the following stages:
view_product: Customer viewed at least one product page.add_to_cart: Customer added at least one item to their shopping cart.initiate_checkout: Customer started the checkout process (e.g., clicked "Proceed to Checkout").purchase_complete: Customer successfully completed a purchase.
You need to calculate the number of unique customers who reached each stage of the funnel and the conversion rate between consecutive stages.
Key Requirements:
- Identify unique customers at each stage.
- Calculate the count of customers for each stage.
- Calculate the percentage of customers who progressed from stage N to stage N+1.
- The output should be ordered by the funnel stage.
Expected Behavior:
The output should be a table with columns: funnel_stage and customer_count. Additionally, for stages beyond the first, a conversion_rate column representing the percentage of customers from the previous stage who reached the current stage should be included.
Edge Cases:
- Customers might perform actions out of order (e.g., add to cart before viewing a product). The funnel should reflect the first time a customer reaches a defined stage.
- A customer might perform an action multiple times. We only care about whether they reached the stage at least once.
- Some customers might only reach certain stages and not proceed further.
Examples
Example 1:
Input Data (represented as event logs):
customer_events table:
| customer_id | event_type | event_timestamp |
|---|---|---|
| 101 | view_product | 2023-10-26 10:00:00 |
| 102 | view_product | 2023-10-26 10:05:00 |
| 101 | add_to_cart | 2023-10-26 10:15:00 |
| 103 | view_product | 2023-10-26 10:20:00 |
| 102 | add_to_cart | 2023-10-26 10:25:00 |
| 101 | initiate_checkout | 2023-10-26 10:30:00 |
| 102 | initiate_checkout | 2023-10-26 10:35:00 |
| 103 | add_to_cart | 2023-10-26 10:40:00 |
| 101 | purchase_complete | 2023-10-26 10:45:00 |
| 102 | purchase_complete | 2023-10-26 10:50:00 |
Output:
| funnel_stage | customer_count | conversion_rate |
|---|---|---|
| view_product | 3 | NULL |
| add_to_cart | 3 | 100.00 |
| initiate_checkout | 2 | 66.67 |
| purchase_complete | 2 | 100.00 |
Explanation:
- view_product: Customers 101, 102, and 103 viewed products. (Count = 3)
- add_to_cart: Customers 101, 102, and 103 added to cart. (Count = 3). Conversion rate: (3/3) * 100 = 100.00%
- initiate_checkout: Customers 101 and 102 initiated checkout. (Count = 2). Conversion rate: (2/3) * 100 = 66.67%
- purchase_complete: Customers 101 and 102 completed purchase. (Count = 2). Conversion rate: (2/2) * 100 = 100.00%
Example 2:
Input Data:
customer_events table:
| customer_id | event_type | event_timestamp |
|---|---|---|
| 201 | view_product | 2023-11-01 09:00:00 |
| 202 | view_product | 2023-11-01 09:05:00 |
| 201 | add_to_cart | 2023-11-01 09:10:00 |
| 203 | view_product | 2023-11-01 09:15:00 |
| 201 | purchase_complete | 2023-11-01 09:20:00 |
| 202 | initiate_checkout | 2023-11-01 09:25:00 |
| 203 | add_to_cart | 2023-11-01 09:30:00 |
| 203 | initiate_checkout | 2023-11-01 09:35:00 |
Output:
| funnel_stage | customer_count | conversion_rate |
|---|---|---|
| view_product | 3 | NULL |
| add_to_cart | 2 | 66.67 |
| initiate_checkout | 2 | 100.00 |
| purchase_complete | 1 | 50.00 |
Explanation:
- view_product: Customers 201, 202, and 203. (Count = 3)
- add_to_cart: Customers 201 and 203. (Count = 2). Conversion rate: (2/3) * 100 = 66.67%
- initiate_checkout: Customers 202 and 203. (Count = 2). Conversion rate: (2/2) * 100 = 100.00%
- purchase_complete: Customer 201. (Count = 1). Conversion rate: (1/2) * 100 = 50.00%
Example 3: Handling Out-of-Order and Repeated Actions
Input Data:
customer_events table:
| customer_id | event_type | event_timestamp |
|---|---|---|
| 301 | add_to_cart | 2023-11-05 11:00:00 |
| 301 | view_product | 2023-11-05 11:05:00 |
| 301 | view_product | 2023-11-05 11:10:00 |
| 302 | initiate_checkout | 2023-11-05 11:15:00 |
| 302 | view_product | 2023-11-05 11:20:00 |
| 302 | add_to_cart | 2023-11-05 11:25:00 |
| 301 | initiate_checkout | 2023-11-05 11:30:00 |
| 301 | add_to_cart | 2023-11-05 11:35:00 |
| 302 | purchase_complete | 2023-11-05 11:40:00 |
Output:
| funnel_stage | customer_count | conversion_rate |
|---|---|---|
| view_product | 2 | NULL |
| add_to_cart | 2 | 100.00 |
| initiate_checkout | 2 | 100.00 |
| purchase_complete | 1 | 50.00 |
Explanation:
- view_product: Customers 301 and 302 viewed products at least once. (Count = 2)
- add_to_cart: Customers 301 and 302 added to cart at least once. (Count = 2). Conversion rate: (2/2) * 100 = 100.00%
- initiate_checkout: Customers 301 and 302 initiated checkout at least once. (Count = 2). Conversion rate: (2/2) * 100 = 100.00%
- purchase_complete: Customer 302 completed purchase. (Count = 1). Conversion rate: (1/2) * 100 = 50.00%
Constraints
customer_eventstable contains at least 1 record and at most 1,000,000,000 records.customer_idis an integer.event_typeis a string, one of 'view_product', 'add_to_cart', 'initiate_checkout', 'purchase_complete'.event_timestampis a datetime/timestamp data type.- The solution should be efficient and complete within a reasonable execution time, assuming optimal indexing.
Notes
- The order of
event_timestampfor a given customer is crucial to determine when they first reached a stage. - Consider using window functions (e.g.,
ROW_NUMBER,RANK) to identify the first occurrence of each event type per customer. - You will need to pivot or conditionally aggregate your results to get the desired funnel stage counts.
- Calculating conversion rates requires careful handling of division and potential division-by-zero scenarios (though not explicitly present in the examples, consider it for robustness).
- The
NULLvalue for the first stage's conversion rate is intentional.