Sales Performance Analysis by Region and Product Category
This challenge focuses on analyzing sales data to understand performance across different regions and product categories. You will use CASE statements in conjunction with aggregate functions to categorize and summarize sales data, providing valuable insights for business decision-making. This is a common task in business intelligence and data analysis.
Problem Description
You are provided with a table containing individual sales transactions. Each transaction includes information about the sale_id, region where the sale occurred, the product_category, and the sale_amount.
Your task is to write a SQL query that calculates the total sale_amount for each region and product_category. Furthermore, you need to categorize each sale_amount into one of three tiers: 'Low', 'Medium', or 'High'. A sale is considered 'Low' if its amount is less than 100, 'Medium' if it's between 100 (inclusive) and 500 (exclusive), and 'High' if it's 500 or greater.
The final output should display:
- The
region. - The
product_category. - The
total_sales_amountfor that region and category. - The count of sales falling into the 'Low' tier for that region and category.
- The count of sales falling into the 'Medium' tier for that region and category.
- The count of sales falling into the 'High' tier for that region and category.
You should group your results by region and product_category.
Key Requirements:
- Aggregate
sale_amountto get the total sales per group. - Use a
CASEstatement to classify individual sales into 'Low', 'Medium', or 'High' tiers. - Aggregate the counts of sales within each tier.
Expected Behavior:
The query should return one row for each unique combination of region and product_category. If a particular region-category combination has no sales, it should not appear in the output.
Edge Cases to Consider:
- Sales amounts exactly equal to the tier boundaries (e.g., 100, 500) should be correctly assigned.
Examples
Example 1:
Input Table: sales
| sale_id | region | product_category | sale_amount |
|---|---|---|---|
| 1 | North | Electronics | 150 |
| 2 | North | Electronics | 75 |
| 3 | South | Clothing | 400 |
| 4 | North | Electronics | 600 |
| 5 | South | Clothing | 90 |
| 6 | North | Electronics | 250 |
| 7 | West | Home Goods | 700 |
| 8 | West | Home Goods | 30 |
Output:
| region | product_category | total_sales_amount | low_tier_count | medium_tier_count | high_tier_count |
|---|---|---|---|---|---|
| North | Electronics | 1075 | 1 | 2 | 1 |
| South | Clothing | 490 | 1 | 1 | 0 |
| West | Home Goods | 730 | 1 | 0 | 1 |
Explanation:
- North, Electronics: Total sales = 150 + 75 + 600 + 250 = 1075.
- Sale 2 (75) is 'Low' (count: 1).
- Sales 1 (150) and 6 (250) are 'Medium' (count: 2).
- Sale 4 (600) is 'High' (count: 1).
- South, Clothing: Total sales = 400 + 90 = 490.
- Sale 5 (90) is 'Low' (count: 1).
- Sale 3 (400) is 'Medium' (count: 1).
- No 'High' sales.
- West, Home Goods: Total sales = 700 + 30 = 730.
- Sale 8 (30) is 'Low' (count: 1).
- Sale 7 (700) is 'High' (count: 1).
- No 'Medium' sales.
Example 2:
Input Table: sales
| sale_id | region | product_category | sale_amount |
|---|---|---|---|
| 10 | East | Books | 50 |
| 11 | East | Books | 100 |
| 12 | East | Books | 500 |
| 13 | East | Books | 499 |
| 14 | East | Books | 200 |
Output:
| region | product_category | total_sales_amount | low_tier_count | medium_tier_count | high_tier_count |
|---|---|---|---|---|---|
| East | Books | 1349 | 1 | 2 | 1 |
Explanation:
- East, Books: Total sales = 50 + 100 + 500 + 499 + 200 = 1349.
- Sale 10 (50) is 'Low' (count: 1).
- Sales 11 (100) and 14 (200) are 'Medium' (count: 2).
- Sale 12 (500) is 'High' (count: 1).
- Sale 13 (499) is 'Medium' (count: 2 total).
Constraints
- The
salestable can contain up to 1,000,000 rows. sale_idis a unique integer.region,product_categoryare strings (e.g., 'North', 'Electronics').sale_amountis a non-negative number (integer or decimal).- The query should execute efficiently, aiming for a runtime that completes within a reasonable time on the given dataset size.
Notes
- Remember to correctly handle the inclusive and exclusive boundaries for the 'Medium' tier.
- Think about how to aggregate the counts for each tier within the
GROUP BYclause. - Consider using conditional aggregation if your SQL dialect supports it, or multiple
CASEstatements within aggregate functions.