Conditional Aggregation Challenge
Many data analysis tasks require summarizing information based on specific conditions within groups. Conditional aggregation allows you to compute aggregate values (like sums, counts, averages) for subsets of data defined by logical conditions, all within a single query. This is incredibly useful for comparing different categories or tracking progress against specific targets.
Problem Description
Your task is to implement conditional aggregation to analyze sales data. You need to calculate the total revenue and the number of orders for two distinct categories of products: 'Electronics' and 'Clothing', for each store. The output should clearly distinguish between these two categories.
Key Requirements:
- For each store, calculate the sum of
order_amountfor products categorized as 'Electronics'. - For each store, count the number of orders where the product category is 'Electronics'.
- For each store, calculate the sum of
order_amountfor products categorized as 'Clothing'. - For each store, count the number of orders where the product category is 'Clothing'.
Expected Behavior:
The output should be a table with one row per store. Each row should contain the store's identifier, and then four columns representing the conditional aggregations for 'Electronics' and 'Clothing' as described above.
Edge Cases:
- A store might have no orders for 'Electronics' or 'Clothing'. In such cases, the corresponding aggregate values should be zero.
- The input table might contain orders for product categories other than 'Electronics' and 'Clothing'. These should be ignored for the purpose of this aggregation.
Examples
Example 1:
Input Table: Orders
| order_id | store_id | product_category | order_amount |
|---|---|---|---|
| 101 | S1 | Electronics | 500.00 |
| 102 | S2 | Clothing | 150.00 |
| 103 | S1 | Clothing | 200.00 |
| 104 | S1 | Electronics | 750.00 |
| 105 | S3 | Electronics | 300.00 |
| 106 | S2 | Electronics | 600.00 |
| 107 | S1 | Books | 50.00 |
Output Table:
| store_id | total_electronics_revenue | electronics_order_count | total_clothing_revenue | clothing_order_count |
|---|---|---|---|---|
| S1 | 1250.00 | 2 | 200.00 | 1 |
| S2 | 600.00 | 1 | 150.00 | 1 |
| S3 | 300.00 | 1 | 0.00 | 0 |
Explanation:
- Store S1: Has two 'Electronics' orders (amounts 500 and 750, totaling 1250) and one 'Clothing' order (amount 200). The 'Books' order is ignored.
- Store S2: Has one 'Electronics' order (amount 600) and one 'Clothing' order (amount 150).
- Store S3: Has one 'Electronics' order (amount 300) and no 'Clothing' orders.
Example 2:
Input Table: Orders
| order_id | store_id | product_category | order_amount |
|---|---|---|---|
| 201 | S4 | Clothing | 100.00 |
| 202 | S4 | Clothing | 250.00 |
| 203 | S5 | Electronics | 1000.00 |
Output Table:
| store_id | total_electronics_revenue | electronics_order_count | total_clothing_revenue | clothing_order_count |
|---|---|---|---|---|
| S4 | 0.00 | 0 | 350.00 | 2 |
| S5 | 1000.00 | 1 | 0.00 | 0 |
Explanation:
- Store S4: Has no 'Electronics' orders and two 'Clothing' orders (amounts 100 and 250, totaling 350).
- Store S5: Has one 'Electronics' order (amount 1000) and no 'Clothing' orders.
Constraints
- The
order_amountwill be a non-negative decimal number. store_idandproduct_categorywill be non-empty strings.- The input
Orderstable can contain up to 1,000,000 rows. - The query should be efficient and execute within acceptable time limits for large datasets.
Notes
Consider using SQL aggregate functions in conjunction with CASE statements or similar conditional logic. Think about how to group your results by store_id while applying different conditions for aggregation. Remember that you need to handle both summing order_amount and counting orders, conditionally.