Hone logo
Hone
Problems

Conditional Aggregation Challenge

Many data analysis tasks require summarizing information based on specific conditions within groups. Conditional aggregation allows you to compute aggregate values (like sums, counts, averages) for subsets of data defined by logical conditions, all within a single query. This is incredibly useful for comparing different categories or tracking progress against specific targets.

Problem Description

Your task is to implement conditional aggregation to analyze sales data. You need to calculate the total revenue and the number of orders for two distinct categories of products: 'Electronics' and 'Clothing', for each store. The output should clearly distinguish between these two categories.

Key Requirements:

  • For each store, calculate the sum of order_amount for products categorized as 'Electronics'.
  • For each store, count the number of orders where the product category is 'Electronics'.
  • For each store, calculate the sum of order_amount for products categorized as 'Clothing'.
  • For each store, count the number of orders where the product category is 'Clothing'.

Expected Behavior:

The output should be a table with one row per store. Each row should contain the store's identifier, and then four columns representing the conditional aggregations for 'Electronics' and 'Clothing' as described above.

Edge Cases:

  • A store might have no orders for 'Electronics' or 'Clothing'. In such cases, the corresponding aggregate values should be zero.
  • The input table might contain orders for product categories other than 'Electronics' and 'Clothing'. These should be ignored for the purpose of this aggregation.

Examples

Example 1:

Input Table: Orders

order_idstore_idproduct_categoryorder_amount
101S1Electronics500.00
102S2Clothing150.00
103S1Clothing200.00
104S1Electronics750.00
105S3Electronics300.00
106S2Electronics600.00
107S1Books50.00

Output Table:

store_idtotal_electronics_revenueelectronics_order_counttotal_clothing_revenueclothing_order_count
S11250.002200.001
S2600.001150.001
S3300.0010.000

Explanation:

  • Store S1: Has two 'Electronics' orders (amounts 500 and 750, totaling 1250) and one 'Clothing' order (amount 200). The 'Books' order is ignored.
  • Store S2: Has one 'Electronics' order (amount 600) and one 'Clothing' order (amount 150).
  • Store S3: Has one 'Electronics' order (amount 300) and no 'Clothing' orders.

Example 2:

Input Table: Orders

order_idstore_idproduct_categoryorder_amount
201S4Clothing100.00
202S4Clothing250.00
203S5Electronics1000.00

Output Table:

store_idtotal_electronics_revenueelectronics_order_counttotal_clothing_revenueclothing_order_count
S40.000350.002
S51000.0010.000

Explanation:

  • Store S4: Has no 'Electronics' orders and two 'Clothing' orders (amounts 100 and 250, totaling 350).
  • Store S5: Has one 'Electronics' order (amount 1000) and no 'Clothing' orders.

Constraints

  • The order_amount will be a non-negative decimal number.
  • store_id and product_category will be non-empty strings.
  • The input Orders table can contain up to 1,000,000 rows.
  • The query should be efficient and execute within acceptable time limits for large datasets.

Notes

Consider using SQL aggregate functions in conjunction with CASE statements or similar conditional logic. Think about how to group your results by store_id while applying different conditions for aggregation. Remember that you need to handle both summing order_amount and counting orders, conditionally.

Loading editor...
plaintext