Analyzing Sales Performance with Window Functions
This challenge focuses on using advanced SQL window functions to analyze sales data and derive meaningful insights. You'll be tasked with calculating running totals, ranking products within categories, and identifying top performers relative to their respective categories – all using the power of window functions. This is a common task in business intelligence and data analysis, allowing for deeper understanding of trends and performance.
Problem Description
You are given a table named sales containing sales data for various products across different categories. The table has the following columns:
sale_id(INTEGER): Unique identifier for each sale.product_id(INTEGER): Identifier for the product sold.category(VARCHAR): Category of the product.sale_date(DATE): Date of the sale.sale_amount(DECIMAL): Amount of the sale.
Your goal is to write SQL queries to answer the following questions using window functions:
- Running Total of Sales: Calculate the cumulative sum of
sale_amountfor eachcategoryover time (sale_date). The result should includesale_date,category, and arunning_totalcolumn. - Product Ranking within Category: For each
category, rank theproduct_idbased on its totalsale_amountwithin that category. The result should includecategory,product_id,total_sale_amount(sum of sales for that product within the category), and aproduct_rankcolumn. - Top 3 Products per Category: Identify the top 3 products within each
categorybased on their totalsale_amount. The result should includecategory,product_id,total_sale_amount, and arank_within_categorycolumn.
Examples
Example 1: Running Total of Sales
Input:
sales table:
sale_id | product_id | category | sale_date | sale_amount
------- | ---------- | -------- | ---------- | -----------
1 | 101 | Electronics | 2023-01-01 | 100.00
2 | 102 | Electronics | 2023-01-01 | 150.00
3 | 201 | Clothing | 2023-01-01 | 50.00
4 | 101 | Electronics | 2023-01-02 | 120.00
5 | 201 | Clothing | 2023-01-02 | 75.00
6 | 102 | Electronics | 2023-01-03 | 80.00
7 | 202 | Clothing | 2023-01-03 | 60.00
Output:
sale_date | category | running_total
---------- | ----------- | -------------
2023-01-01 | Electronics | 250.00
2023-01-01 | Clothing | 50.00
2023-01-02 | Electronics | 370.00
2023-01-02 | Clothing | 125.00
2023-01-03 | Electronics | 450.00
2023-01-03 | Clothing | 185.00
Explanation: The running_total is calculated by summing the sale_amount for each category sequentially based on the sale_date.
Example 2: Product Ranking within Category
Input:
sales table: (same as above)
Output:
category | product_id | total_sale_amount | product_rank
----------- | ---------- | ----------------- | ------------
Clothing | 201 | 125.00 | 1
Clothing | 202 | 60.00 | 2
Electronics | 102 | 230.00 | 1
Electronics | 101 | 220.00 | 2
Explanation: For each category, the product_id is ranked based on the total_sale_amount within that category. The product with the highest total_sale_amount gets rank 1.
Example 3: Top 3 Products per Category
Input:
sales table: (same as above)
Output:
category | product_id | total_sale_amount | rank_within_category
----------- | ---------- | ----------------- | --------------------
Clothing | 201 | 125.00 | 1
Clothing | 202 | 60.00 | 2
Electronics | 102 | 230.00 | 1
Electronics | 101 | 220.00 | 2
Explanation: This query returns the top 3 products (or fewer if there are less than 3 products) within each category based on their total sales amount.
Constraints
- The
salestable will contain at least 10 rows and up to 1000 rows. sale_datewill be a valid date format.sale_amountwill be a non-negative decimal number.- The database system supports standard SQL window functions (e.g.,
OVER(),PARTITION BY,ORDER BY,ROW_NUMBER(),RANK()). - Performance is important. Queries should be optimized to execute efficiently.
Notes
-
Consider using
RANK()orROW_NUMBER()for ranking.RANK()assigns the same rank to ties, whileROW_NUMBER()assigns unique ranks even for ties. Choose the appropriate function based on the desired behavior. -
The
OVER()clause is crucial for defining the window frame.PARTITION BYdivides the data into groups, andORDER BYspecifies the order within each group. -
For the "Top 3 Products per Category" query, you'll likely need to combine window functions with a filtering condition to select only the top 3 ranked products.
-
The order of
sale_datein the running total calculation is important for accurate cumulative sums. -
Assume the database system is case-insensitive for category names.
-
The provided examples are illustrative and may not cover all possible scenarios. Your solution should be robust and handle various input data.
-
Pseudocode for the Running Total Query:
SELECT sale_date, category, SUM(sale_amount) OVER (PARTITION BY category ORDER BY sale_date) AS running_total FROM sales ORDER BY category, sale_date;Pseudocode for the Product Ranking Query:
SELECT category, product_id, SUM(sale_amount) OVER (PARTITION BY category) AS total_sale_amount, RANK() OVER (PARTITION BY category ORDER BY SUM(sale_amount) DESC) AS product_rank FROM sales GROUP BY category, product_id;Pseudocode for the Top 3 Products Query:
WITH RankedSales AS ( SELECT category, product_id, SUM(sale_amount) AS total_sale_amount, RANK() OVER (PARTITION BY category ORDER BY SUM(sale_amount) DESC) AS rank_within_category FROM sales GROUP BY category, product_id ) SELECT category, product_id, total_sale_amount, rank_within_category FROM RankedSales WHERE rank_within_category <= 3 ORDER BY category, rank_within_category;