Analyzing Sales Performance with Advanced SQL Aggregation
This challenge focuses on leveraging advanced SQL aggregation techniques to derive meaningful insights from a sales dataset. You'll be tasked with calculating complex metrics like running totals, cumulative averages, and percentile-based performance rankings to understand sales trends and identify top-performing regions. This is a common task in business intelligence and data analysis, requiring proficiency in SQL aggregation functions and windowing techniques.
Problem Description
You are provided with a table named Sales containing sales data for various regions over a period of time. The table has the following columns:
Region(VARCHAR): The region where the sale occurred.SaleDate(DATE): The date of the sale.SaleAmount(DECIMAL): The amount of the sale.
Your goal is to write SQL queries to answer the following questions:
- Running Total of Sales: Calculate the cumulative sales amount for each region over time. The result should include
Region,SaleDate, andCumulativeSales. - Moving Average of Sales: Calculate a 7-day moving average of sales for each region. The result should include
Region,SaleDate, andMovingAverageSales. - Sales Ranking by Region: Determine the sales rank of each region for each month. The result should include
Region,SaleMonth, andSalesRank. The ranking should be based on the total sales amount for each region within that month. - Percentile Sales: Calculate the 90th percentile of sales amount for each region. The result should include
Regionand90thPercentileSales.
Expected Behavior:
- Queries should be efficient and return accurate results.
- Dates should be handled correctly, considering potential edge cases like missing dates.
- Ranking should be consistent and handle ties appropriately (e.g., using dense_rank).
- Percentile calculations should be accurate.
Examples
Example 1: Running Total of Sales
Input: Sales Table (Sample Data)
Region | SaleDate | SaleAmount
-------|--------------|------------
North | 2023-01-01 | 100.00
North | 2023-01-02 | 150.00
South | 2023-01-01 | 200.00
South | 2023-01-02 | 250.00
North | 2023-01-03 | 120.00
Output:
Region | SaleDate | CumulativeSales
-------|--------------|----------------
North | 2023-01-01 | 100.00
North | 2023-01-02 | 250.00
North | 2023-01-03 | 370.00
South | 2023-01-01 | 200.00
South | 2023-01-02 | 450.00
Explanation: The CumulativeSales is calculated by summing the SaleAmount for each region up to the current SaleDate.
Example 2: Sales Ranking by Region
Input: Sales Table (Sample Data)
Region | SaleDate | SaleAmount
-------|--------------|------------
North | 2023-01-01 | 100.00
North | 2023-01-02 | 150.00
South | 2023-01-01 | 200.00
South | 2023-01-02 | 250.00
East | 2023-01-01 | 300.00
Output:
Region | SaleMonth | SalesRank
-------|-----------|------------
East | 2023-01 | 1
North | 2023-01 | 2
South | 2023-01 | 3
Explanation: The SalesRank is determined by ranking regions based on their total sales amount within each month (SaleMonth).
Example 3: Percentile Sales
Input: Sales Table (Sample Data)
Region | SaleDate | SaleAmount
-------|--------------|------------
North | 2023-01-01 | 100.00
North | 2023-01-02 | 150.00
South | 2023-01-01 | 200.00
South | 2023-01-02 | 250.00
East | 2023-01-01 | 300.00
Output:
Region | 90thPercentileSales
-------|-----------------------
North | 150.00
South | 250.00
East | 300.00
Explanation: The 90th percentile of sales amount is calculated for each region.
Constraints
- The
Salestable will contain at least 100 rows. SaleDatewill be a valid date format.SaleAmountwill be a non-negative decimal number.- Queries should execute within 5 seconds on a moderately sized database.
- The SQL dialect should be compatible with PostgreSQL or MySQL.
Notes
- Consider using window functions (e.g.,
SUM() OVER(),AVG() OVER(),RANK() OVER(),PERCENTILE_CONT() OVER()) to efficiently calculate the required metrics. - Pay attention to the order of operations within the window functions to ensure correct results.
- For the moving average, you might need to handle edge cases where there are fewer than 7 days of data available. Consider using
IGNORE NULLSwithin the window function if your database supports it. - For the ranking, consider using
DENSE_RANK()to handle ties gracefully. - For percentile calculation,
PERCENTILE_CONT()is generally preferred for continuous data. If your database doesn't support it, you may need to approximate the percentile using other techniques. - The
SaleMonthcan be derived from theSaleDatecolumn using appropriate date functions (e.g.,DATE_TRUNC('month', SaleDate)in PostgreSQL orDATE_FORMAT(SaleDate, '%Y-%m')in MySQL). Pseudocode for the queries:
-
Running Total:
SELECT Region, SaleDate, SUM(SaleAmount) OVER (PARTITION BY Region ORDER BY SaleDate) AS CumulativeSales FROM Sales -
Moving Average:
SELECT Region, SaleDate, AVG(SaleAmount) OVER (PARTITION BY Region ORDER BY SaleDate ASC ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS MovingAverageSales FROM Sales -
Sales Ranking:
SELECT Region, DATE_TRUNC('month', SaleDate) AS SaleMonth, DENSE_RANK() OVER (PARTITION BY DATE_TRUNC('month', SaleDate) ORDER BY SUM(SaleAmount) DESC) AS SalesRank FROM Sales GROUP BY Region, DATE_TRUNC('month', SaleDate) -
Percentile Sales:
SELECT Region, PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY SaleAmount) OVER (PARTITION BY Region) AS 90thPercentileSales FROM Sales GROUP BY Region