Sales Performance Analysis: Multidimensional Reporting
This challenge focuses on designing efficient Online Analytical Processing (OLAP) queries to analyze sales data from multiple perspectives. OLAP is crucial for business intelligence, enabling users to slice, dice, and drill down into data to uncover trends and insights. You will be tasked with creating queries that aggregate sales performance across different dimensions like time, product, and region.
Problem Description
Your task is to design SQL queries that answer specific business questions by aggregating sales data. You are given a simplified database schema representing sales transactions. The goal is to retrieve summarized information efficiently, demonstrating an understanding of OLAP concepts like aggregation, grouping, and potentially window functions.
Key Requirements:
- Total Sales per Month per Product Category: Calculate the total revenue generated for each product category, broken down by month.
- Top Selling Product per Region per Quarter: Identify the product with the highest total sales revenue in each region for each quarter.
- Year-over-Year Sales Growth: For each month, calculate the percentage change in sales revenue compared to the same month in the previous year.
Expected Behavior:
- Queries should return clearly structured results with appropriate labels.
- The calculations should be accurate based on the provided data.
- Queries should be optimized for performance, keeping in mind that the Sales table may hold up to 10 million records.
Edge Cases:
- Months with no sales for a specific category.
- New product categories introduced mid-year.
- Missing sales data for certain periods.
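One common way to handle the first edge case is to densify the result. You build a calendar of months, cross join it with the list of categories, and LEFT JOIN the aggregated sales, so that a month/category pair with no sales shows 0 instead of disappearing. The sketch below makes several assumptions. It runs on SQLite through Python's built-in sqlite3 module, stores prices as REAL rather than DECIMAL for brevity, and reuses the sample rows from Example 1. The names `DENSE_MONTHLY_SQL` and `dense_monthly_revenue` are illustrative only.

```python
import sqlite3

# Sample rows from Example 1; prices stored as REAL for simplicity (assumption).
SALES = [
    (1, "2023-01-15", 101, 2, 10.00),
    (2, "2023-01-20", 102, 1, 25.00),
    (3, "2023-02-10", 101, 3, 10.00),
    (4, "2023-02-18", 103, 1, 50.00),
    (5, "2024-01-10", 101, 1, 12.00),
]
PRODUCTS = [(101, "Electronics"), (102, "Books"), (103, "Electronics")]

DENSE_MONTHLY_SQL = """
WITH RECURSIVE
bounds AS (
    SELECT date(MIN(sale_date), 'start of month') AS first_month,
           date(MAX(sale_date), 'start of month') AS last_month
    FROM sales
),
months(month_start) AS (
    -- Calendar of every month between the first and last sale.
    SELECT first_month FROM bounds
    UNION ALL
    SELECT date(month_start, '+1 month')
    FROM months, bounds
    WHERE month_start < last_month
),
categories AS (
    SELECT DISTINCT category FROM products
),
monthly AS (
    -- Revenue that actually exists, per month and category.
    SELECT date(s.sale_date, 'start of month') AS month_start,
           p.category,
           SUM(s.quantity * s.price_per_unit) AS revenue
    FROM sales AS s
    JOIN products AS p ON p.product_id = s.product_id
    GROUP BY date(s.sale_date, 'start of month'), p.category
)
SELECT CAST(strftime('%Y', m.month_start) AS INTEGER) AS year,
       CAST(strftime('%m', m.month_start) AS INTEGER) AS month,
       c.category,
       -- Month/category pairs with no sales get 0.0 instead of vanishing.
       COALESCE(ROUND(t.revenue, 2), 0.0) AS total_revenue
FROM months AS m
CROSS JOIN categories AS c
LEFT JOIN monthly AS t
       ON t.month_start = m.month_start AND t.category = c.category
ORDER BY year, month, c.category
"""


def dense_monthly_revenue():
    """Return one row per (year, month, category), including zero-revenue months."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT,"
        " product_id INTEGER, quantity INTEGER, price_per_unit REAL)"
    )
    conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", SALES)
    conn.executemany("INSERT INTO products VALUES (?, ?)", PRODUCTS)
    rows = conn.execute(DENSE_MONTHLY_SQL).fetchall()
    conn.close()
    return rows
```

With this data the calendar runs from January 2023 to January 2024, so every category gets 13 rows. One consequence is that a category is padded with zeros even for months before it existed. For the "new category introduced mid-year" case, you may prefer to start each category's calendar at its first sale month.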
Examples
Example 1: Total Sales per Month per Product Category
Input Data (Simplified Representation):
- Sales table: sale_id (INT), sale_date (DATE), product_id (INT), quantity (INT), price_per_unit (DECIMAL)
- Products table: product_id (INT), category (VARCHAR)
Sample Input Data:
Sales Table:
| sale_id | sale_date | product_id | quantity | price_per_unit |
|---|---|---|---|---|
| 1 | 2023-01-15 | 101 | 2 | 10.00 |
| 2 | 2023-01-20 | 102 | 1 | 25.00 |
| 3 | 2023-02-10 | 101 | 3 | 10.00 |
| 4 | 2023-02-18 | 103 | 1 | 50.00 |
| 5 | 2024-01-10 | 101 | 1 | 12.00 |
Products Table:
| product_id | category |
|---|---|
| 101 | Electronics |
| 102 | Books |
| 103 | Electronics |
Output:
| year | month | category | total_revenue |
|---|---|---|---|
| 2023 | 1 | Electronics | 20.00 |
| 2023 | 1 | Books | 25.00 |
| 2023 | 2 | Electronics | 80.00 |
| 2024 | 1 | Electronics | 12.00 |
Explanation:
The query calculates quantity * price_per_unit for each sale, extracts the year and month from sale_date, joins with the Products table to get the category, and then groups by year, month, and category to sum the revenue.
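The steps above can be sketched as a single CTE-based query. This version assumes SQLite via Python's sqlite3 module. SQLite has no EXTRACT function, so `strftime` stands in for date-part extraction. Prices are stored as REAL instead of DECIMAL, and the names `MONTHLY_CATEGORY_SQL` and `monthly_category_revenue` are hypothetical.

```python
import sqlite3

# Sample rows from Example 1; prices stored as REAL for simplicity (assumption).
SALES = [
    (1, "2023-01-15", 101, 2, 10.00),
    (2, "2023-01-20", 102, 1, 25.00),
    (3, "2023-02-10", 101, 3, 10.00),
    (4, "2023-02-18", 103, 1, 50.00),
    (5, "2024-01-10", 101, 1, 12.00),
]
PRODUCTS = [(101, "Electronics"), (102, "Books"), (103, "Electronics")]

MONTHLY_CATEGORY_SQL = """
WITH sale_revenue AS (
    -- One row per sale: its year, month, category, and revenue.
    SELECT CAST(strftime('%Y', s.sale_date) AS INTEGER) AS year,
           CAST(strftime('%m', s.sale_date) AS INTEGER) AS month,
           p.category,
           s.quantity * s.price_per_unit AS revenue
    FROM sales AS s
    JOIN products AS p ON p.product_id = s.product_id
)
SELECT year, month, category, ROUND(SUM(revenue), 2) AS total_revenue
FROM sale_revenue
GROUP BY year, month, category
ORDER BY year, month, category
"""


def monthly_category_revenue():
    """Return (year, month, category, total_revenue) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT,"
        " product_id INTEGER, quantity INTEGER, price_per_unit REAL)"
    )
    conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", SALES)
    conn.executemany("INSERT INTO products VALUES (?, ?)", PRODUCTS)
    rows = conn.execute(MONTHLY_CATEGORY_SQL).fetchall()
    conn.close()
    return rows


for row in monthly_category_revenue():
    print(row)
```

Products 101 and 103 share the Electronics category, so their February 2023 sales collapse into a single row totalling 80.0.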
Example 2: Top Selling Product per Region per Quarter
Input Data (Simplified Representation):
- Sales table: sale_id (INT), sale_date (DATE), product_id (INT), region_id (INT), quantity (INT), price_per_unit (DECIMAL)
- Products table: product_id (INT), product_name (VARCHAR)
- Regions table: region_id (INT), region_name (VARCHAR)
Sample Input Data:
Sales Table:
| sale_id | sale_date | product_id | region_id | quantity | price_per_unit |
|---|---|---|---|---|---|
| 1 | 2023-01-15 | 101 | 1 | 2 | 10.00 |
| 2 | 2023-01-20 | 102 | 1 | 1 | 25.00 |
| 3 | 2023-04-10 | 101 | 2 | 3 | 10.00 |
| 4 | 2023-04-18 | 103 | 1 | 1 | 50.00 |
| 5 | 2023-07-01 | 101 | 1 | 1 | 10.00 |
Products Table:
| product_id | product_name |
|---|---|
| 101 | Laptop |
| 102 | Keyboard |
| 103 | Monitor |
Regions Table:
| region_id | region_name |
|---|---|
| 1 | North |
| 2 | South |
Output:
| year | quarter | region_name | product_name | total_revenue |
|---|---|---|---|---|
| 2023 | 1 | North | Keyboard | 25.00 |
| 2023 | 2 | South | Laptop | 30.00 |
| 2023 | 2 | North | Monitor | 50.00 |
| 2023 | 3 | North | Laptop | 10.00 |
Explanation:
This query first calculates the revenue for each sale, derives the year and quarter from sale_date, and sums the revenue for each product within each year, quarter, and region. It then joins with the Products and Regions tables to obtain product and region names. A window function (e.g., ROW_NUMBER or RANK) is partitioned by year, quarter, and region and ordered by total revenue in descending order. It assigns a rank to each product within its group. Finally, the query keeps only the top-ranked product (rank = 1) in each partition. If two products tie for the highest revenue, RANK returns both, while ROW_NUMBER returns only one.
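A sketch of the aggregate-then-rank approach follows. It assumes SQLite 3.25 or newer, the first version with window functions, which recent Python builds bundle. The quarter is derived from the month with integer division, prices are REAL, and the constant and function names are hypothetical. It uses RANK so that tied products are all kept.

```python
import sqlite3

# Sample rows from Example 2; prices stored as REAL for simplicity (assumption).
SALES = [
    (1, "2023-01-15", 101, 1, 2, 10.00),
    (2, "2023-01-20", 102, 1, 1, 25.00),
    (3, "2023-04-10", 101, 2, 3, 10.00),
    (4, "2023-04-18", 103, 1, 1, 50.00),
    (5, "2023-07-01", 101, 1, 1, 10.00),
]
PRODUCTS = [(101, "Laptop"), (102, "Keyboard"), (103, "Monitor")]
REGIONS = [(1, "North"), (2, "South")]

TOP_PRODUCT_SQL = """
WITH sale_parts AS (
    SELECT CAST(strftime('%Y', sale_date) AS INTEGER) AS year,
           -- Months 1-3 -> Q1, 4-6 -> Q2, ... (SQLite integer division).
           (CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 AS quarter,
           region_id, product_id,
           quantity * price_per_unit AS revenue
    FROM sales
),
product_quarter AS (
    -- Total revenue per (year, quarter, region, product).
    SELECT year, quarter, region_id, product_id, SUM(revenue) AS total_revenue
    FROM sale_parts
    GROUP BY year, quarter, region_id, product_id
),
ranked AS (
    -- RANK keeps ties; ROW_NUMBER would keep exactly one tied product.
    SELECT year, quarter, region_id, product_id, total_revenue,
           RANK() OVER (
               PARTITION BY year, quarter, region_id
               ORDER BY total_revenue DESC
           ) AS revenue_rank
    FROM product_quarter
)
SELECT r.year, r.quarter, g.region_name, p.product_name,
       ROUND(r.total_revenue, 2) AS total_revenue
FROM ranked AS r
JOIN regions AS g ON g.region_id = r.region_id
JOIN products AS p ON p.product_id = r.product_id
WHERE r.revenue_rank = 1
ORDER BY r.year, r.quarter, g.region_name
"""


def top_product_per_region_quarter():
    """Return (year, quarter, region_name, product_name, total_revenue) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, product_id INTEGER,
                            region_id INTEGER, quantity INTEGER, price_per_unit REAL);
        CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE regions (region_id INTEGER PRIMARY KEY, region_name TEXT);
    """)
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", SALES)
    conn.executemany("INSERT INTO products VALUES (?, ?)", PRODUCTS)
    conn.executemany("INSERT INTO regions VALUES (?, ?)", REGIONS)
    rows = conn.execute(TOP_PRODUCT_SQL).fetchall()
    conn.close()
    return rows


for row in top_product_per_region_quarter():
    print(row)
```

Aggregating before ranking matters: the rank must compare product totals, not individual sales. In Q1 North, Keyboard (25.00) outranks Laptop (20.00).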
Example 3: Year-over-Year Sales Growth (Edge Case Consideration)
Input Data (same tables as Example 1, with data spanning two years):
Sales Table:
| sale_id | sale_date | product_id | quantity | price_per_unit |
|---|---|---|---|---|
| 1 | 2023-01-15 | 101 | 2 | 10.00 |
| 2 | 2024-01-10 | 101 | 1 | 12.00 |
| 3 | 2023-02-10 | 101 | 3 | 10.00 |
| 4 | 2024-02-05 | 101 | 2 | 12.00 |
| 5 | 2023-03-20 | 102 | 1 | 25.00 |
| 6 | 2023-04-01 | 101 | 1 | 10.00 |
| 7 | 2024-04-05 | 101 | 2 | 12.00 |
Products Table:
| product_id | category |
|---|---|
| 101 | Electronics |
| 102 | Books |
Output:
| year | month | current_year_sales | previous_year_sales | yoy_growth_percentage |
|---|---|---|---|---|
| 2023 | 1 | 20.00 | NULL | NULL |
| 2023 | 2 | 30.00 | NULL | NULL |
| 2023 | 3 | 25.00 | NULL | NULL |
| 2023 | 4 | 10.00 | NULL | NULL |
| 2024 | 1 | 12.00 | 20.00 | -40.00 |
| 2024 | 2 | 24.00 | 30.00 | -20.00 |
| 2024 | 4 | 24.00 | 10.00 | 140.00 |
Explanation:
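This query first aggregates total sales (across all categories) by year and month. It then uses a window function, such as LAG partitioned by month and ordered by year, to retrieve the sales for the same month of the previous year. Finally, it calculates the year-over-year growth percentage using the formula (current_year_sales - previous_year_sales) / previous_year_sales * 100. For months in the first year of data, previous_year_sales and yoy_growth_percentage are NULL. March 2024 has no sales, so no 2024 row appears for month 3. LAG returns the previous row rather than the previous year. When a month can be missing for an entire year, check that the lagged row is exactly one year earlier, or use a self-join on year - 1.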
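A minimal sketch of the LAG-based approach, assuming SQLite 3.25+ via Python's sqlite3 module and REAL prices. The Products table is omitted because Example 3 reports totals across all categories. The guard `prev_year = year - 1` prevents a comparison against an older year when the intermediate year has no row for that month. The constant and function names are hypothetical.

```python
import sqlite3

# Sample rows from Example 3 (category is not needed for monthly totals).
SALES = [
    (1, "2023-01-15", 101, 2, 10.00),
    (2, "2024-01-10", 101, 1, 12.00),
    (3, "2023-02-10", 101, 3, 10.00),
    (4, "2024-02-05", 101, 2, 12.00),
    (5, "2023-03-20", 102, 1, 25.00),
    (6, "2023-04-01", 101, 1, 10.00),
    (7, "2024-04-05", 101, 2, 12.00),
]

YOY_SQL = """
WITH sale_parts AS (
    SELECT CAST(strftime('%Y', sale_date) AS INTEGER) AS year,
           CAST(strftime('%m', sale_date) AS INTEGER) AS month,
           quantity * price_per_unit AS revenue
    FROM sales
),
monthly_totals AS (
    SELECT year, month, SUM(revenue) AS sales_now
    FROM sale_parts
    GROUP BY year, month
),
with_prev AS (
    -- LAG gives the previous row for the same calendar month.
    SELECT year, month, sales_now,
           LAG(year) OVER (PARTITION BY month ORDER BY year) AS prev_year,
           LAG(sales_now) OVER (PARTITION BY month ORDER BY year) AS prev_sales
    FROM monthly_totals
),
aligned AS (
    -- Keep the lagged value only if it is exactly one year earlier.
    SELECT year, month, sales_now,
           CASE WHEN prev_year = year - 1 THEN prev_sales END AS sales_prev
    FROM with_prev
)
SELECT year, month,
       ROUND(sales_now, 2) AS current_year_sales,
       ROUND(sales_prev, 2) AS previous_year_sales,
       -- NULL when there is no prior-year value or it was 0.
       CASE WHEN sales_prev <> 0
            THEN ROUND((sales_now - sales_prev) * 100.0 / sales_prev, 2)
       END AS yoy_growth_percentage
FROM aligned
ORDER BY year, month
"""


def yoy_growth(rows=SALES):
    """Return (year, month, current, previous, growth_pct) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT,"
        " product_id INTEGER, quantity INTEGER, price_per_unit REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(YOY_SQL).fetchall()
    conn.close()
    return result
```

Python receives SQL NULL as `None`. Passing rows for May 2022 and May 2024 only yields `None` growth for 2024, because no May 2023 row exists to compare against.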
Constraints
- The Sales table can contain up to 10 million records.
- The Products and Regions tables will contain at most 1,000 records each.
- Dates in the sale_date column will be within the last 3 years.
- price_per_unit and quantity will always be non-negative.
- Queries should aim to execute within a reasonable time frame (e.g., under 30 seconds) on typical hardware.
Notes
- You'll need to extract year, month, and quarter information from sale_date.
- Consider using Common Table Expressions (CTEs) to break complex queries into more manageable steps.
- For the year-over-year growth calculation, pay attention to how you handle the first year of data where there is no previous year to compare against.
- The exact SQL dialect is not specified; use standard SQL syntax that is widely supported. If you need to use a specific function (like for date extraction), mention it clearly.
- Think about how to join tables and perform aggregations efficiently. Indexes on relevant columns (sale_date, product_id, region_id) would typically be beneficial in a real-world scenario.
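As an illustration of the date-extraction and indexing notes, here is a small sketch that assumes SQLite. Standard SQL uses EXTRACT(YEAR FROM sale_date), but SQLite does not support EXTRACT, so `strftime` is used. The quarter is computed from the month with SQLite's integer division. The index names and the helper `date_parts_with_indexes` are hypothetical.

```python
import sqlite3

DATE_PARTS_SQL = """
SELECT sale_date,
       CAST(strftime('%Y', sale_date) AS INTEGER) AS year,
       CAST(strftime('%m', sale_date) AS INTEGER) AS month,
       -- Integer division maps months 1-3 to 1, 4-6 to 2, and so on.
       (CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 AS quarter
FROM sales
ORDER BY sale_date
"""


def date_parts_with_indexes(dates):
    """Return ([(year, month, quarter), ...], sorted index names) for the given dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT,"
        " product_id INTEGER, region_id INTEGER)"
    )
    # Indexes on the filter/join columns mentioned in the notes.
    conn.execute("CREATE INDEX idx_sales_sale_date ON sales (sale_date)")
    conn.execute("CREATE INDEX idx_sales_product_id ON sales (product_id)")
    conn.execute("CREATE INDEX idx_sales_region_id ON sales (region_id)")
    conn.executemany("INSERT INTO sales (sale_date) VALUES (?)", [(d,) for d in dates])
    parts = [row[1:] for row in conn.execute(DATE_PARTS_SQL)]
    indexes = sorted(
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master"
            " WHERE type = 'index' AND tbl_name = 'sales' AND name LIKE 'idx_%'"
        )
    )
    conn.close()
    return parts, indexes
```

Indexes help most when a query filters on a date range or joins on a selective key. An aggregation over the whole Sales table will typically still read every row.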