Analyzing Sales Performance with CTEs
This challenge focuses on using Common Table Expressions (CTEs) in SQL to analyze sales data. You'll be tasked with calculating key performance indicators (KPIs) like total sales per product category and identifying the top-performing categories, leveraging CTEs to break down the query into logical, readable steps. This is a common pattern in data analysis, allowing for complex calculations and comparisons.
Problem Description
You are given a table named Sales with the following schema:
SaleID(INT): Unique identifier for each sale.ProductID(INT): Identifier for the product sold.CategoryID(INT): Identifier for the product category.SaleDate(DATE): Date of the sale.Quantity(INT): Quantity of the product sold in the sale.UnitPrice(DECIMAL): Price of a single unit of the product.
Your task is to write a SQL query that uses CTEs to:
- Calculate the
TotalSalesfor eachProductIDby multiplyingQuantityandUnitPrice. - Calculate the
TotalCategorySalesfor eachCategoryIDby summing theTotalSalesfrom the previous step. - Identify the
Top2Categorieswith the highestTotalCategorySales. - Return the
CategoryIDandTotalCategorySalesfor these top 2 categories, ordered byTotalCategorySalesin descending order.
Key Requirements:
- The solution must use CTEs to achieve the desired result. Using subqueries directly without CTEs will be considered incorrect.
- The query should be readable and well-structured, utilizing CTEs to logically separate the calculation steps.
- The query must correctly handle cases where there are fewer than two categories in the
Salestable. In such cases, return all categories. - The query must handle cases where multiple categories have the same
TotalCategorySalesand are tied for a top position. Return all tied categories.
Expected Behavior:
The query should return a table with two columns: CategoryID and TotalCategorySales. The rows should be ordered by TotalCategorySales in descending order.
Examples
Example 1:
Input:
Sales Table:
SaleID | ProductID | CategoryID | SaleDate | Quantity | UnitPrice
-------|------------|------------|-------------|----------|----------
1 | 101 | 1 | 2023-01-15 | 2 | 25.00
2 | 102 | 2 | 2023-02-20 | 1 | 50.00
3 | 101 | 1 | 2023-03-10 | 3 | 25.00
4 | 103 | 2 | 2023-04-05 | 2 | 75.00
5 | 104 | 3 | 2023-05-12 | 1 | 100.00
Output:
CategoryID | TotalCategorySales
-----------|--------------------
2 | 200.00
1 | 125.00
Explanation:
Category 1 has total sales of (2 * 25) + (3 * 25) = 125.
Category 2 has total sales of (1 * 50) + (2 * 75) = 200.
Category 3 has total sales of (1 * 100) = 100.
The top 2 categories are 2 and 1.
Example 2:
Input:
Sales Table:
SaleID | ProductID | CategoryID | SaleDate | Quantity | UnitPrice
-------|------------|------------|-------------|----------|----------
1 | 201 | 1 | 2023-01-15 | 1 | 10.00
2 | 202 | 1 | 2023-02-20 | 2 | 5.00
Output:
CategoryID | TotalCategorySales
-----------|--------------------
1 | 20.00
Explanation:
Category 1 has total sales of (1 * 10) + (2 * 5) = 20.
Since there is only one category, it is returned as the top category.
Example 3: (Edge Case - Tie)
Input:
Sales Table:
SaleID | ProductID | CategoryID | SaleDate | Quantity | UnitPrice
-------|------------|------------|-------------|----------|----------
1 | 301 | 1 | 2023-01-15 | 2 | 25.00
2 | 302 | 2 | 2023-02-20 | 2 | 25.00
Output:
CategoryID | TotalCategorySales
-----------|--------------------
1 | 50.00
2 | 50.00
Explanation:
Category 1 has total sales of (2 * 25) = 50.
Category 2 has total sales of (2 * 25) = 50.
Both categories are tied for the top position, so both are returned.
Constraints
- The
Salestable will contain at least one row. Quantitywill always be a positive integer.UnitPricewill always be a non-negative decimal number.CategoryIDwill always be a positive integer.- The number of categories in the
Salestable will be between 1 and 1000. - The query should execute within a reasonable time limit (e.g., 5 seconds) on a dataset of up to 1 million rows.
Notes
- Consider using the
RANK()window function to determine the top categories. This is particularly useful for handling ties. - The
LIMITclause might seem tempting, but it doesn't guarantee correct behavior when there are ties. UsingRANK()and filtering based on the rank is a more robust approach. - Focus on clarity and readability. Well-named CTEs make the query easier to understand and maintain.
- Remember to handle the edge case where there are fewer than two categories.
- The order of the categories with the same
TotalCategorySalesis not important. Pseudocode:
// CTE 1: Calculate TotalSales for each ProductID
CREATE CTE ProductSales AS (
SELECT
ProductID,
SUM(Quantity * UnitPrice) AS TotalSales
FROM
Sales
GROUP BY
ProductID
);
// CTE 2: Calculate TotalCategorySales for each CategoryID
CREATE CTE CategorySales AS (
SELECT
CategoryID,
SUM(ps.TotalSales) AS TotalCategorySales
FROM
ProductSales ps
JOIN
Sales s ON ps.ProductID = s.ProductID
GROUP BY
CategoryID
);
// CTE 3: Rank Categories by TotalCategorySales
CREATE CTE RankedCategories AS (
SELECT
CategoryID,
TotalCategorySales,
RANK() OVER (ORDER BY TotalCategorySales DESC) AS CategoryRank
FROM
CategorySales
);
// Final SELECT statement: Retrieve Top 2 Categories
SELECT
CategoryID,
TotalCategorySales
FROM
RankedCategories
WHERE
CategoryRank <= 2
ORDER BY
TotalCategorySales DESC;