Transforming Sales Data: Pivot and Unpivot Challenge
Imagine you're working with a dataset of product sales across different regions and months. This data is often stored in a way that's convenient for entry but difficult for analysis, particularly when you want to see sales figures in a summarized, cross-tabulated format or when you need to convert such a format back into a more granular list. This challenge will test your ability to use PIVOT and UNPIVOT operations in SQL to reshape this data effectively.
Problem Description
You are provided with two tables: MonthlySalesSummary and RegionalSalesDetail.
MonthlySalesSummary: This table contains aggregated sales data, where each row represents a product, and columns represent the sales for that product in each of the twelve months. This format is useful for a quick overview of a product's yearly performance but makes it hard to compare sales of different products within a specific month.RegionalSalesDetail: This table contains a more granular view of sales, detailing sales for each product in each region for a specific month. This is a "tall" format, which is generally preferred for analytical queries.
Your task is to:
- Part 1: Unpivot
MonthlySalesSummary: Transform theMonthlySalesSummarytable into a "tall" format, similar toRegionalSalesDetailbut without the region information. The output should list each product, the month, and the corresponding sales amount. - Part 2: Pivot
RegionalSalesDetail: Transform theRegionalSalesDetailtable into a "wide" format, where rows represent products, columns represent regions, and the values are the total sales for that product in that region across all months.
Key Requirements:
- You must use the
UNPIVOToperator (or its equivalent functionality in your specific SQL dialect) for Part 1. - You must use the
PIVOToperator (or its equivalent functionality in your specific SQL dialect) for Part 2. - The unpivoted data should have columns for
ProductID,MonthName, andSalesAmount. - The pivoted data should have columns for
ProductID, and then columns for each unique region (e.g.,North,South,East,West), with the values representing total sales for that product and region. - Handle cases where a product might not have sales in a particular month or region (these should typically result in
NULLor0depending on the pivot/unpivot implementation and desired outcome). For this challenge, assume0for missing values in the pivoted output.
Expected Behavior:
- Unpivot: Each row in
MonthlySalesSummarywill expand into 12 rows (one for each month) in the unpivoted output. - Pivot: The
RegionalSalesDetailtable will be aggregated first byProductIDandRegionto get total sales per region before pivoting. The pivoted output will have fewer rows than the input but potentially many more columns.
Edge Cases:
- Missing Months/Regions: If a product has no sales for a specific month in
MonthlySalesSummary, the unpivoted output for that month should reflect zero sales. For the pivot, if a product has no sales in a specific region, that cell in the pivoted output should display0. - Data Types: Ensure that the sales amounts are treated as numerical values throughout the transformations.
Examples
Example 1: Unpivoting MonthlySalesSummary
Input MonthlySalesSummary Table:
| ProductID | JanSales | FebSales | MarSales |
|---|---|---|---|
| P101 | 1500 | 1600 | 1700 |
| P102 | 2000 | 2100 | 2200 |
Output of Part 1 (Unpivot):
| ProductID | MonthName | SalesAmount |
|---|---|---|
| P101 | JanSales | 1500 |
| P101 | FebSales | 1600 |
| P101 | MarSales | 1700 |
| P102 | JanSales | 2000 |
| P102 | FebSales | 2100 |
| P102 | MarSales | 2200 |
Explanation: The original rows for P101 and P102 have been expanded. Each monthly sales column (JanSales, FebSales, MarSales) has become a row, with MonthName indicating the source column and SalesAmount holding the value.
Example 2: Pivoting RegionalSalesDetail
Input RegionalSalesDetail Table:
| ProductID | Region | Month | SalesAmount |
|---|---|---|---|
| P101 | North | Jan | 500 |
| P101 | South | Jan | 600 |
| P101 | North | Feb | 700 |
| P102 | East | Jan | 1000 |
| P102 | West | Jan | 1100 |
| P101 | South | Feb | 800 |
Output of Part 2 (Pivot):
| ProductID | North | South | East | West |
|---|---|---|---|---|
| P101 | 1200 | 1400 | 0 | 0 |
| P102 | 0 | 0 | 1000 | 1100 |
Explanation: The RegionalSalesDetail table was first aggregated by ProductID and Region. For P101, North sales are 500 (Jan) + 700 (Feb) = 1200. South sales are 600 (Jan) + 800 (Feb) = 1400. East and West had no sales for P101. For P102, East sales are 1000 (Jan), and West sales are 1100 (Jan). Then, the Region values were transformed into columns, and the SalesAmount became the values. Missing combinations (e.g., P101 in East/West, P102 in North/South) resulted in 0.
Example 3: Handling Missing Data in Pivot
Input RegionalSalesDetail Table:
| ProductID | Region | Month | SalesAmount |
|---|---|---|---|
| P103 | North | Apr | 300 |
| P103 | East | Apr | 400 |
| P103 | North | May | 350 |
Output of Part 2 (Pivot):
| ProductID | North | South | East | West |
|---|---|---|---|---|
| P103 | 650 | 0 | 400 | 0 |
Explanation: P103 had sales only in the North and East regions. The total North sales are 300 (Apr) + 350 (May) = 650. East sales are 400 (Apr). South and West regions have 0 sales for P103.
Constraints
- The
MonthlySalesSummarytable will contain between 100 and 1,000,000 rows. - The
MonthlySalesSummarytable will have exactly 12 month columns (e.g.,JanSales,FebSales, ...,DecSales). - The
RegionalSalesDetailtable will contain between 1,000 and 10,000,000 rows. ProductIDwill be a string (VARCHAR or similar).MonthNamein the unpivot output should be derived from the column names (e.g.,JanSalesshould becomeJanSales).SalesAmountwill be a numeric type (INTEGER, DECIMAL, etc.).- Your solution should execute within 60 seconds.
Notes
- The exact syntax for
PIVOTandUNPIVOTcan vary slightly between SQL dialects (e.g., SQL Server, Oracle, PostgreSQL withcrosstabextension, etc.). Focus on the logical operation. You may need to perform an aggregation step (e.g.,SUM) before pivoting if there are multiple rows per product/region/month combination in the input. For this challenge, assumeRegionalSalesDetailcan have duplicate entries for the same ProductID, Region, and Month, and you need to sum them up. - For Part 1 (UNPIVOT), you might need to explicitly list all 12 month columns.
- For Part 2 (PIVOT), you will need to specify which column contains the values to be aggregated (
SalesAmount), which column's values become the new column headers (Region), and which column identifies the rows (ProductID). - Consider how you would handle the case where
SalesAmountmight beNULLin the input data. For this challenge, assume anyNULLsales amounts should be treated as0in both the unpivoted and pivoted outputs.