Hone logo
Hone
Problems

Transforming Sales Data: Pivot and Unpivot Challenge

Imagine you're working with a dataset of product sales across different regions and months. This data is often stored in a way that's convenient for entry but difficult for analysis, particularly when you want to see sales figures in a summarized, cross-tabulated format or when you need to convert such a format back into a more granular list. This challenge will test your ability to use PIVOT and UNPIVOT operations in SQL to reshape this data effectively.

Problem Description

You are provided with two tables: MonthlySalesSummary and RegionalSalesDetail.

  1. MonthlySalesSummary: This table contains aggregated sales data, where each row represents a product, and columns represent the sales for that product in each of the twelve months. This format is useful for a quick overview of a product's yearly performance but makes it hard to compare sales of different products within a specific month.
  2. RegionalSalesDetail: This table contains a more granular view of sales, detailing sales for each product in each region for a specific month. This is a "tall" format, which is generally preferred for analytical queries.

Your task is to:

  • Part 1: Unpivot MonthlySalesSummary: Transform the MonthlySalesSummary table into a "tall" format, similar to RegionalSalesDetail but without the region information. The output should list each product, the month, and the corresponding sales amount.
  • Part 2: Pivot RegionalSalesDetail: Transform the RegionalSalesDetail table into a "wide" format, where rows represent products, columns represent regions, and the values are the total sales for that product in that region across all months.

Key Requirements:

  • You must use the UNPIVOT operator (or its equivalent functionality in your specific SQL dialect) for Part 1.
  • You must use the PIVOT operator (or its equivalent functionality in your specific SQL dialect) for Part 2.
  • The unpivoted data should have columns for ProductID, MonthName, and SalesAmount.
  • The pivoted data should have columns for ProductID, and then columns for each unique region (e.g., North, South, East, West), with the values representing total sales for that product and region.
  • Handle cases where a product might not have sales in a particular month or region (these should typically result in NULL or 0 depending on the pivot/unpivot implementation and desired outcome). For this challenge, assume 0 for missing values in the pivoted output.

Expected Behavior:

  • Unpivot: Each row in MonthlySalesSummary will expand into 12 rows (one for each month) in the unpivoted output.
  • Pivot: The RegionalSalesDetail table will be aggregated first by ProductID and Region to get total sales per region before pivoting. The pivoted output will have fewer rows than the input but potentially many more columns.

Edge Cases:

  • Missing Months/Regions: If a product has no sales for a specific month in MonthlySalesSummary, the unpivoted output for that month should reflect zero sales. For the pivot, if a product has no sales in a specific region, that cell in the pivoted output should display 0.
  • Data Types: Ensure that the sales amounts are treated as numerical values throughout the transformations.

Examples

Example 1: Unpivoting MonthlySalesSummary

Input MonthlySalesSummary Table:

ProductIDJanSalesFebSalesMarSales
P101150016001700
P102200021002200

Output of Part 1 (Unpivot):

ProductIDMonthNameSalesAmount
P101JanSales1500
P101FebSales1600
P101MarSales1700
P102JanSales2000
P102FebSales2100
P102MarSales2200

Explanation: The original rows for P101 and P102 have been expanded. Each monthly sales column (JanSales, FebSales, MarSales) has become a row, with MonthName indicating the source column and SalesAmount holding the value.

Example 2: Pivoting RegionalSalesDetail

Input RegionalSalesDetail Table:

ProductIDRegionMonthSalesAmount
P101NorthJan500
P101SouthJan600
P101NorthFeb700
P102EastJan1000
P102WestJan1100
P101SouthFeb800

Output of Part 2 (Pivot):

ProductIDNorthSouthEastWest
P1011200140000
P1020010001100

Explanation: The RegionalSalesDetail table was first aggregated by ProductID and Region. For P101, North sales are 500 (Jan) + 700 (Feb) = 1200. South sales are 600 (Jan) + 800 (Feb) = 1400. East and West had no sales for P101. For P102, East sales are 1000 (Jan), and West sales are 1100 (Jan). Then, the Region values were transformed into columns, and the SalesAmount became the values. Missing combinations (e.g., P101 in East/West, P102 in North/South) resulted in 0.

Example 3: Handling Missing Data in Pivot

Input RegionalSalesDetail Table:

ProductIDRegionMonthSalesAmount
P103NorthApr300
P103EastApr400
P103NorthMay350

Output of Part 2 (Pivot):

ProductIDNorthSouthEastWest
P10365004000

Explanation: P103 had sales only in the North and East regions. The total North sales are 300 (Apr) + 350 (May) = 650. East sales are 400 (Apr). South and West regions have 0 sales for P103.

Constraints

  • The MonthlySalesSummary table will contain between 100 and 1,000,000 rows.
  • The MonthlySalesSummary table will have exactly 12 month columns (e.g., JanSales, FebSales, ..., DecSales).
  • The RegionalSalesDetail table will contain between 1,000 and 10,000,000 rows.
  • ProductID will be a string (VARCHAR or similar).
  • MonthName in the unpivot output should be derived from the column names (e.g., JanSales should become JanSales).
  • SalesAmount will be a numeric type (INTEGER, DECIMAL, etc.).
  • Your solution should execute within 60 seconds.

Notes

  • The exact syntax for PIVOT and UNPIVOT can vary slightly between SQL dialects (e.g., SQL Server, Oracle, PostgreSQL with crosstab extension, etc.). Focus on the logical operation. You may need to perform an aggregation step (e.g., SUM) before pivoting if there are multiple rows per product/region/month combination in the input. For this challenge, assume RegionalSalesDetail can have duplicate entries for the same ProductID, Region, and Month, and you need to sum them up.
  • For Part 1 (UNPIVOT), you might need to explicitly list all 12 month columns.
  • For Part 2 (PIVOT), you will need to specify which column contains the values to be aggregated (SalesAmount), which column's values become the new column headers (Region), and which column identifies the rows (ProductID).
  • Consider how you would handle the case where SalesAmount might be NULL in the input data. For this challenge, assume any NULL sales amounts should be treated as 0 in both the unpivoted and pivoted outputs.
Loading editor...
plaintext