Rearrange Products Table
You are tasked with transforming a table representing product information. The current table format is not ideal for certain types of analysis, and you need to restructure it to group product attributes more effectively. This is a common data manipulation task often encountered when preparing data for reporting or specific database queries.
Problem Description
Given a table named Products with columns product_id, column_name, and value, rearrange the table so that each product_id has one row for each distinct column_name, with the corresponding value in a dedicated column. Essentially, you need to pivot the table.
The table Products has the following structure:
product_id: Unique identifier for each product.column_name: The name of an attribute of the product (e.g., 'Low', 'Medium', 'High', 'Color', 'Price').value: The value associated with thecolumn_namefor thatproduct_id.
You need to transform this into a new table where:
- Each
product_idappears only once. - For each distinct
column_namepresent in the original table (e.g., 'Low', 'Medium', 'High', 'Color', 'Price'), there should be a corresponding column in the output. - The
valuefrom the original table should populate the appropriate new column for eachproduct_id.
Key Requirements:
- The output table should have a column for
product_id. - For every unique
column_namefound across all products in the input, a corresponding column should be created in the output. - The values in these new columns should be the
valuefrom the original table, matched byproduct_idandcolumn_name. - If a
product_iddoes not have a specificcolumn_namein the input, the corresponding column in the output for thatproduct_idshould contain a NULL or equivalent representation of a missing value.
Expected Behavior: The transformation should effectively convert rows of attribute-value pairs into columns, making it easier to query specific attributes for each product.
Edge Cases:
- Products with varying sets of
column_names. column_names that appear for only a subset of products.- Empty input table.
Examples
Example 1:
Input:
Products Table:
+------------+-------------+--------+
| product_id | column_name | value |
+------------+-------------+--------+
| 1 | Low | 10 |
| 1 | Medium | 50 |
| 1 | High | 100 |
| 2 | Low | 20 |
| 2 | High | 150 |
| 3 | Medium | 60 |
| 3 | High | 200 |
+------------+-------------+--------+
Output:
+------------+------+--------+------+-----+
| product_id | Low | Medium | High | Price |
+------------+------+--------+------+-----+
| 1 | 10 | 50 | 100 | NULL |
| 2 | 20 | NULL | 150 | NULL |
| 3 | NULL | 60 | 200 | NULL |
+------------+------+--------+------+-----+
Explanation:
The output table pivots the data. For product_id 1, the values for 'Low', 'Medium', and 'High' are placed in their respective columns. Since 'Price' was not present for product_id 1, it's NULL. Product_id 2 is missing 'Medium' and 'Price', hence NULLs. Product_id 3 is missing 'Low' and 'Price'. The distinct column_names are 'Low', 'Medium', 'High', and 'Price'.
Example 2:
Input:
Products Table:
+------------+-------------+--------+
| product_id | column_name | value |
+------------+-------------+--------+
| 101 | Color | Red |
| 101 | Price | 19.99 |
| 102 | Color | Blue |
| 102 | Price | 24.50 |
| 102 | Size | Large |
+------------+-------------+--------+
Output:
+------------+-------+-------+------+
| product_id | Color | Price | Size |
+------------+-------+-------+------+
| 101 | Red | 19.99 | NULL |
| 102 | Blue | 24.50 | Large|
+------------+-------+-------+------+
Explanation:
Product_id 101 has 'Color' and 'Price'. Product_id 102 has 'Color', 'Price', and 'Size'. The output creates columns for all these distinct attribute names and populates them accordingly, using NULL for missing attributes.
Example 3: (Edge Case - Empty Table)
Input:
Products Table:
(empty)
Output:
(empty)
Explanation:
If the input table is empty, the output table should also be empty.
Constraints
- The
product_idwill be an integer. column_namewill be a string.valuecan be a string, integer, or decimal number.- There will be at most 100 distinct
column_namevalues across all products. - There will be at most 1000 distinct
product_ids. - The total number of rows in the
Productstable will not exceed 10,000. - The solution should be efficient and capable of handling the given constraints within a reasonable time.
Notes
- The set of
column_names to be used as output columns might not be explicitly provided and needs to be determined from the input data itself. - Consider the data types of the
valuecolumn when constructing the output. Ifvaluecan be of different types for differentcolumn_names, choose a data type for the output column that can accommodate all potential values (e.g., a string type or a variant type if supported by your environment). - This problem is often solved using a technique called "pivoting".