Hone logo
Hone
Problems

Rearrange Products Table

You are tasked with transforming a table representing product information. The current table format is not ideal for certain types of analysis, and you need to restructure it to group product attributes more effectively. This is a common data manipulation task often encountered when preparing data for reporting or specific database queries.

Problem Description

Given a table named Products with columns product_id, column_name, and value, rearrange the table so that each product_id has one row for each distinct column_name, with the corresponding value in a dedicated column. Essentially, you need to pivot the table.

The table Products has the following structure:

  • product_id: Unique identifier for each product.
  • column_name: The name of an attribute of the product (e.g., 'Low', 'Medium', 'High', 'Color', 'Price').
  • value: The value associated with the column_name for that product_id.

You need to transform this into a new table where:

  • Each product_id appears only once.
  • For each distinct column_name present in the original table (e.g., 'Low', 'Medium', 'High', 'Color', 'Price'), there should be a corresponding column in the output.
  • The value from the original table should populate the appropriate new column for each product_id.

Key Requirements:

  1. The output table should have a column for product_id.
  2. For every unique column_name found across all products in the input, a corresponding column should be created in the output.
  3. The values in these new columns should be the value from the original table, matched by product_id and column_name.
  4. If a product_id does not have a specific column_name in the input, the corresponding column in the output for that product_id should contain a NULL or equivalent representation of a missing value.

Expected Behavior: The transformation should effectively convert rows of attribute-value pairs into columns, making it easier to query specific attributes for each product.

Edge Cases:

  • Products with varying sets of column_names.
  • column_names that appear for only a subset of products.
  • Empty input table.

Examples

Example 1:

Input:
Products Table:
+------------+-------------+--------+
| product_id | column_name | value  |
+------------+-------------+--------+
| 1          | Low         | 10     |
| 1          | Medium      | 50     |
| 1          | High        | 100    |
| 2          | Low         | 20     |
| 2          | High        | 150    |
| 3          | Medium      | 60     |
| 3          | High        | 200    |
+------------+-------------+--------+

Output:
+------------+------+--------+------+-----+
| product_id | Low  | Medium | High | Price |
+------------+------+--------+------+-----+
| 1          | 10   | 50     | 100  | NULL  |
| 2          | 20   | NULL   | 150  | NULL  |
| 3          | NULL | 60     | 200  | NULL  |
+------------+------+--------+------+-----+

Explanation:
The output table pivots the data. For product_id 1, the values for 'Low', 'Medium', and 'High' are placed in their respective columns. Since 'Price' was not present for product_id 1, it's NULL. Product_id 2 is missing 'Medium' and 'Price', hence NULLs. Product_id 3 is missing 'Low' and 'Price'. The distinct column_names are 'Low', 'Medium', 'High', and 'Price'.

Example 2:

Input:
Products Table:
+------------+-------------+--------+
| product_id | column_name | value  |
+------------+-------------+--------+
| 101        | Color       | Red    |
| 101        | Price       | 19.99  |
| 102        | Color       | Blue   |
| 102        | Price       | 24.50  |
| 102        | Size        | Large  |
+------------+-------------+--------+

Output:
+------------+-------+-------+------+
| product_id | Color | Price | Size |
+------------+-------+-------+------+
| 101        | Red   | 19.99 | NULL |
| 102        | Blue  | 24.50 | Large|
+------------+-------+-------+------+

Explanation:
Product_id 101 has 'Color' and 'Price'. Product_id 102 has 'Color', 'Price', and 'Size'. The output creates columns for all these distinct attribute names and populates them accordingly, using NULL for missing attributes.

Example 3: (Edge Case - Empty Table)

Input:
Products Table:
(empty)

Output:
(empty)

Explanation:
If the input table is empty, the output table should also be empty.

Constraints

  • The product_id will be an integer.
  • column_name will be a string.
  • value can be a string, integer, or decimal number.
  • There will be at most 100 distinct column_name values across all products.
  • There will be at most 1000 distinct product_ids.
  • The total number of rows in the Products table will not exceed 10,000.
  • The solution should be efficient and capable of handling the given constraints within a reasonable time.

Notes

  • The set of column_names to be used as output columns might not be explicitly provided and needs to be determined from the input data itself.
  • Consider the data types of the value column when constructing the output. If value can be of different types for different column_names, choose a data type for the output column that can accommodate all potential values (e.g., a string type or a variant type if supported by your environment).
  • This problem is often solved using a technique called "pivoting".
Loading editor...
plaintext