Hone logo
Hone
Problems

Efficient Upsert Operations with SQL MERGE

Upsert operations (update if exists, insert if not) are a common requirement in database management. While many SQL dialects offer solutions like INSERT ... ON CONFLICT, the MERGE statement provides a standardized and often more efficient way to perform these operations. This challenge asks you to design a solution using the MERGE statement to synchronize data between a staging table and a target table.

Problem Description

You are tasked with creating a SQL script that uses the MERGE statement to synchronize product data between a staging table (staging_products) and a target table (products). The products table holds the master product information, while the staging_products table contains updates and new product records. The synchronization should:

  1. Update existing products: If a product in staging_products has the same product_id as a product in products, update the products table with the values from staging_products. Prioritize values from the staging table.
  2. Insert new products: If a product in staging_products does not have a matching product_id in products, insert the product into the products table.
  3. Handle NULL values: If a column in staging_products is NULL, it should not overwrite a non-NULL value in products. NULL values should only be used for updates when the corresponding value in products is also NULL.

Key Requirements:

  • Use the MERGE statement.
  • The products table has the following columns: product_id (INT, PRIMARY KEY), product_name (VARCHAR), price (DECIMAL), description (TEXT).
  • The staging_products table has the same columns as products.
  • product_id is the key used to match records between the two tables.

Expected Behavior:

The script should efficiently update and insert product records, ensuring data consistency between the staging and target tables. The products table should reflect the latest data from staging_products, respecting the NULL value handling rule.

Edge Cases to Consider:

  • Empty staging_products table: The script should not modify the products table.
  • staging_products contains duplicate product_id values: The behavior in this case is undefined, but the script should not error out. The last value encountered for a given product_id in the staging table should be used.
  • product_id is NULL in either table: The MERGE statement should handle NULL product_id values gracefully (typically, they won't match).

Examples

Example 1:

products table (before):
product_id | product_name | price | description
-----------|--------------|-------|-------------
1          | Widget A     | 10.00 | A simple widget
2          | Widget B     | 20.00 | A more complex widget

staging_products table:
product_id | product_name | price | description
-----------|--------------|-------|-------------
1          | Widget A     | 12.00 | Updated widget description
3          | Widget C     | 30.00 | A brand new widget

Output (products table after):
product_id | product_name | price | description
-----------|--------------|-------|-------------
1          | Widget A     | 12.00 | Updated widget description
2          | Widget B     | 20.00 | A more complex widget
3          | Widget C     | 30.00 | A brand new widget

Explanation: Product 1 is updated with the new price and description. Product 3 is inserted. Product 2 remains unchanged.

Example 2:

products table (before):
product_id | product_name | price | description
-----------|--------------|-------|-------------
1          | Widget A     | 10.00 | A simple widget
2          | Widget B     | 20.00 | A more complex widget

staging_products table:
product_id | product_name | price | description
-----------|--------------|-------|-------------
1          | Widget A     | NULL  | NULL
2          | Widget B     | 25.00 | NULL

Output (products table after):
product_id | product_name | price | description
-----------|--------------|-------|-------------
1          | Widget A     | 10.00 | A simple widget
2          | Widget B     | 25.00 | A more complex widget

Explanation: Product 1's price and description are not updated because the staging table has NULL values. Product 2's price is updated to 25.00, but the description remains unchanged.

Constraints

  • The solution must be compatible with standard SQL MERGE syntax. Specific database implementations (e.g., SQL Server, Oracle, PostgreSQL) are allowed, but the core logic should be transferable.
  • The script should be efficient, minimizing unnecessary operations.
  • The script should handle potential errors gracefully (e.g., invalid data types). While explicit error handling isn't required, the script shouldn't crash due to common data issues.
  • The products table already exists and is populated with initial data.
  • The staging_products table is cleared before each synchronization.

Notes

  • Consider the order of operations within the MERGE statement.
  • Pay close attention to the WHEN MATCHED and WHEN NOT MATCHED clauses.
  • The NULL value handling is a critical aspect of this challenge. Ensure your solution correctly preserves existing non-NULL values.
  • While performance optimization is desirable, focus on correctness and clarity first. Pseudocode is acceptable. Assume reasonable indexing on product_id in both tables.
  • The goal is to demonstrate understanding of the MERGE statement and its capabilities for upsert operations.
Loading editor...
plaintext