First and Last Stand: Tracking Product Lifecycles
Imagine you're working with a dataset of product sales over time. To understand product performance, you need to know the very first and last recorded sale for each product. This information can be crucial for determining a product's initial adoption and its eventual market presence or discontinuation. This challenge will test your ability to use SQL window functions to extract this valuable lifecycle data.
Problem Description
Your task is to write an SQL query that, for each unique product, returns:
- The
product_id. - The
sale_dateof the first sale recorded for that product. - The
sale_dateof the last sale recorded for that product.
You should leverage the FIRST_VALUE and LAST_VALUE window functions to achieve this. The results should be ordered by product_id for consistent output.
Key Requirements:
- Identify the earliest and latest sale date per
product_id. - Use
FIRST_VALUEandLAST_VALUEwindow functions. - The query should return a result set with three columns:
product_id,first_sale_date, andlast_sale_date.
Expected Behavior:
For each distinct product_id in the input table, the query should produce a single row containing that product_id along with its absolute first and last sale dates.
Edge Cases:
- Products with only a single sale should have the same date for both
first_sale_dateandlast_sale_date. - The input data may not be pre-sorted by date.
Examples
Example 1:
Input Table: Sales
| sale_id | product_id | sale_date |
|---|---|---|
| 1 | 101 | 2023-01-15 |
| 2 | 102 | 2023-02-20 |
| 3 | 101 | 2023-01-20 |
| 4 | 103 | 2023-03-01 |
| 5 | 101 | 2023-01-10 |
| 6 | 102 | 2023-02-25 |
Output:
| product_id | first_sale_date | last_sale_date |
|---|---|---|
| 101 | 2023-01-10 | 2023-01-20 |
| 102 | 2023-02-20 | 2023-02-25 |
| 103 | 2023-03-01 | 2023-03-01 |
Explanation:
For product_id 101, the earliest sale date is 2023-01-10 and the latest is 2023-01-20.
For product_id 102, the earliest sale date is 2023-02-20 and the latest is 2023-02-25.
For product_id 103, there's only one sale on 2023-03-01, so both first and last sale dates are the same.
Example 2:
Input Table: Sales
| sale_id | product_id | sale_date |
|---|---|---|
| 1 | 201 | 2022-11-01 |
| 2 | 202 | 2023-01-05 |
| 3 | 201 | 2022-10-15 |
Output:
| product_id | first_sale_date | last_sale_date |
|---|---|---|
| 201 | 2022-10-15 | 2022-11-01 |
| 202 | 2023-01-05 | 2023-01-05 |
Explanation:
For product_id 201, the earliest sale date is 2022-10-15 and the latest is 2022-11-01.
For product_id 202, there's only one sale on 2023-01-05.
Constraints
- The
Salestable will contain at least one row. product_idwill be an integer.sale_datewill be a date type.- The
Salestable will have at least oneproduct_idwith multiple entries. - The query should execute efficiently and not time out on reasonably sized datasets (e.g., up to 1 million rows).
Notes
- Consider the
PARTITION BYclause within your window function definitions to correctly group sales by product. - The
ORDER BYclause within the window function'sOVER()clause is critical for determining what constitutes the "first" and "last" value. - You will need to perform a
GROUP BYonproduct_idin your final selection, or use a subquery/CTE to aggregate the results after applying the window functions.