Analyzing Sequential Data with LAG and LEAD
This challenge focuses on utilizing the LAG and LEAD window functions in SQL. These functions are invaluable for comparing a row with a preceding or succeeding row within a defined partition and order. You'll be tasked with analyzing time-series data to identify trends and patterns by looking at values from adjacent records.
Problem Description
You are provided with a table containing daily sales data for various products. Your goal is to determine the daily sales change for each product compared to its previous day's sales and the projected sales for the next day based on its current day's sales.
What needs to be achieved:
- Calculate the difference in sales between the current day and the previous day for each product.
- Calculate the difference in sales between the next day and the current day for each product.
Key Requirements:
- The results should be grouped by
product_id. - The ordering within each product's group should be by
sale_date. - For the first day of sales for a product, the "previous day's sales" should be considered 0.
- For the last day of sales for a product, the "next day's sales" should be considered 0.
Expected Behavior:
For each row in the input table, you should output the original row's data, plus two new columns:
sales_vs_previous_day: The current day's sales minus the previous day's sales for the same product.sales_vs_next_day: The next day's sales minus the current day's sales for the same product.
Edge Cases to Consider:
- Products with only one day of sales data.
- Gaps in sale dates for a product (though for this challenge, assume contiguous daily sales data for simplicity unless otherwise specified by constraints).
Examples
Example 1:
Input Table: daily_sales
| sale_date | product_id | sales |
|---|---|---|
| 2023-01-01 | 101 | 100 |
| 2023-01-02 | 101 | 120 |
| 2023-01-03 | 101 | 110 |
| 2023-01-01 | 102 | 50 |
| 2023-01-02 | 102 | 60 |
Output:
| sale_date | product_id | sales | sales_vs_previous_day | sales_vs_next_day |
|---|---|---|---|---|
| 2023-01-01 | 101 | 100 | 100 | 20 |
| 2023-01-02 | 101 | 120 | 20 | -10 |
| 2023-01-03 | 101 | 110 | -10 | -110 |
| 2023-01-01 | 102 | 50 | 50 | 10 |
| 2023-01-02 | 102 | 60 | 10 | -60 |
Explanation:
- For
product_id101 on 2023-01-01:sales_vs_previous_dayis100 - 0 = 100.sales_vs_next_dayis120 - 100 = 20. - For
product_id101 on 2023-01-02:sales_vs_previous_dayis120 - 100 = 20.sales_vs_next_dayis110 - 120 = -10. - For
product_id101 on 2023-01-03:sales_vs_previous_dayis110 - 120 = -10.sales_vs_next_dayis0 - 110 = -110(assuming no sales on 2023-01-04). - Similar calculations apply for
product_id102.
Example 2:
Input Table: daily_sales
| sale_date | product_id | sales |
|---|---|---|
| 2023-02-10 | 205 | 75 |
Output:
| sale_date | product_id | sales | sales_vs_previous_day | sales_vs_next_day |
|---|---|---|---|---|
| 2023-02-10 | 205 | 75 | 75 | -75 |
Explanation:
- This product only has one day of sales. The previous day's sales are treated as 0, resulting in
75 - 0 = 75. The next day's sales are treated as 0, resulting in0 - 75 = -75.
Constraints
- The
daily_salestable will contain at least one row. sale_datewill be a valid date format.product_idwill be an integer.saleswill be a non-negative integer.- The total number of rows in
daily_saleswill not exceed 1,000,000. - Your query should be efficient enough to run within a reasonable time on the given dataset size.
Notes
- The
LAGandLEADfunctions require anOVERclause. - The
OVERclause must include aPARTITION BYto segment data byproduct_idand anORDER BYto define the sequence within each partition (bysale_date). - Pay close attention to the optional
defaultargument inLAGandLEADto handle the first and last rows of each partition as specified in the problem description.