Hone logo
Hone
Problems

Analyzing Sequential Data with LAG and LEAD

This challenge focuses on utilizing the LAG and LEAD window functions in SQL. These functions are invaluable for comparing a row with a preceding or succeeding row within a defined partition and order. You'll be tasked with analyzing time-series data to identify trends and patterns by looking at values from adjacent records.

Problem Description

You are provided with a table containing daily sales data for various products. Your goal is to determine the daily sales change for each product compared to its previous day's sales and the projected sales for the next day based on its current day's sales.

What needs to be achieved:

  1. Calculate the difference in sales between the current day and the previous day for each product.
  2. Calculate the difference in sales between the next day and the current day for each product.

Key Requirements:

  • The results should be grouped by product_id.
  • The ordering within each product's group should be by sale_date.
  • For the first day of sales for a product, the "previous day's sales" should be considered 0.
  • For the last day of sales for a product, the "next day's sales" should be considered 0.

Expected Behavior:

For each row in the input table, you should output the original row's data, plus two new columns:

  • sales_vs_previous_day: The current day's sales minus the previous day's sales for the same product.
  • sales_vs_next_day: The next day's sales minus the current day's sales for the same product.

Edge Cases to Consider:

  • Products with only one day of sales data.
  • Gaps in sale dates for a product (though for this challenge, assume contiguous daily sales data for simplicity unless otherwise specified by constraints).

Examples

Example 1:

Input Table: daily_sales

sale_dateproduct_idsales
2023-01-01101100
2023-01-02101120
2023-01-03101110
2023-01-0110250
2023-01-0210260

Output:

sale_dateproduct_idsalessales_vs_previous_daysales_vs_next_day
2023-01-0110110010020
2023-01-0210112020-10
2023-01-03101110-10-110
2023-01-01102505010
2023-01-021026010-60

Explanation:

  • For product_id 101 on 2023-01-01: sales_vs_previous_day is 100 - 0 = 100. sales_vs_next_day is 120 - 100 = 20.
  • For product_id 101 on 2023-01-02: sales_vs_previous_day is 120 - 100 = 20. sales_vs_next_day is 110 - 120 = -10.
  • For product_id 101 on 2023-01-03: sales_vs_previous_day is 110 - 120 = -10. sales_vs_next_day is 0 - 110 = -110 (assuming no sales on 2023-01-04).
  • Similar calculations apply for product_id 102.

Example 2:

Input Table: daily_sales

sale_dateproduct_idsales
2023-02-1020575

Output:

sale_dateproduct_idsalessales_vs_previous_daysales_vs_next_day
2023-02-102057575-75

Explanation:

  • This product only has one day of sales. The previous day's sales are treated as 0, resulting in 75 - 0 = 75. The next day's sales are treated as 0, resulting in 0 - 75 = -75.

Constraints

  • The daily_sales table will contain at least one row.
  • sale_date will be a valid date format.
  • product_id will be an integer.
  • sales will be a non-negative integer.
  • The total number of rows in daily_sales will not exceed 1,000,000.
  • Your query should be efficient enough to run within a reasonable time on the given dataset size.

Notes

  • The LAG and LEAD functions require an OVER clause.
  • The OVER clause must include a PARTITION BY to segment data by product_id and an ORDER BY to define the sequence within each partition (by sale_date).
  • Pay close attention to the optional default argument in LAG and LEAD to handle the first and last rows of each partition as specified in the problem description.
Loading editor...
plaintext