Analyzing Time Series Data with LAG and LEAD
Time series data, like stock prices, sensor readings, or website traffic, often requires analyzing trends and relationships between consecutive data points. SQL's LAG and LEAD functions provide powerful tools for accessing previous and subsequent rows within a result set, enabling calculations and comparisons that would otherwise be difficult or require complex self-joins. This challenge will test your ability to use these functions to identify patterns and derive insights from time-stamped data.
Problem Description
You are given a table named sensor_readings containing sensor data. The table has the following columns:
timestamp: A timestamp representing when the reading was taken (DATETIME).sensor_id: An integer identifying the sensor (INT).value: A floating-point number representing the sensor reading (FLOAT).
Your task is to write a SQL query that calculates the difference between each sensor reading and the previous reading for a specific sensor. Additionally, calculate the difference between each sensor reading and the next reading for the same sensor. The query should return a table with the following columns:
timestamp: The timestamp of the reading.sensor_id: The sensor ID.value: The sensor reading value.previous_difference: The difference between the current reading and the previous reading. Should be NULL for the first reading of each sensor.next_difference: The difference between the current reading and the next reading. Should be NULL for the last reading of each sensor.
Key Requirements:
- The query must use the
LAGandLEADfunctions. - The query must correctly handle the first and last readings for each sensor, where
previous_differenceornext_differencewill be NULL. - The query should be efficient and performant.
Expected Behavior:
The query should return a result set where each row represents a sensor reading, along with the calculated differences. The previous_difference and next_difference columns should accurately reflect the differences between consecutive readings for the specified sensor.
Edge Cases to Consider:
- Empty
sensor_readingstable. The query should return an empty result set. - A single reading for a sensor. Both
previous_differenceandnext_differenceshould be NULL. - Multiple readings for the same sensor with identical timestamps. The behavior in this case is not explicitly defined, but the query should not error.
Examples
Example 1:
Input:
sensor_readings table:
timestamp | sensor_id | value
----------------------|-----------|-------
2023-10-26 10:00:00 | 1 | 10.0
2023-10-26 10:01:00 | 1 | 11.0
2023-10-26 10:02:00 | 1 | 12.0
2023-10-26 10:00:00 | 2 | 5.0
2023-10-26 10:01:00 | 2 | 6.0
Output:
timestamp | sensor_id | value | previous_difference | next_difference
----------------------|-----------|-------|---------------------|-----------------
2023-10-26 10:00:00 | 1 | 10.0 | NULL | 1.0
2023-10-26 10:01:00 | 1 | 11.0 | 1.0 | 1.0
2023-10-26 10:02:00 | 1 | 12.0 | 1.0 | NULL
2023-10-26 10:00:00 | 2 | 5.0 | NULL | 1.0
2023-10-26 10:01:00 | 2 | 6.0 | 1.0 | NULL
Explanation: For sensor 1, the difference between 11.0 and 10.0 is 1.0, and the difference between 12.0 and 11.0 is 1.0. The first reading has no previous difference, and the last reading has no next difference. The same logic applies to sensor 2.
Constraints
- The
sensor_readingstable will contain at least 0 rows. timestampwill always be a valid DATETIME value.sensor_idwill always be a positive integer.valuewill always be a floating-point number.- The query should execute within 5 seconds on a table with up to 10,000 rows.
Notes
- Consider using the
PARTITION BYclause in theLAGandLEADfunctions to ensure that the differences are calculated correctly for each sensor independently. - The
ORDER BYclause within theLAGandLEADfunctions is crucial for defining the order of rows when calculating the differences. Use thetimestampcolumn for ordering. - Think about how to handle NULL values appropriately when calculating the differences. The problem statement specifies that the differences should be NULL for the first and last readings.
- While not strictly required, consider using aliases to make the query more readable.
- The specific SQL dialect (e.g., MySQL, PostgreSQL, SQL Server) is not specified, so the query should be as standard SQL as possible. Pseudocode:
// Assume a table named sensor_readings with columns timestamp, sensor_id, value
SELECT
timestamp,
sensor_id,
value,
LAG(value, 1, NULL) OVER (PARTITION BY sensor_id ORDER BY timestamp) AS previous_value,
LEAD(value, 1, NULL) OVER (PARTITION BY sensor_id ORDER BY timestamp) AS next_value
FROM
sensor_readings
ORDER BY
sensor_id, timestamp;
// Calculate the differences
SELECT
timestamp,
sensor_id,
value,
value - previous_value AS previous_difference,
next_value - value AS next_difference
FROM (
SELECT
timestamp,
sensor_id,
value,
LAG(value, 1, NULL) OVER (PARTITION BY sensor_id ORDER BY timestamp) AS previous_value,
LEAD(value, 1, NULL) OVER (PARTITION BY sensor_id ORDER BY timestamp) AS next_value
FROM
sensor_readings
) AS subquery
ORDER BY
sensor_id, timestamp;