Hone logo
Hone
Problems

Analyzing Time Series Data with LAG and LEAD

Time series data, like stock prices, sensor readings, or website traffic, often requires analyzing trends and relationships between consecutive data points. SQL's LAG and LEAD functions provide powerful tools for accessing previous and subsequent rows within a result set, enabling calculations and comparisons that would otherwise be difficult or require complex self-joins. This challenge will test your ability to use these functions to identify patterns and derive insights from time-stamped data.

Problem Description

You are given a table named sensor_readings containing sensor data. The table has the following columns:

  • timestamp: A timestamp representing when the reading was taken (DATETIME).
  • sensor_id: An integer identifying the sensor (INT).
  • value: A floating-point number representing the sensor reading (FLOAT).

Your task is to write a SQL query that calculates the difference between each sensor reading and the previous reading for a specific sensor. Additionally, calculate the difference between each sensor reading and the next reading for the same sensor. The query should return a table with the following columns:

  • timestamp: The timestamp of the reading.
  • sensor_id: The sensor ID.
  • value: The sensor reading value.
  • previous_difference: The difference between the current reading and the previous reading. Should be NULL for the first reading of each sensor.
  • next_difference: The difference between the current reading and the next reading. Should be NULL for the last reading of each sensor.

Key Requirements:

  • The query must use the LAG and LEAD functions.
  • The query must correctly handle the first and last readings for each sensor, where previous_difference or next_difference will be NULL.
  • The query should be efficient and performant.

Expected Behavior:

The query should return a result set where each row represents a sensor reading, along with the calculated differences. The previous_difference and next_difference columns should accurately reflect the differences between consecutive readings for the specified sensor.

Edge Cases to Consider:

  • Empty sensor_readings table. The query should return an empty result set.
  • A single reading for a sensor. Both previous_difference and next_difference should be NULL.
  • Multiple readings for the same sensor with identical timestamps. The behavior in this case is not explicitly defined, but the query should not error.

Examples

Example 1:

Input:
sensor_readings table:
timestamp             | sensor_id | value
----------------------|-----------|-------
2023-10-26 10:00:00 | 1         | 10.0
2023-10-26 10:01:00 | 1         | 11.0
2023-10-26 10:02:00 | 1         | 12.0
2023-10-26 10:00:00 | 2         | 5.0
2023-10-26 10:01:00 | 2         | 6.0

Output:
timestamp             | sensor_id | value | previous_difference | next_difference
----------------------|-----------|-------|---------------------|-----------------
2023-10-26 10:00:00 | 1         | 10.0  | NULL                | 1.0
2023-10-26 10:01:00 | 1         | 11.0  | 1.0                 | 1.0
2023-10-26 10:02:00 | 1         | 12.0  | 1.0                 | NULL
2023-10-26 10:00:00 | 2         | 5.0   | NULL                | 1.0
2023-10-26 10:01:00 | 2         | 6.0   | 1.0                 | NULL

Explanation: For sensor 1, the difference between 11.0 and 10.0 is 1.0, and the difference between 12.0 and 11.0 is 1.0. The first reading has no previous difference, and the last reading has no next difference. The same logic applies to sensor 2.

Constraints

  • The sensor_readings table will contain at least 0 rows.
  • timestamp will always be a valid DATETIME value.
  • sensor_id will always be a positive integer.
  • value will always be a floating-point number.
  • The query should execute within 5 seconds on a table with up to 10,000 rows.

Notes

  • Consider using the PARTITION BY clause in the LAG and LEAD functions to ensure that the differences are calculated correctly for each sensor independently.
  • The ORDER BY clause within the LAG and LEAD functions is crucial for defining the order of rows when calculating the differences. Use the timestamp column for ordering.
  • Think about how to handle NULL values appropriately when calculating the differences. The problem statement specifies that the differences should be NULL for the first and last readings.
  • While not strictly required, consider using aliases to make the query more readable.
  • The specific SQL dialect (e.g., MySQL, PostgreSQL, SQL Server) is not specified, so the query should be as standard SQL as possible. Pseudocode:
// Assume a table named sensor_readings with columns timestamp, sensor_id, value

SELECT
    timestamp,
    sensor_id,
    value,
    LAG(value, 1, NULL) OVER (PARTITION BY sensor_id ORDER BY timestamp) AS previous_value,
    LEAD(value, 1, NULL) OVER (PARTITION BY sensor_id ORDER BY timestamp) AS next_value
FROM
    sensor_readings
ORDER BY
    sensor_id, timestamp;

// Calculate the differences
SELECT
    timestamp,
    sensor_id,
    value,
    value - previous_value AS previous_difference,
    next_value - value AS next_difference
FROM (
    SELECT
        timestamp,
        sensor_id,
        value,
        LAG(value, 1, NULL) OVER (PARTITION BY sensor_id ORDER BY timestamp) AS previous_value,
        LEAD(value, 1, NULL) OVER (PARTITION BY sensor_id ORDER BY timestamp) AS next_value
    FROM
        sensor_readings
) AS subquery
ORDER BY
    sensor_id, timestamp;
Loading editor...
plaintext