Hone logo
Hone
Problems

Time-Series Aggregation and Filtering

This challenge focuses on querying and analyzing data that changes over time. You will be tasked with aggregating time-series data to derive meaningful insights and filtering this data based on specific time windows. This is a fundamental skill for analyzing trends, monitoring performance, and understanding historical patterns in various applications, from financial markets to IoT device monitoring.

Problem Description

You are given a SensorReadings table that stores data points recorded by various sensors at specific timestamps. Each row in the table represents a single reading and contains the sensor ID, the timestamp of the reading, and the measured value.

Your task is to implement SQL queries to answer common time-series analysis questions. Specifically, you need to:

  1. Calculate the daily average value for each sensor. This involves grouping readings by sensor and by day, and then computing the average value for each group.
  2. Identify sensors that have had a value exceeding a certain threshold within a specific time range. This requires filtering readings by timestamp and then checking for values above the threshold.

Key Requirements:

  • The output for the daily average should be ordered by sensor ID and then by date.
  • The output for the threshold breach should include the sensor ID, the timestamp of the breach, and the value that exceeded the threshold.
  • Assume the timestamps are stored in a standard datetime format.

Edge Cases to Consider:

  • Sensors with no readings on a particular day should not appear in the daily average results for that day.
  • Time ranges that do not contain any readings should produce an empty result set for threshold breaches.

Examples

Example 1: Daily Average Calculation

Input:

SensorReadings table:

sensor_idtimestampvalue
'A'2023-10-26 08:00:0015.2
'B'2023-10-26 08:15:0022.5
'A'2023-10-26 09:30:0016.0
'A'2023-10-27 10:00:0017.1
'B'2023-10-27 11:00:0023.0
'A'2023-10-27 12:00:0017.5
'B'2023-10-26 10:00:0022.8

Output:

sensor_iddateavg_value
'A'2023-10-2615.6
'A'2023-10-2717.3
'B'2023-10-2622.65
'B'2023-10-2723.0

Explanation: For sensor 'A' on 2023-10-26, the readings are 15.2 and 16.0. The average is (15.2 + 16.0) / 2 = 15.6. For sensor 'B' on 2023-10-26, the readings are 22.5 and 22.8. The average is (22.5 + 22.8) / 2 = 22.65. And so on for other combinations.

Example 2: Threshold Breach Detection

Input:

SensorReadings table (same as Example 1) Threshold: 23.0 Start Time: 2023-10-26 00:00:00 End Time: 2023-10-26 23:59:59

Output:

sensor_idtimestampvalue
'B'2023-10-26 08:15:0022.5
'B'2023-10-26 10:00:0022.8

Corrected Output for Example 2:

sensor_idtimestampvalue
'B'2023-10-26 08:15:0022.5
'B'2023-10-26 10:00:0022.8

Let's adjust the input to make the output meaningful for the threshold breach.

Corrected Input for Example 2:

SensorReadings table:

sensor_idtimestampvalue
'A'2023-10-26 08:00:0015.2
'B'2023-10-26 08:15:0023.5
'A'2023-10-26 09:30:0016.0
'A'2023-10-27 10:00:0017.1
'B'2023-10-27 11:00:0023.0
'A'2023-10-27 12:00:0017.5
'B'2023-10-26 10:00:0022.8

Threshold: 23.0 Start Time: 2023-10-26 00:00:00 End Time: 2023-10-26 23:59:59

Corrected Output for Example 2:

sensor_idtimestampvalue
'B'2023-10-26 08:15:0023.5

Explanation: We filter readings between '2023-10-26 00:00:00' and '2023-10-26 23:59:59'. Then, we select rows where the value is greater than 23.0. Only the reading for sensor 'B' at '2023-10-26 08:15:00' with a value of 23.5 meets these criteria.

Example 3: No Readings in Time Range

Input:

SensorReadings table (same as Example 1) Threshold: 30.0 Start Time: 2023-11-01 00:00:00 End Time: 2023-11-01 23:59:59

Output:

An empty table.

Explanation: There are no sensor readings within the specified time range '2023-11-01 00:00:00' to '2023-11-01 23:59:59', so no rows are returned.

Constraints

  • The SensorReadings table can contain up to 1,000,000 rows.
  • Timestamps are stored as DATETIME or a compatible type.
  • Sensor IDs are strings.
  • Values are floating-point numbers.
  • Queries should aim for reasonable performance on the given dataset size.

Notes

  • You will need to use SQL functions for date extraction and aggregation. The specific functions might vary slightly depending on your SQL dialect (e.g., DATE(), EXTRACT(), STRFTIME() for dates; AVG() for averages).
  • For the time-based filtering, you'll likely use BETWEEN or comparison operators (>=, <=).
  • Consider how to handle potential NULL values in the value column if they were possible (though for this challenge, assume they are not).
  • For the daily average, you'll need to extract the date part from the timestamp.
Loading editor...
plaintext