Time-Series Aggregation and Filtering
This challenge focuses on querying and analyzing data that changes over time. You will be tasked with aggregating time-series data to derive meaningful insights and filtering this data based on specific time windows. This is a fundamental skill for analyzing trends, monitoring performance, and understanding historical patterns in various applications, from financial markets to IoT device monitoring.
Problem Description
You are given a SensorReadings table that stores data points recorded by various sensors at specific timestamps. Each row in the table represents a single reading and contains the sensor ID, the timestamp of the reading, and the measured value.
Your task is to implement SQL queries to answer common time-series analysis questions. Specifically, you need to:
- Calculate the daily average value for each sensor. This involves grouping readings by sensor and by day, and then computing the average value for each group.
- Identify sensors that have had a value exceeding a certain threshold within a specific time range. This requires filtering readings by timestamp and then checking for values above the threshold.
Key Requirements:
- The output for the daily average should be ordered by sensor ID and then by date.
- The output for the threshold breach should include the sensor ID, the timestamp of the breach, and the value that exceeded the threshold.
- Assume the timestamps are stored in a standard datetime format.
Edge Cases to Consider:
- Sensors with no readings on a particular day should not appear in the daily average results for that day.
- Time ranges that do not contain any readings should produce an empty result set for threshold breaches.
Examples
Example 1: Daily Average Calculation
Input:
SensorReadings table:
| sensor_id | timestamp | value |
|---|---|---|
| 'A' | 2023-10-26 08:00:00 | 15.2 |
| 'B' | 2023-10-26 08:15:00 | 22.5 |
| 'A' | 2023-10-26 09:30:00 | 16.0 |
| 'A' | 2023-10-27 10:00:00 | 17.1 |
| 'B' | 2023-10-27 11:00:00 | 23.0 |
| 'A' | 2023-10-27 12:00:00 | 17.5 |
| 'B' | 2023-10-26 10:00:00 | 22.8 |
Output:
| sensor_id | date | avg_value |
|---|---|---|
| 'A' | 2023-10-26 | 15.6 |
| 'A' | 2023-10-27 | 17.3 |
| 'B' | 2023-10-26 | 22.65 |
| 'B' | 2023-10-27 | 23.0 |
Explanation: For sensor 'A' on 2023-10-26, the readings are 15.2 and 16.0. The average is (15.2 + 16.0) / 2 = 15.6. For sensor 'B' on 2023-10-26, the readings are 22.5 and 22.8. The average is (22.5 + 22.8) / 2 = 22.65. And so on for other combinations.
Example 2: Threshold Breach Detection
Input:
SensorReadings table (same as Example 1)
Threshold: 23.0
Start Time: 2023-10-26 00:00:00
End Time: 2023-10-26 23:59:59
Output:
| sensor_id | timestamp | value |
|---|---|---|
| 'B' | 2023-10-26 08:15:00 | 22.5 |
| 'B' | 2023-10-26 10:00:00 | 22.8 |
Corrected Output for Example 2:
| sensor_id | timestamp | value |
|---|---|---|
| 'B' | 2023-10-26 08:15:00 | 22.5 |
| 'B' | 2023-10-26 10:00:00 | 22.8 |
Let's adjust the input to make the output meaningful for the threshold breach.
Corrected Input for Example 2:
SensorReadings table:
| sensor_id | timestamp | value |
|---|---|---|
| 'A' | 2023-10-26 08:00:00 | 15.2 |
| 'B' | 2023-10-26 08:15:00 | 23.5 |
| 'A' | 2023-10-26 09:30:00 | 16.0 |
| 'A' | 2023-10-27 10:00:00 | 17.1 |
| 'B' | 2023-10-27 11:00:00 | 23.0 |
| 'A' | 2023-10-27 12:00:00 | 17.5 |
| 'B' | 2023-10-26 10:00:00 | 22.8 |
Threshold: 23.0
Start Time: 2023-10-26 00:00:00
End Time: 2023-10-26 23:59:59
Corrected Output for Example 2:
| sensor_id | timestamp | value |
|---|---|---|
| 'B' | 2023-10-26 08:15:00 | 23.5 |
Explanation:
We filter readings between '2023-10-26 00:00:00' and '2023-10-26 23:59:59'.
Then, we select rows where the value is greater than 23.0.
Only the reading for sensor 'B' at '2023-10-26 08:15:00' with a value of 23.5 meets these criteria.
Example 3: No Readings in Time Range
Input:
SensorReadings table (same as Example 1)
Threshold: 30.0
Start Time: 2023-11-01 00:00:00
End Time: 2023-11-01 23:59:59
Output:
An empty table.
Explanation: There are no sensor readings within the specified time range '2023-11-01 00:00:00' to '2023-11-01 23:59:59', so no rows are returned.
Constraints
- The
SensorReadingstable can contain up to 1,000,000 rows. - Timestamps are stored as
DATETIMEor a compatible type. - Sensor IDs are strings.
- Values are floating-point numbers.
- Queries should aim for reasonable performance on the given dataset size.
Notes
- You will need to use SQL functions for date extraction and aggregation. The specific functions might vary slightly depending on your SQL dialect (e.g.,
DATE(),EXTRACT(),STRFTIME()for dates;AVG()for averages). - For the time-based filtering, you'll likely use
BETWEENor comparison operators (>=,<=). - Consider how to handle potential
NULLvalues in thevaluecolumn if they were possible (though for this challenge, assume they are not). - For the daily average, you'll need to extract the date part from the timestamp.