Python Application Monitoring System
This challenge involves building a simplified monitoring system for a hypothetical Python application. You will create a mechanism to track key performance indicators (KPIs) of different application components and provide insights into their operational status. This is crucial for understanding application health, identifying bottlenecks, and ensuring a smooth user experience.
Problem Description
Your task is to develop a Python class, Monitor, that can record, aggregate, and report on various metrics from different parts of an application. The Monitor should be able to do the following (a minimal interface sketch appears after this list):
- Record Events: Log the occurrence of specific events with associated timestamps and optional metadata.
- Track Metrics: Maintain counts and durations for different types of operations or statuses.
- Aggregate Data: Provide methods to retrieve summarized information about recorded events and tracked metrics.
- Report Status: Generate a summary report indicating the health or status of monitored components based on predefined rules or thresholds.
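Taken together, these capabilities suggest an interface along the lines of the sketch below. This is only one possible shape, not a prescribed API: the method names match those used in the examples later in this document, while the min/max accessor names and parameter names are assumptions.

```python
class Monitor:
    """Central hub for recording, aggregating, and reporting application metrics."""

    def log_event(self, event_type, metadata=None):
        """Record an event with the current timestamp and an optional metadata dict."""

    def increment_metric(self, name):
        """Increase the counter for `name` by one."""

    def record_duration(self, name, seconds):
        """Append a duration sample (in seconds) to the series for `name`."""

    def get_metric_count(self, name):
        """Return the current counter value for `name`."""

    def get_average_duration(self, name):
        """Return the mean recorded duration for `name`, or None if no samples exist."""

    def get_min_duration(self, name):
        """Return the smallest recorded duration for `name` (name is an assumption)."""

    def get_max_duration(self, name):
        """Return the largest recorded duration for `name` (name is an assumption)."""

    def get_events_in_range(self, start, end=None):
        """Return events with start <= timestamp (and timestamp <= end, if given)."""

    def generate_report(self):
        """Return a dict summarizing events, error counts, latency, and status."""
```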
Key Requirements:
- Event Logging: Implement a method to log events. Each event should have a timestamp, an event_type (e.g., "request_processed", "error_occurred", "database_query"), and an optional metadata dictionary.
- Metric Tracking: Implement methods to (see the sketch after this requirements list):
  - Increment a counter for a given metric (e.g., increment_metric("successful_requests")).
  - Record the duration of an operation for a given metric (e.g., record_duration("api_latency", 0.15)).
- Data Aggregation:
  - Retrieve all logged events within a specified time range.
  - Get the current count for a given metric.
  - Get the average, minimum, and maximum duration for a given metric.
- Status Reporting: Implement a method to generate a status report. This report should include:
  - Total number of events logged.
  - Counts for specific critical event types (e.g., "error_occurred").
  - Average API latency.
  - A simple "status" (e.g., "OK", "WARNING", "ERROR") based on thresholds you define (e.g., if the error count exceeds a certain number, the status is "ERROR").
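As a hedged illustration of the metric-tracking and aggregation requirements above, the sketch below keeps counters in a collections.defaultdict and duration samples in per-metric lists. The attribute names (_counters, _durations) and the duration_stats helper are implementation choices, not part of the specification.

```python
from collections import defaultdict


class Monitor:
    def __init__(self):
        self._counters = defaultdict(int)    # metric name -> count
        self._durations = defaultdict(list)  # metric name -> [seconds, ...]

    def increment_metric(self, name):
        self._counters[name] += 1

    def record_duration(self, name, seconds):
        self._durations[name].append(float(seconds))

    def get_metric_count(self, name):
        # defaultdict would create an entry on access, so use .get for reads
        return self._counters.get(name, 0)

    def duration_stats(self, name):
        """Return (average, minimum, maximum) in seconds, or None if no samples."""
        samples = self._durations.get(name)
        if not samples:
            return None
        return (sum(samples) / len(samples), min(samples), max(samples))
```

Computing the statistics lazily at read time keeps writes cheap; if duration volume is very high, running sums or a streaming summary would avoid retaining every sample.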
Expected Behavior:
The Monitor class should behave as a central hub for all monitoring data. When methods are called, data should be stored efficiently. When reporting methods are called, the aggregated and processed data should be returned in a clear and usable format.
Edge Cases:
- Handling requests for metrics that have not yet been recorded.
- Handling requests for event data when no events have been logged.
- Ensuring timestamp accuracy.
- Handling durations that are zero or negative (though typically durations should be positive).
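One hedged way to make these edge cases explicit in the Monitor methods; the specific return values and the ValueError are choices the challenge leaves to you:

```python
from datetime import datetime


def get_metric_count(self, name):
    # Metrics that were never incremented read as zero rather than raising KeyError.
    return self._counters.get(name, 0)


def get_average_duration(self, name):
    # No recorded samples -> None, so callers can render "N/A" instead of
    # hitting a ZeroDivisionError.
    samples = self._durations.get(name)
    if not samples:
        return None
    return sum(samples) / len(samples)


def record_duration(self, name, seconds):
    # Reject negative durations early; zero is permitted but often worth flagging.
    if seconds < 0:
        raise ValueError(f"duration must be non-negative, got {seconds!r}")
    self._durations[name].append(float(seconds))


def log_event(self, event_type, metadata=None):
    # Capture the timestamp once, immediately, so the stored time reflects when
    # the event happened rather than when storage completed.
    timestamp = datetime.now()
    self._events.append({"timestamp": timestamp, "event_type": event_type,
                         "metadata": metadata or {}})
```

With these conventions, queries against an empty monitor degrade gracefully: get_events_in_range simply returns an empty list and counters read as zero.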
Examples
Example 1:
```python
from datetime import datetime, timedelta

# Initialize the monitor
monitor = Monitor()

# Log some events
monitor.log_event("request_received", metadata={"user_id": "user123"})
monitor.log_event("database_query", metadata={"query_time": 0.05})
monitor.log_event("request_processed", metadata={"status_code": 200})
monitor.log_event("error_occurred", metadata={"error_type": "FileNotFound", "message": "config.json not found"})

# Track some metrics
monitor.increment_metric("total_requests")
monitor.increment_metric("total_requests")
monitor.record_duration("api_latency", 0.15)
monitor.record_duration("api_latency", 0.20)

# Get aggregated data
print(f"Total requests: {monitor.get_metric_count('total_requests')}")
print(f"Average API latency: {monitor.get_average_duration('api_latency'):.2f}s")
print(f"Events in the last minute: {len(monitor.get_events_in_range(datetime.now() - timedelta(minutes=1)))}")

# Generate a status report
print("\n--- Status Report ---")
report = monitor.generate_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Expected output:

```
Total requests: 2
Average API latency: 0.18s
Events in the last minute: 4

--- Status Report ---
total_events: 4
error_occurred: 1
api_latency_avg_s: 0.18
status: WARNING
```
Explanation: The monitor records four events. It tracks that "total_requests" was incremented twice and records two API latency durations, from which the average latency is computed. The status report shows the total event count, the count of the critical "error_occurred" event type, the average API latency, and a "WARNING" status because at least one error occurred.
Example 2:
```python
from datetime import datetime, timedelta

monitor = Monitor()

# Simulate a period of errors
monitor.log_event("error_occurred", metadata={"error_type": "DatabaseConnectionError"})
monitor.log_event("error_occurred", metadata={"error_type": "DatabaseConnectionError"})
monitor.log_event("error_occurred", metadata={"error_type": "DatabaseConnectionError"})
monitor.log_event("error_occurred", metadata={"error_type": "DatabaseConnectionError"})
monitor.log_event("error_occurred", metadata={"error_type": "DatabaseConnectionError"})

# Add some normal activity
monitor.log_event("request_processed")

print("\n--- Status Report (High Errors) ---")
report = monitor.generate_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Expected output:

```
--- Status Report (High Errors) ---
total_events: 6
error_occurred: 5
api_latency_avg_s: N/A
status: ERROR
```
Explanation: In this scenario, five "error_occurred" events are logged. The status report reflects this with a high count for "error_occurred" and consequently sets the overall status to "ERROR". "api_latency_avg_s" is shown as "N/A" because no API latency durations were recorded.
Constraints
- The Monitor class should be implemented in a single Python file.
- Timestamps should be stored using Python's datetime objects.
- Metric durations should be stored as floating-point numbers representing seconds.
- The Monitor class should be thread-safe if multiple threads might access it concurrently (consider using locks).
- The generate_report method should use predefined thresholds for determining the "status" field. To match the examples above:
  - If the error_occurred count >= 5, the status is "ERROR".
  - If the error_occurred count >= 1, the status is "WARNING".
  - Otherwise, the status is "OK".
- If no api_latency data exists, report "N/A" for the average latency.
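A hedged sketch of generate_report under these thresholds follows; the event representation, the RLock, and the rounding to two decimals are assumptions rather than requirements.

```python
import threading


class Monitor:
    def __init__(self):
        self._events = []     # each event: {"timestamp", "event_type", "metadata"}
        self._durations = {}  # metric name -> [seconds, ...]
        self._lock = threading.RLock()  # RLock: generate_report calls locked helpers

    def get_average_duration(self, name):
        with self._lock:
            samples = self._durations.get(name)
            return sum(samples) / len(samples) if samples else None

    def generate_report(self):
        with self._lock:
            errors = sum(1 for e in self._events
                         if e["event_type"] == "error_occurred")
            avg = self.get_average_duration("api_latency")
            if errors >= 5:
                status = "ERROR"
            elif errors >= 1:
                status = "WARNING"
            else:
                status = "OK"
            return {
                "total_events": len(self._events),
                "error_occurred": errors,
                "api_latency_avg_s": round(avg, 2) if avg is not None else "N/A",
                "status": status,
            }
```

Using a reentrant lock lets the report call other locked accessors from within its own critical section; with a plain threading.Lock, that nested acquire would deadlock.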
Notes
- Consider using collections.defaultdict or collections.Counter for efficient metric tracking.
- For event storage, a simple list or a more optimized data structure such as a deque might be suitable, depending on how you plan to retrieve events by time range.
- The generate_report method's status logic can be extended or made configurable. For this challenge, hardcoding the thresholds is acceptable.
- Think about how to handle the case where no data exists for a particular metric (e.g., the average duration when no durations have been recorded).
- The metadata for events can be any dictionary. You don't need to pre-define its structure.
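Since log_event records events as they happen, the event log is already sorted by timestamp; one sketch of time-range retrieval exploits that with a binary search over a parallel timestamp list. The _timestamps attribute is an assumption, not part of the spec.

```python
from bisect import bisect_left, bisect_right

# Inside Monitor.log_event, alongside appending to self._events:
#     self._timestamps.append(timestamp)  # stays sorted: events arrive in time order


def get_events_in_range(self, start, end=None):
    """Return events with start <= timestamp (and timestamp <= end, if given)."""
    lo = bisect_left(self._timestamps, start)
    hi = bisect_right(self._timestamps, end) if end is not None else len(self._events)
    return self._events[lo:hi]
```

For modest event volumes a straightforward list scan with one comparison per event is perfectly adequate; the bisect variant only pays off once the log grows large.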