
Python Application Monitoring System

This challenge involves building a simplified monitoring system for a hypothetical Python application. You will create a mechanism to track key performance indicators (KPIs) of different application components and provide insights into their operational status. This is crucial for understanding application health, identifying bottlenecks, and ensuring a smooth user experience.

Problem Description

Your task is to develop a Python class, Monitor, that can record, aggregate, and report on various metrics from different parts of an application. The Monitor should be able to:

  1. Record Events: Log the occurrence of specific events with associated timestamps and optional metadata.
  2. Track Metrics: Maintain counts and durations for different types of operations or statuses.
  3. Aggregate Data: Provide methods to retrieve summarized information about recorded events and tracked metrics.
  4. Report Status: Generate a summary report indicating the health or status of monitored components based on predefined rules or thresholds.
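
A minimal sketch of what the public interface might look like, using the method names that appear in the examples below (the signatures, defaults, and type hints are illustrative assumptions, not part of the specification):

from datetime import datetime
from typing import Optional

class Monitor:
    # Interface sketch only; method names match the examples below,
    # while parameter names and defaults are assumptions.

    def log_event(self, event_type: str, metadata: Optional[dict] = None) -> None: ...
    def increment_metric(self, name: str) -> None: ...
    def record_duration(self, name: str, seconds: float) -> None: ...
    def get_events_in_range(self, start: datetime, end: Optional[datetime] = None) -> list: ...
    def get_metric_count(self, name: str) -> int: ...
    def get_average_duration(self, name: str) -> Optional[float]: ...
    def generate_report(self) -> dict: ...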

Key Requirements:

  • Event Logging: Implement a method to log events. Each event should have a timestamp, an event_type (e.g., "request_processed", "error_occurred", "database_query"), and an optional metadata dictionary. (One possible storage layout is sketched after this list.)
  • Metric Tracking: Implement methods to:
    • Increment a counter for a given metric (e.g., increment_metric("successful_requests")).
    • Record the duration of an operation for a given metric (e.g., record_duration("api_latency", 0.15)).
  • Data Aggregation:
    • Retrieve all logged events within a specified time range.
    • Get the current count for a given metric.
    • Get the average, minimum, and maximum duration for a given metric.
  • Status Reporting: Implement a method to generate a status report. This report should include:
    • Total number of events logged.
    • Counts for specific critical event types (e.g., "error_occurred").
    • Average API latency.
    • A simple "status" (e.g., "OK", "WARNING", "ERROR") based on thresholds you define (e.g., if error count exceeds a certain number, status is "ERROR").
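
One possible storage layout that satisfies these requirements is sketched below; the tuple-shaped event record and the single lock (see the thread-safety constraint later) are assumptions, not mandated by the problem:

import threading
from collections import defaultdict
from datetime import datetime

class Monitor:
    def __init__(self):
        self._lock = threading.Lock()        # guards all shared state
        self._events = []                    # (timestamp, event_type, metadata), in arrival order
        self._counters = defaultdict(int)    # metric name -> count
        self._durations = defaultdict(list)  # metric name -> [seconds, ...]

    def log_event(self, event_type, metadata=None):
        # The timestamp is captured at call time; metadata defaults to {}.
        with self._lock:
            self._events.append((datetime.now(), event_type, metadata or {}))

    def increment_metric(self, name):
        with self._lock:
            self._counters[name] += 1

    def record_duration(self, name, seconds):
        with self._lock:
            self._durations[name].append(float(seconds))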

Expected Behavior:

The Monitor class should act as a central hub for all monitoring data: recording methods should store data efficiently as it arrives, and reporting methods should return the aggregated, processed data in a clear and usable format.

Edge Cases:

  • Handling requests for metrics that have not yet been recorded.
  • Handling requests for event data when no events have been logged.
  • Ensuring timestamp accuracy.
  • Handling durations that are zero or negative (though typically durations should be positive).
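
Continuing the storage sketch above, the aggregation helpers can absorb the first two edge cases by returning neutral values instead of raising; whether to return 0, return None, or raise is a design choice, so treat the conventions below as assumptions:

class Monitor:
    # ... state and recording methods as sketched above ...

    def get_metric_count(self, name):
        # An unknown counter reads as 0 rather than raising KeyError.
        with self._lock:
            return self._counters.get(name, 0)

    def get_average_duration(self, name):
        # None signals "no data recorded yet"; the caller decides how to
        # render it (the example reports print "N/A").
        with self._lock:
            samples = list(self._durations.get(name, ()))
        return sum(samples) / len(samples) if samples else None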

Examples

Example 1:

from datetime import datetime, timedelta

# Initialize the monitor
monitor = Monitor()

# Log some events
monitor.log_event("request_received", metadata={"user_id": "user123"})
monitor.log_event("database_query", metadata={"query_time": 0.05})
monitor.log_event("request_processed", metadata={"status_code": 200})
monitor.log_event("error_occurred", metadata={"error_type": "FileNotFound", "message": "config.json not found"})

# Track some metrics
monitor.increment_metric("total_requests")
monitor.increment_metric("total_requests")
monitor.record_duration("api_latency", 0.15)
monitor.record_duration("api_latency", 0.20)

# Get aggregated data
print(f"Total requests: {monitor.get_metric_count('total_requests')}")
print(f"Average API latency: {monitor.get_average_duration('api_latency'):.2f}s")
print(f"Events in the last minute: {len(monitor.get_events_in_range(datetime.now() - timedelta(minutes=1)))}")

# Generate a status report
print("\n--- Status Report ---")
report = monitor.generate_report()
for key, value in report.items():
    print(f"{key}: {value}")

Output:

Total requests: 2
Average API latency: 0.18s
Events in the last minute: 4

--- Status Report ---
total_events: 4
error_occurred: 1
api_latency_avg_s: 0.18
status: WARNING

Explanation: The monitor records four events. It tracks that "total_requests" was incremented twice and records two API latency durations. The average latency is (0.15 + 0.20) / 2 = 0.175, which prints as 0.18 with two-decimal formatting. The status report shows the total event count, the count of the critical "error_occurred" event type, the average API latency, and a "WARNING" status because at least one error occurred.

Example 2:

from datetime import datetime, timedelta

monitor = Monitor()

# Simulate a period of errors
monitor.log_event("error_occurred", metadata={"error_type": "DatabaseConnectionError"})
monitor.log_event("error_occurred", metadata={"error_type": "DatabaseConnectionError"})
monitor.log_event("error_occurred", metadata={"error_type": "DatabaseConnectionError"})
monitor.log_event("error_occurred", metadata={"error_type": "DatabaseConnectionError"})
monitor.log_event("error_occurred", metadata={"error_type": "DatabaseConnectionError"})

# Add some normal activity
monitor.log_event("request_processed")

print("\n--- Status Report (High Errors) ---")
report = monitor.generate_report()
for key, value in report.items():
    print(f"{key}: {value}")

Output:

--- Status Report (High Errors) ---
total_events: 6
error_occurred: 5
api_latency_avg_s: N/A
status: ERROR

Explanation: In this scenario, five "error_occurred" events are logged. The status report reflects the high error count and consequently sets the overall status to "ERROR". "api_latency_avg_s" is shown as "N/A" because no API latency durations were recorded.

Constraints

  • The Monitor class should be implemented in a single Python file.
  • Timestamps should be stored using Python's datetime objects.
  • Metric durations should be stored as floating-point numbers representing seconds.
  • The Monitor class should be thread-safe if multiple threads might access it concurrently (consider using locks, as in the storage sketch above).
  • The generate_report method should apply predefined thresholds when determining the "status" field. To match the examples above:
    • If the error_occurred count is 5 or more, status is "ERROR".
    • If the error_occurred count is 1 or more (but fewer than 5), status is "WARNING".
    • Otherwise, status is "OK".
    • If no api_latency data exists, display "N/A".
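
For illustration, a report method that reproduces the sample outputs might look like the following; the dictionary keys mirror the example reports, and rounding to two decimals is an assumption inferred from the printed values:

class Monitor:
    # ... state and methods as sketched above ...

    def generate_report(self):
        with self._lock:
            total = len(self._events)
            errors = sum(1 for _, etype, _ in self._events
                         if etype == "error_occurred")
        avg = self.get_average_duration("api_latency")  # None when unrecorded
        # Thresholds chosen to match the examples above.
        if errors >= 5:
            status = "ERROR"
        elif errors >= 1:
            status = "WARNING"
        else:
            status = "OK"
        return {
            "total_events": total,
            "error_occurred": errors,
            "api_latency_avg_s": round(avg, 2) if avg is not None else "N/A",
            "status": status,
        }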

Notes

  • Consider using collections.defaultdict or collections.Counter for efficient metric tracking.
  • For event storage, a simple list or a more optimized data structure like a deque might be suitable, depending on how you plan to retrieve events by time range (see the sketch after these notes).
  • The generate_report method's status logic can be extended or made configurable. For this challenge, hardcoding the thresholds is acceptable.
  • Think about how to handle the case where no data exists for a particular metric (e.g., average duration when no durations have been recorded).
  • The metadata for events can be any dictionary. You don't need to pre-define its structure.
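
On that time-range note, one sketch: because events are appended as they happen, the event list is already sorted by timestamp, so bisect can locate the window boundaries in O(log n) once the timestamps are extracted (the extraction itself is O(n); keeping a parallel timestamp list would avoid it). The default-to-now behavior for the end of the range is an assumption based on Example 1, which passes only a start time.

from bisect import bisect_left, bisect_right
from datetime import datetime

class Monitor:
    # ... state and methods as sketched above ...

    def get_events_in_range(self, start, end=None):
        # end defaults to "now" so callers can ask for everything
        # since start, as in Example 1.
        end = end or datetime.now()
        with self._lock:
            # Events arrive in timestamp order, so the list is sorted and
            # binary search on the extracted timestamps is valid.
            timestamps = [ts for ts, _, _ in self._events]
            lo = bisect_left(timestamps, start)
            hi = bisect_right(timestamps, end)
            return self._events[lo:hi]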