Python Log Aggregator

This challenge focuses on building a robust log aggregation system in Python. In real-world applications, multiple services often generate logs independently. A centralized log aggregation system is crucial for monitoring, debugging, and analyzing application behavior by bringing all these logs together in a structured and searchable format.

Problem Description

Your task is to implement a Python-based log aggregator that can receive log entries from various sources, process them, and store them in a structured way. The aggregator should be able to handle different log levels and timestamps, and be extensible to support various output formats.

Key Requirements:

  1. Log Reception: The aggregator should have a mechanism to receive log entries. For this challenge, you can simulate receiving logs via a function call or by reading from a predefined input source.
  2. Log Parsing: Each log entry should be parsed to extract key information such as timestamp, log level, source (e.g., service name), and the actual log message. Assume log entries are strings in a specific format.
  3. Structured Storage: The parsed log entries should be stored in a structured format (e.g., a list of dictionaries or custom objects).
  4. Filtering and Searching: The aggregator should support basic filtering capabilities, such as filtering logs by log level or by keywords in the message.
  5. Outputting Logs: The aggregator should be able to output the aggregated logs in a readable format (e.g., a formatted string or a list of structured log entries).
  6. Extensibility: Design the system such that adding new output formats or parsing methods for different log formats is straightforward. (One possible way these pieces might fit together is sketched below.)
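
One way to meet these requirements is to keep parsing, storage, filtering, and output as separate pieces behind a small class. The sketch below is illustrative only and assumes a regex for the bracketed timestamp format shown in the examples; the names LogAggregator, add_log, filter_logs, and format_logs are choices made for this sketch, not part of the problem statement.

import re

# Hypothetical sketch: the class and method names below are illustrative choices,
# not requirements of the problem.
LOG_PATTERN = re.compile(
    r"^\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<level>[A-Z]+): (?P<message>.*)$"
)

def parse_log_line(line):
    """Parse one raw log string into a dict, or return None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["source"] = None  # a more advanced parser might infer the source
    return entry

class LogAggregator:
    """Receives raw log strings, stores parsed entries, and supports filtering."""

    def __init__(self, parser=parse_log_line):
        self._parser = parser    # pluggable parser keeps parsing separate and extensible
        self._entries = []       # structured storage: a list of dicts

    def add_log(self, raw_line):
        entry = self._parser(raw_line)
        if entry is None:
            return False         # malformed line: skip it (see Edge Cases below)
        self._entries.append(entry)
        return True

    def filter_logs(self, level=None, keyword=None):
        results = self._entries
        if level is not None:
            results = [e for e in results if e["level"] == level]
        if keyword is not None:
            results = [e for e in results if keyword in e["message"]]
        return results

    def format_logs(self, entries=None):
        """Render entries back into a readable, line-per-entry string."""
        entries = self._entries if entries is None else entries
        return "\n".join(
            f"[{e['timestamp']}] {e['level']}: {e['message']}" for e in entries
        )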

Expected Behavior:

The aggregator should take raw log strings as input, parse them into structured data, and store them. When requested, it should be able to retrieve and present these logs, potentially filtered according to specified criteria.
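
As a concrete sketch of that flow, and assuming the hypothetical LogAggregator class outlined under Key Requirements:

# Assumes the illustrative LogAggregator sketch above; the API is not prescribed.
aggregator = LogAggregator()

raw_lines = [
    "[2023-10-27 10:00:00] INFO: User 'alice' logged in from IP 192.168.1.10",
    "[2023-10-27 10:02:15] ERROR: Database connection failed for user 'admin'",
]
for line in raw_lines:
    aggregator.add_log(line)                       # parse and store each raw string

matches = aggregator.filter_logs(keyword="admin")  # retrieve by criteria
print(aggregator.format_logs(matches))             # present in a readable format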

Edge Cases to Consider:

  • Malformed log entries (e.g., missing fields, incorrect timestamp format).
  • Empty log messages.
  • Concurrent log generation (though for this challenge, a single-threaded approach is acceptable for simplicity).

Examples

Example 1:

Input Log Strings:

[2023-10-27 10:00:00] INFO: User 'alice' logged in from IP 192.168.1.10
[2023-10-27 10:01:30] WARNING: Disk space low on server 'web-01'
[2023-10-27 10:02:15] ERROR: Database connection failed for user 'admin'
[2023-10-27 10:03:00] INFO: User 'bob' accessed resource '/dashboard'

Expected Structured Logs (conceptual; the dictionaries shown here are one option, and custom objects would also work):

[
    {'timestamp': '2023-10-27 10:00:00', 'level': 'INFO', 'source': None, 'message': "User 'alice' logged in from IP 192.168.1.10"},
    {'timestamp': '2023-10-27 10:01:30', 'level': 'WARNING', 'source': None, 'message': "Disk space low on server 'web-01'"},
    {'timestamp': '2023-10-27 10:02:15', 'level': 'ERROR', 'source': None, 'message': "Database connection failed for user 'admin'"},
    {'timestamp': '2023-10-27 10:03:00', 'level': 'INFO', 'source': None, 'message': "User 'bob' accessed resource '/dashboard'"}
]

(Note: For simplicity in this example, 'source' is None. A more advanced parser might infer it.)
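
As a quick illustration, the hypothetical parse_log_line helper from the earlier sketch maps the first raw line to the first dictionary shown above:

# Illustrative check that parsing reproduces the structure shown in Example 1.
line = "[2023-10-27 10:00:00] INFO: User 'alice' logged in from IP 192.168.1.10"
entry = parse_log_line(line)
assert entry == {
    'timestamp': '2023-10-27 10:00:00',
    'level': 'INFO',
    'source': None,
    'message': "User 'alice' logged in from IP 192.168.1.10",
}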

Example 2:

Filtering for 'ERROR' logs from the input in Example 1.

Input Log Strings:

[2023-10-27 10:00:00] INFO: User 'alice' logged in from IP 192.168.1.10
[2023-10-27 10:01:30] WARNING: Disk space low on server 'web-01'
[2023-10-27 10:02:15] ERROR: Database connection failed for user 'admin'
[2023-10-27 10:03:00] INFO: User 'bob' accessed resource '/dashboard'

Expected Filtered Output (formatted string):

[2023-10-27 10:02:15] ERROR: Database connection failed for user 'admin'
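
Such a level filter can also be written directly over the structured entries. The snippet below is a self-contained illustration using the field names from Example 1:

# Self-contained illustration: filter by level, then format for output.
entries = [
    {'timestamp': '2023-10-27 10:00:00', 'level': 'INFO',
     'source': None, 'message': "User 'alice' logged in from IP 192.168.1.10"},
    {'timestamp': '2023-10-27 10:02:15', 'level': 'ERROR',
     'source': None, 'message': "Database connection failed for user 'admin'"},
]

error_logs = [e for e in entries if e['level'] == 'ERROR']
for e in error_logs:
    print(f"[{e['timestamp']}] {e['level']}: {e['message']}")
# prints: [2023-10-27 10:02:15] ERROR: Database connection failed for user 'admin'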

Example 3 (Malformed Log Entry):

Input Log Strings:

[2023-10-27 10:00:00] INFO: Valid log entry
This is a malformed line without a timestamp or level
[2023-10-27 10:05:00] DEBUG: Another valid entry

Expected Behavior: The aggregator should handle the malformed line gracefully, for example by skipping it or by recording a parsing error of its own, and it should still process the valid entries.
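
One possible approach, assuming the parse_log_line helper sketched earlier, is to skip lines that fail to parse while keeping track of them:

# Illustrative failure policy (skip and remember); other policies are equally valid.
lines = [
    "[2023-10-27 10:00:00] INFO: Valid log entry",
    "This is a malformed line without a timestamp or level",
    "[2023-10-27 10:05:00] DEBUG: Another valid entry",
]

valid, failures = [], []
for line in lines:
    entry = parse_log_line(line)
    if entry is None:
        failures.append(line)     # could instead record an internal parse-error log
    else:
        valid.append(entry)

print(f"{len(valid)} parsed, {len(failures)} skipped")   # 2 parsed, 1 skipped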

Constraints

  • The timestamp format is assumed to be YYYY-MM-DD HH:MM:SS.
  • Log levels will be standard strings like INFO, WARNING, ERROR, DEBUG.
  • The maximum number of log entries to process in a single run is 1000.
  • The log message itself can contain colons and other special characters.

Notes

  • Consider using Python's built-in datetime module for timestamp manipulation if needed, though for this challenge, string comparisons might suffice for basic filtering (a small example follows these notes).
  • Think about how you would separate the parsing logic from the aggregation and storage logic. This will help with extensibility.
  • A class-based approach would be suitable for encapsulating the aggregator's state and behavior.
  • For simulating different log sources, you could add a source field to your structured log data.
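
As a small illustration of the datetime note above, the string timestamps can be converted with datetime.strptime, which makes time-range filtering straightforward. The parse_timestamp helper name is an assumption for this sketch:

from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"   # matches the constraint YYYY-MM-DD HH:MM:SS

def parse_timestamp(value):
    """Convert a timestamp string to a datetime, or None if it is malformed."""
    try:
        return datetime.strptime(value, TIMESTAMP_FORMAT)
    except ValueError:
        return None

# Example: keep only entries at or after a cutoff (field names follow Example 1).
entries = [
    {'timestamp': '2023-10-27 10:00:00', 'level': 'INFO', 'source': None, 'message': '...'},
    {'timestamp': '2023-10-27 10:02:15', 'level': 'ERROR', 'source': None, 'message': '...'},
]
cutoff = datetime(2023, 10, 27, 10, 1, 0)
recent = [e for e in entries
          if (ts := parse_timestamp(e['timestamp'])) is not None and ts >= cutoff]
# recent now holds only the 10:02:15 entry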