
Simple Website Traffic Analyzer

This challenge focuses on implementing basic analytics for a simplified website. You'll be provided with a list of website visit logs, and your task is to write a Python script that analyzes these logs to calculate key metrics like total visits, unique visitors, and the most popular pages. This is a fundamental problem in data analysis and a stepping stone to more complex web analytics implementations.

Problem Description

You are given a list of strings, where each string represents a website visit log entry. Each log entry follows the format: "IP_Address,Page_URL". Your task is to write a Python function analyze_traffic(logs) that takes this list of logs as input and returns a dictionary containing the following analytics:

  • total_visits: The total number of visits recorded in the logs.
  • unique_visitors: The number of unique IP addresses that visited the website.
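  • popular_pages: A dictionary where keys are page URLs and values are the number of times each page was visited. Because Python dictionaries preserve insertion order (Python 3.7+), build this dictionary so that its entries appear in descending order of visit count. In the examples, pages with equal counts appear in the order they were first seen in the logs.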

The function should handle malformed log entries gracefully: skip them without raising an exception, and do not count them toward any metric.
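The problem does not define "malformed" precisely, so the sketch below picks one reasonable reading: an entry is well-formed only if it splits on a comma into exactly two non-empty fields. The helper name parse_entry and the whitespace stripping are assumptions made for illustration, not part of the problem statement.

```python
def parse_entry(entry):
    """Return (ip, page) for a well-formed entry, or None if it is malformed.

    Assumption: "well-formed" means exactly two non-empty,
    comma-separated fields. The problem statement leaves this open.
    """
    try:
        # Unpacking raises ValueError unless split() yields exactly two parts;
        # AttributeError covers a non-string entry such as None.
        ip, page = entry.split(",")
    except (ValueError, AttributeError):
        return None

    ip, page = ip.strip(), page.strip()
    if not ip or not page:
        return None
    return ip, page
```

Returning None rather than raising keeps the calling loop simple: it can skip any entry for which the helper returns None.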

Examples

Example 1:

Input: ["192.168.1.1,/home", "192.168.1.2,/about", "192.168.1.1,/home", "192.168.1.3,/contact"]
Output: {'total_visits': 4, 'unique_visitors': 3, 'popular_pages': {'/home': 2, '/about': 1, '/contact': 1}}
Explanation: There are 4 total visits from 3 unique IP addresses. The /home page was visited twice, while /about and /contact were visited once each. The `popular_pages` dictionary is sorted by visit count.

Example 2:

Input: ["10.0.0.1,/products", "10.0.0.2,/services", "10.0.0.1,/products", "10.0.0.3,/blog", "invalid_log_entry", "10.0.0.2,/services"]
Output: {'total_visits': 5, 'unique_visitors': 3, 'popular_pages': {'/products': 2, '/services': 2, '/blog': 1}}
Explanation: The "invalid_log_entry" is ignored, leaving 5 valid visits from 3 unique IP addresses. /products and /services were each visited twice, and /blog was visited once.

Example 3: (Edge Case - Empty Input)

Input: []
Output: {'total_visits': 0, 'unique_visitors': 0, 'popular_pages': {}}
Explanation: If the input list is empty, all metrics should be zero or empty.

Constraints

  • The input logs will be a list of strings.
  • Each string in logs is expected to be in the format "IP_Address,Page_URL".
  • IP addresses are assumed to be valid IPv4 addresses (though validation is not required).
  • Page URLs are strings.
  • The length of the logs list can vary from 0 to 1000.
  • The popular_pages dictionary must be sorted in descending order of visit count. Use Python's built-in sorting capabilities for this.
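One way to meet the sorting constraint is sketched below. The helper name sort_by_count is hypothetical. It relies on two documented behaviours: sorted() is stable even with reverse=True, so pages with equal counts keep their original relative order, and dict preserves insertion order in Python 3.7+.

```python
def sort_by_count(page_counts):
    """Return a new dict ordered by visit count, highest first.

    Ties keep the order they had in page_counts, because sorted()
    is stable (hypothetical helper, for illustration only).
    """
    ordered_items = sorted(
        page_counts.items(),
        key=lambda item: item[1],  # item is a (page, count) pair
        reverse=True,
    )
    return dict(ordered_items)


counts = {"/blog": 1, "/products": 2, "/services": 2}
print(sort_by_count(counts))
```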

Notes

  • Consider using a try-except block to handle potential errors when parsing log entries.
  • A set can be used efficiently to track unique visitors.
  • A dictionary is suitable for counting page visits.
  • Remember to sort the popular_pages dictionary by value (visit count) in descending order. You can use sorted() with a lambda function for this.
  • Focus on clarity and readability in your code. Good variable names and comments are encouraged.
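
Putting the notes above together, a minimal sketch of analyze_traffic might look like the following. It is one possible solution rather than a reference implementation, and it assumes that total_visits counts only well-formed entries and that an entry is well-formed when it has exactly two non-empty, comma-separated fields.

```python
def analyze_traffic(logs):
    """Compute basic traffic metrics from "IP_Address,Page_URL" log entries."""
    visitors = set()    # unique IP addresses
    page_counts = {}    # page URL -> number of visits
    total_visits = 0    # assumption: counts well-formed entries only

    for entry in logs:
        try:
            # Raises ValueError unless there are exactly two fields.
            ip, page = entry.split(",")
        except (ValueError, AttributeError):
            continue  # malformed entry: skip it

        ip, page = ip.strip(), page.strip()
        if not ip or not page:
            continue  # an empty field also counts as malformed here

        total_visits += 1
        visitors.add(ip)
        page_counts[page] = page_counts.get(page, 0) + 1

    # Stable sort: pages with equal counts stay in first-seen order.
    popular_pages = dict(
        sorted(page_counts.items(), key=lambda item: item[1], reverse=True)
    )

    return {
        "total_visits": total_visits,
        "unique_visitors": len(visitors),
        "popular_pages": popular_pages,
    }
```

The function makes a single pass over the logs and then sorts the distinct pages, which is comfortably fast for the stated limit of 1000 entries.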