User Activity Analysis: 30-Day Summary

This challenge focuses on analyzing user activity data to generate a summary report for the past 30 days. Understanding user engagement patterns is crucial for product improvement, targeted marketing, and overall business strategy. You will be provided with a dataset of user activity logs and tasked with calculating key metrics like total active users, average daily activity, and most active day.

Problem Description

You are given a dataset representing user activity logs. Each log entry contains a user_id, a timestamp (representing when the activity occurred), and an activity_type (e.g., "login", "post", "comment"). Your task is to analyze this data and generate a summary report for the past 30 days (including today). The report should include the following:

  • Total Active Users: The number of unique users who performed any activity within the last 30 days.
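  • Average Daily Activity: The average number of activity logs per active day, i.e. the total number of logs in the last 30 days divided by the number of days in that window with at least one log (0.0 if there are none).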
  • Most Active Day: The date with the highest number of activity logs within the last 30 days.

Key Requirements:

  • The timestamp should be treated as a date (ignoring time).
  • The 30-day window ends on today's date and includes it, so it covers today and the 29 days before it.
  • Handle cases where the input data is empty.
  • Handle cases where no activity occurred within the last 30 days.
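
The first two requirements can be sketched in Python as follows. This is a minimal illustration, not a reference solution: the helper names `to_date` and `in_window` are hypothetical, and the window is assumed to be today plus the 29 days before it, following the "including today" wording of the problem description.

```python
from datetime import date, datetime, timedelta

WINDOW_DAYS = 30  # window length taken from the problem statement


def to_date(value):
    """Reduce a datetime to its calendar date; plain dates pass through."""
    # datetime is a subclass of date, so it has to be checked first
    return value.date() if isinstance(value, datetime) else value


def in_window(value, today):
    """Return True if value falls within the WINDOW_DAYS days ending on today."""
    # Assumption: today counts as one of the 30 days
    start = today - timedelta(days=WINDOW_DAYS - 1)
    return start <= to_date(value) <= today
```

Passing `today` as an argument, rather than reading the system clock inside the helper, keeps the check reproducible in tests.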

Expected Behavior:

The function should take the activity log data as input and return a dictionary (or similar data structure) containing the calculated metrics. The dictionary should have the following keys: "total_active_users", "average_daily_activity", and "most_active_day".

Edge Cases to Consider:

  • Empty input data.
  • No activity within the last 30 days.
  • Data spanning multiple years (ensure only the last 30 days are considered).
  • Large datasets (consider efficiency).

Examples

Example 1:

Input: [
    {"user_id": 1, "timestamp": "2024-01-20", "activity_type": "login"},
    {"user_id": 2, "timestamp": "2024-01-21", "activity_type": "post"},
    {"user_id": 1, "timestamp": "2024-01-22", "activity_type": "comment"},
    {"user_id": 3, "timestamp": "2024-01-23", "activity_type": "login"},
    {"user_id": 2, "timestamp": "2024-01-24", "activity_type": "post"}
]
(Assuming today's date is 2024-01-25)
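Output: {
    "total_active_users": 3,
    "average_daily_activity": 1.0,
    "most_active_day": "2024-01-24"
}
Explanation: 3 unique users were active. The average daily activity is 1.0 (5 activities / 5 days with activity). Each of the five days has exactly one log, so the tie is broken in favor of the most recent date, 2024-01-24. Today (2024-01-25) has no logs and therefore cannot be the most active day.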

Example 2:

Input: []
(Assuming today's date is 2024-01-25)
Output: {
    "total_active_users": 0,
    "average_daily_activity": 0.0,
    "most_active_day": "2024-01-25"
}
Explanation: The input is empty, so no users were active and the average daily activity is 0.0. With no activity to rank, most_active_day falls back to today's date.

Example 3: (Edge Case - No activity in the last 30 days)

Input: [
    {"user_id": 1, "timestamp": "2023-12-20", "activity_type": "login"},
    {"user_id": 2, "timestamp": "2023-12-21", "activity_type": "post"}
]
(Assuming today's date is 2024-01-25)
Output: {
    "total_active_users": 0,
    "average_daily_activity": 0.0,
    "most_active_day": "2024-01-25"
}
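Explanation: Both entries are more than 30 days old, so no activity falls within the window. The metrics are the same as for empty input, and most_active_day falls back to today's date.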

Constraints

  • Input Data Size: The input list can contain up to 10,000 activity log entries.
  • Timestamp Format: Timestamps are provided in "YYYY-MM-DD" format.
  • Performance: The solution should complete within 1 second for the given input size.
  • Date Range: The 30-day window is calculated relative to the current date.
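
Given the "YYYY-MM-DD" timestamp format above, parsing could look like the sketch below. The helper name `parse_log_date` is hypothetical, and returning `None` for unparseable values is an assumption, since the problem does not say how malformed entries should be treated.

```python
from datetime import date


def parse_log_date(timestamp):
    """Return a date for a 'YYYY-MM-DD' string, or None if it cannot be parsed."""
    if not isinstance(timestamp, str):
        return None
    try:
        # Only the first 10 characters are used, so a trailing time part is ignored
        return date.fromisoformat(timestamp[:10])
    except ValueError:
        return None
```

A caller can then skip entries for which the helper returns `None` instead of aborting the whole report.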

Notes

  • Consider using appropriate data structures (e.g., sets for unique users, dictionaries for daily activity counts) to optimize performance.
  • You may need to parse the timestamp strings into date objects for easier manipulation.
  • The "most_active_day" should be the date with the highest activity count within the 30-day window. If multiple days have the same highest activity count, return the most recent date.
  • Assume no timestamp in the input data is later than today's date.
  • Focus on clarity and readability in your pseudocode.
  • Pseudocode should clearly outline the steps involved in calculating each metric.
  • Error handling is not explicitly required, but consider how your pseudocode would handle unexpected input formats.
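
The tie-breaking rule for "most_active_day" can be illustrated with a short Python sketch. It assumes the daily counts are already collected in a `collections.Counter` keyed by date. The function name `pick_most_active_day` and its `default` parameter are hypothetical.

```python
from collections import Counter
from datetime import date


def pick_most_active_day(daily_counts, default):
    """Return the date with the highest count; ties go to the latest date."""
    if not daily_counts:
        return default  # nothing to rank, so fall back to the caller's default
    # Tuples compare element by element: count first, then date
    return max(daily_counts, key=lambda d: (daily_counts[d], d))


# Two days tie at two logs each; the later one (2024-01-23) is chosen
counts = Counter([
    date(2024, 1, 22), date(2024, 1, 23),
    date(2024, 1, 22), date(2024, 1, 23),
    date(2024, 1, 21),
])
busiest = pick_most_active_day(counts, default=date(2024, 1, 25))
```

Sorting on the `(count, date)` pair expresses both rules in one comparison, at the cost of a second pass over the distinct dates (at most 30 here).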

Pseudocode:

FUNCTION analyze_user_activity(activity_logs):
  // Get today's date
  today = CURRENT_DATE

  // Initialize data structures
  active_users = SET()
  daily_activity = DICTIONARY()
  most_active_day = today
  max_activity = 0

  // Iterate through the activity logs
  FOR EACH log IN activity_logs:
    // Convert timestamp to date
    log_date = DATE(log["timestamp"])

    // Check if the log is within the last 30 days (today and the 29 days before it)
    IF log_date >= today - 29 DAYS AND log_date <= today:
      // Add user to the set of active users
      active_users.add(log["user_id"])

      // Increment daily activity count
      IF log_date IN daily_activity:
        daily_activity[log_date] = daily_activity[log_date] + 1
      ELSE:
        daily_activity[log_date] = 1

      // Update most active day if necessary
      IF daily_activity[log_date] > max_activity:
        max_activity = daily_activity[log_date]
        most_active_day = log_date
      ELSE IF daily_activity[log_date] == max_activity AND log_date > most_active_day:
        most_active_day = log_date

  // Calculate average daily activity
  num_days = 0
  total_activity = 0
  FOR date IN daily_activity:
    num_days = num_days + 1
    total_activity = total_activity + daily_activity[date]

  IF num_days > 0:
    average_daily_activity = total_activity / num_days
  ELSE:
    average_daily_activity = 0.0

  // Prepare the result
  result = {
    "total_active_users": SIZE(active_users),
    "average_daily_activity": average_daily_activity,
    "most_active_day": most_active_day
  }

  RETURN result
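
For readers who want to run the logic, the pseudocode could be translated to Python roughly as follows. This is one possible sketch, not the official solution. It makes three assumptions: the optional `today` parameter is added so the function can be tested with a fixed date, the window is today plus the 29 days before it, and the average is taken over days with at least one log, as the pseudocode does.

```python
from collections import Counter
from datetime import date, timedelta


def analyze_user_activity(activity_logs, today=None):
    """Summarize activity logs for the 30-day window ending on today."""
    if today is None:
        today = date.today()
    window_start = today - timedelta(days=29)  # assumption: today is day 1 of 30

    active_users = set()
    daily_activity = Counter()
    most_active_day = today  # fallback when the window has no activity
    max_activity = 0

    for log in activity_logs:
        # Keep only the date part of the timestamp
        log_date = date.fromisoformat(log["timestamp"][:10])
        if not (window_start <= log_date <= today):
            continue
        active_users.add(log["user_id"])
        daily_activity[log_date] += 1
        count = daily_activity[log_date]
        # A higher count wins; an equal count wins only on a more recent date
        if count > max_activity or (count == max_activity and log_date > most_active_day):
            max_activity = count
            most_active_day = log_date

    if daily_activity:
        average = sum(daily_activity.values()) / len(daily_activity)
    else:
        average = 0.0

    return {
        "total_active_users": len(active_users),
        "average_daily_activity": average,
        "most_active_day": most_active_day.isoformat(),
    }
```

The function makes a single pass over the logs and stores at most 30 daily counters, which is well within the 10,000-entry constraint. Calling it with an explicit date, for example `analyze_user_activity(logs, today=date(2024, 1, 25))`, pins the window for reproducible checks.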