Hone logo
Hone
Problems

Analyzing User Engagement Trends with Cumulative Metrics

This challenge focuses on using advanced SQL window functions to analyze user engagement over time. You'll be calculating cumulative metrics that provide insights into how user activity grows and changes, which is crucial for understanding product adoption, retention, and overall business health.

Problem Description

You are tasked with analyzing a dataset of user activity logs. Each log entry represents a specific action performed by a user on a given date. Your goal is to calculate two key cumulative metrics for each user:

  1. Cumulative Actions: The total number of actions a user has performed from the beginning of their activity up to and including the current date.
  2. Lagged Previous Day Actions: The number of actions a user performed on the previous day. If a user performed no actions on the previous day, this value should be 0.

Key Requirements:

  • Process user activity by date.
  • Handle users with no activity on consecutive days.
  • Ensure calculations are performed independently for each user.

Expected Behavior:

  • The output should contain each user's ID, the date of their activity, the number of actions on that specific date, the cumulative actions up to that date, and the number of actions from the immediately preceding day.
  • For the very first recorded action of a user, the "Lagged Previous Day Actions" should be 0.

Edge Cases:

  • Users who have only one recorded activity.
  • Gaps in activity for a user (i.e., a user has activity on day 1 and day 3, but not day 2).

Examples

Example 1:

Input:

UserActions Table:

UserIDActionDateNumberOfActions
1012023-10-262
1012023-10-273
1022023-10-261
1012023-10-281
1022023-10-282
1032023-10-295

Output:

UserIDActionDateNumberOfActionsCumulativeActionsLaggedPreviousDayActions
1012023-10-26220
1012023-10-27352
1012023-10-28163
1022023-10-26110
1022023-10-28230
1032023-10-29550

Explanation:

  • For User 101 on 2023-10-26, CumulativeActions is 2 (2). LaggedPreviousDayActions is 0 as it's their first recorded action.
  • For User 101 on 2023-10-27, CumulativeActions is 5 (2 + 3). LaggedPreviousDayActions is 2 (actions from 2023-10-26).
  • For User 102 on 2023-10-28, CumulativeActions is 3 (1 + 2). LaggedPreviousDayActions is 0 because there's no activity recorded for 2023-10-27.

Example 2:

Input:

UserActions Table:

UserIDActionDateNumberOfActions
2012023-11-0110

Output:

UserIDActionDateNumberOfActionsCumulativeActionsLaggedPreviousDayActions
2012023-11-0110100

Explanation:

  • A single user with a single action day. Cumulative actions are just the actions on that day. Lagged previous day actions are 0.

Constraints

  • UserID is an integer.
  • ActionDate is a date type.
  • NumberOfActions is a non-negative integer.
  • The UserActions table can contain up to 1,000,000 rows.
  • The query should execute efficiently, aiming for a time complexity that scales reasonably with the number of rows.

Notes

  • You will need to consider how to handle the ordering of rows for each user.
  • Think about how to address the "previous day" when there might not be a direct preceding row in the input data for a given user. You might need to consider a way to represent "zero actions" for missing days.
  • Window functions like SUM() OVER() and LAG() will be essential. You might also need to consider the ROWS BETWEEN clause for fine-grained control over the window frame.
Loading editor...
plaintext