The Latest Login in 2020
Imagine a system that tracks user logins. We're interested in understanding user activity, specifically when users last accessed the system within a particular year. This challenge asks you to identify the most recent login for each unique user during the year 2020. This information can be valuable for activity reports, identifying dormant accounts, or understanding peak usage times.
Problem Description
You are given a dataset of user login events. Each event records a user ID and the timestamp of their login. Your task is to find the most recent login timestamp for each unique user that occurred within the calendar year 2020.
Requirements:
- Process a list of login events.
- For each distinct user, determine their latest login timestamp only if that login occurred in 2020.
- If a user had no logins in 2020, they should not appear in the output.
- The output should be a list of user IDs and their corresponding latest 2020 login timestamps.
Expected Behavior: The output should be a collection (e.g., a list of pairs, a dictionary, a map) where each entry represents a unique user and their latest login timestamp in 2020.
Edge Cases:
- Users with no logins in 2020.
- Users with multiple logins in 2020.
- Users with logins both inside and outside of 2020.
- An empty input list of logins.
Examples
Example 1:
Input: [
{"user_id": 101, "timestamp": "2020-01-15T10:00:00Z"},
{"user_id": 102, "timestamp": "2020-03-20T14:30:00Z"},
{"user_id": 101, "timestamp": "2020-02-10T09:15:00Z"},
{"user_id": 103, "timestamp": "2021-01-05T11:00:00Z"}
]
Output: [
{"user_id": 101, "latest_login_2020": "2020-02-10T09:15:00Z"},
{"user_id": 102, "latest_login_2020": "2020-03-20T14:30:00Z"}
]
Explanation: User 101 logged in twice in 2020. The latest is Feb 10th. User 102 logged in once in 2020. User 103's login was in 2021, so it's ignored.
Example 2:
Input: [
{"user_id": 201, "timestamp": "2019-12-31T23:59:59Z"},
{"user_id": 202, "timestamp": "2020-12-31T23:59:59Z"},
{"user_id": 201, "timestamp": "2020-07-01T12:00:00Z"},
{"user_id": 202, "timestamp": "2020-01-01T00:00:01Z"}
]
Output: [
{"user_id": 201, "latest_login_2020": "2020-07-01T12:00:00Z"},
{"user_id": 202, "latest_login_2020": "2020-12-31T23:59:59Z"}
]
Explanation: User 201's latest 2020 login is July 1st. User 202's latest 2020 login is Dec 31st. Their 2019 login is ignored.
Example 3: (Edge Case: No 2020 logins for some users)
Input: [
{"user_id": 301, "timestamp": "2020-05-01T08:00:00Z"},
{"user_id": 302, "timestamp": "2019-01-01T00:00:00Z"},
{"user_id": 303, "timestamp": "2021-01-01T00:00:00Z"},
{"user_id": 301, "timestamp": "2020-04-01T08:00:00Z"}
]
Output: [
{"user_id": 301, "latest_login_2020": "2020-05-01T08:00:00Z"}
]
Explanation: Only user 301 had logins in 2020. Users 302 and 303 had logins outside the target year.
Constraints
- The number of login events can range from 0 to 100,000.
- User IDs are integers.
- Timestamps are strings in ISO 8601 format (e.g., "YYYY-MM-DDTHH:MM:SSZ").
- The processing should be efficient enough to handle up to 100,000 events within a reasonable time (e.g., under 5 seconds).
Notes
- You will need to parse the timestamp strings to compare dates and determine if a login falls within 2020.
- Consider how to efficiently store and update the latest login for each user as you iterate through the login events.
- The output order of users does not matter.