Find Users With Valid E-Mails

This challenge focuses on validating email addresses based on a specific set of rules. You'll be given a list of user data, and your task is to identify and extract users whose email addresses conform to the defined criteria, demonstrating your ability to parse data and apply validation logic.

Problem Description

You are tasked with processing a dataset of users, each represented by a unique user ID and an email address. Your goal is to filter this dataset and return a list of all users whose email addresses are considered "valid" according to a predefined set of rules. This is a common task in data cleaning and user management.

Key Requirements:

  1. Email Structure: A valid email address must contain exactly one "@" symbol.
  2. Domain Name: The part of the email address after the "@" symbol (the domain) must contain at least one "." symbol.
  3. Username Length: The part of the email address before the "@" symbol (the username) must have a minimum length of 3 characters.
  4. Domain Length: The part of the email address after the "@" symbol (the domain) must have a minimum length of 3 characters.
  5. Valid Characters:
    • The username can only contain alphanumeric characters (a-z, A-Z, 0-9) and the underscore character ('_').
    • The domain can only contain alphanumeric characters (a-z, A-Z, 0-9), the hyphen character ('-'), and the "." characters that separate its parts. The domain also cannot start with ".".
    • The part of the domain after the last "." (the top-level domain or TLD) must consist only of alphabetic characters (a-z, A-Z) and have a minimum length of 2 characters.
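The rules above can be sketched as a single rule-by-rule check. This is a minimal sketch, not a reference solution: the function name `is_valid_email` and the character-set constants are hypothetical, and the non-empty-label check is an assumption drawn from the examples (which reject "abc@.com"), not a numbered rule.

```python
import string

# Allowed character sets taken from rule 5 (ASCII only).
USERNAME_CHARS = set(string.ascii_letters + string.digits + "_")
LABEL_CHARS = set(string.ascii_letters + string.digits + "-")
TLD_CHARS = set(string.ascii_letters)


def is_valid_email(email: str) -> bool:
    """Return True if `email` passes rules 1-5 (hypothetical helper)."""
    # Rule 1: exactly one "@".
    if email.count("@") != 1:
        return False
    username, domain = email.split("@")

    # Rules 3 and 4: minimum lengths of username and domain.
    if len(username) < 3 or len(domain) < 3:
        return False

    # Rule 5, username: alphanumerics and "_" only.
    if not set(username) <= USERNAME_CHARS:
        return False

    # Rule 2: the domain needs at least one ".".
    labels = domain.split(".")
    if len(labels) < 2:
        return False

    # Rule 5, domain: alphanumerics and "-" only within each part.
    # Assumption: every dot-separated part must be non-empty.
    for label in labels:
        if not label or not set(label) <= LABEL_CHARS:
            return False

    # Rule 5, TLD: alphabetic only, at least 2 characters.
    tld = labels[-1]
    return len(tld) >= 2 and set(tld) <= TLD_CHARS
```

Explicit ASCII sets are used instead of `str.isalnum()` because that method also accepts non-ASCII letters and digits, which the rules do not allow.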

Expected Behavior:

Your function should accept a list of user objects/dictionaries, where each object has a user_id and an email field. It should return a new list containing only the user objects whose email satisfies all the specified validation rules. The order of users in the output list should be the same as their order in the input list.
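The filtering step itself can be kept separate from the validation logic. The sketch below assumes a hypothetical `filter_valid_users` function that takes the validator as a parameter; the predicate `has_single_at` is only a stand-in that checks rule 1, so a full solution would pass a validator covering every rule.

```python
from typing import Callable


def filter_valid_users(
    users: list[dict], is_valid: Callable[[str], bool]
) -> list[dict]:
    """Return a new list of the users whose email passes `is_valid`, in input order."""
    # A list comprehension visits users in order and never mutates the input.
    return [user for user in users if is_valid(user["email"])]


# Stand-in predicate (rule 1 only), used here just to exercise the filter.
def has_single_at(email: str) -> bool:
    return email.count("@") == 1


users = [
    {"user_id": 2, "email": "jane_smith123@sub-domain.co.uk"},
    {"user_id": 3, "email": "invalid-email"},
    {"user_id": 12, "email": "aaa@bbb.com"},
]
result = filter_valid_users(users, has_single_at)
print([user["user_id"] for user in result])  # [2, 12]
```

An empty input list needs no special handling here: the comprehension simply returns an empty list.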

Edge Cases to Consider:

  • Empty input list.
  • Emails with no "@" symbol.
  • Emails with multiple "@" symbols.
  • Emails with no "." in the domain.
  • Usernames shorter than 3 characters.
  • Domains shorter than 3 characters.
  • Usernames containing invalid characters (e.g., "!", "#", "$").
  • Domains containing invalid characters (e.g., "_", "!").
  • TLDs that contain invalid characters or are too short.
  • Emails where the username or domain is empty.
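One way to exercise these edge cases is a small table of sample addresses run against a validator. The sketch below uses a regular expression as one possible compact encoding of the rules; the pattern `EMAIL_RE` and the sample addresses in `EDGE_CASES` are illustrative assumptions, and the pattern also assumes that every dot-separated part of the domain is non-empty. The empty-input case is not covered, since it concerns the list rather than a single address.

```python
import re

EMAIL_RE = re.compile(
    r"[A-Za-z0-9_]{3,}"      # username: at least 3 allowed characters
    r"@"                     # the single "@"
    r"(?:[A-Za-z0-9-]+\.)+"  # one or more non-empty parts, each followed by "."
    r"[A-Za-z]{2,}"          # TLD: alphabetic, at least 2 characters
)

# One hypothetical sample address per edge case.
EDGE_CASES = {
    "no @": "invalid-email",
    "multiple @": "abc@def@ghi.com",
    "no dot in domain": "abc@localhost",
    "short username": "aa@bb.cc",
    "short domain": "abc@ab",
    "invalid username character": "test!user@domain.org",
    "invalid domain character": "user@doma_in.com",
    "TLD too short": "user@domain-.c",
    "TLD with invalid character": "abc@def.g4",
    "empty username": "@domain.com",
    "empty domain": "user@",
}

# fullmatch requires the whole string to match, so no anchors are needed.
rejected = {
    name: EMAIL_RE.fullmatch(email) is None
    for name, email in EDGE_CASES.items()
}
print(all(rejected.values()))  # True
```

The pattern cannot match a domain shorter than 4 characters, so the 3-character minimum on the domain is satisfied implicitly.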

Examples

Example 1:

Input: [
  { "user_id": 1, "email": "john.doe@example.com" },
  { "user_id": 2, "email": "jane_smith123@sub-domain.co.uk" },
  { "user_id": 3, "email": "invalid-email" },
  { "user_id": 4, "email": "short@d.com" }
]
Output: [
  { "user_id": 2, "email": "jane_smith123@sub-domain.co.uk" },
  { "user_id": 4, "email": "short@d.com" }
]
Explanation:
- User 1: Username "john.doe" contains ".", which rule 5 does not allow in the username. Invalid.
- User 2: Username "jane_smith123" is valid, there is exactly one "@", and domain "sub-domain.co.uk" is valid with TLD "uk". Valid.
- User 3: Missing "@" symbol. Invalid.
- User 4: Username "short" is valid, and domain "d.com" has 5 characters, which meets the 3-character minimum. TLD "com" is valid. Valid.

Example 2:

Input: [
  { "user_id": 5, "email": "test!user@domain.org" },
  { "user_id": 6, "email": "user@domain-.c" },
  { "user_id": 7, "email": "valid@domain." },
  { "user_id": 9, "email": "user@doma_in.com" }
]
Output: []
Explanation:
- User 5: Username "test!user" contains an invalid character "!". Invalid.
- User 6: TLD "c" is too short (minimum length 2). Invalid.
- User 7: Domain ends with ".", so the TLD is empty. Invalid.
- User 9: Domain "doma_in.com" contains an invalid character "_". Invalid.

Note that a hyphen at the end of a domain part is allowed by these rules: "another@domain-.net" would be valid here, even though common RFCs do not allow domain labels to start or end with a hyphen.

Example 3: (Edge Case)

Input: [
  { "user_id": 10, "email": "a@b.c" },
  { "user_id": 11, "email": "aa@bb.cc" },
  { "user_id": 12, "email": "aaa@bbb.com" },
  { "user_id": 13, "email": "abc@def.gh" },
  { "user_id": 14, "email": "abc@d-ef.gh" },
  { "user_id": 15, "email": "abc@d.ef.gh" },
  { "user_id": 16, "email": "abc@.com" },
  { "user_id": 17, "email": "user@doma.in" }
]
Output: [
  { "user_id": 12, "email": "aaa@bbb.com" },
  { "user_id": 13, "email": "abc@def.gh" },
  { "user_id": 14, "email": "abc@d-ef.gh" },
  { "user_id": 15, "email": "abc@d.ef.gh" },
  { "user_id": 17, "email": "user@doma.in" }
]
Explanation:
- User 10: Username "a" too short. Invalid.
- User 11: Username "aa" too short. Invalid.
- User 12: Valid. Username "aaa", "@", domain "bbb.com".
- User 13: Valid. Username "abc", "@", domain "def.gh". TLD "gh" is valid.
- User 14: Valid. Username "abc", "@", domain "d-ef.gh". Hyphen in domain is allowed. TLD "gh" is valid.
- User 15: Valid. Username "abc", "@", domain "d.ef.gh". Multiple dots in domain are fine, as long as TLD rules are met. TLD "gh" is valid.
- User 16: Domain starts with ".". Invalid.
- User 17: Valid. Username "user", "@", domain "doma.in". TLD "in" is valid.

Constraints

  • The input will be a list of user objects/dictionaries.
  • Each user object will have at least two properties: user_id (an integer) and email (a string).
  • The email string will not be null or empty.
  • The total number of users in the input list will be between 0 and 1000.
  • The length of each email string will be between 5 and 254 characters.
  • Your solution should aim for an efficient runtime, ideally linear with respect to the total number of characters in all email addresses.

Notes

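  • Handle the order of operations carefully when parsing and validating: confirm there is exactly one "@" before splitting the address into username and domain, and split the domain on "." before checking the TLD.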
  • Consider breaking down the validation logic into smaller, reusable helper functions for better readability and maintainability.
  • The validation rules are specific to this challenge. Real-world email validation can be significantly more complex.
  • When checking for valid characters, ensure you cover both uppercase and lowercase letters.
  • For splitting the email address, you'll primarily need to work with the "@" symbol and the "." symbol.