Python Regex Pattern Creation Challenge: Email Address Validator
Regular expressions (regex) are a powerful tool for pattern matching and manipulation of text. This challenge will test your ability to construct precise regex patterns in Python to validate email addresses according to common, but not overly strict, rules. This skill is fundamental for data cleaning, input validation, and web scraping.
Problem Description
Your task is to create a Python function that takes a string as input and returns True if the string represents a valid email address format, and False otherwise. You will need to use Python's re module to define and apply your regex pattern.
Key Requirements:
- The email address must contain exactly one "@" symbol.
- The part before the "@" (the local part) can contain alphanumeric characters, periods (.), underscores (_), hyphens (-), and plus signs (+). It must not be empty.
- The part after the "@" (the domain part) must contain at least one period (.).
- The domain part can contain alphanumeric characters and hyphens (-).
- The top-level domain (TLD), which is the part after the last period in the domain, must be at least two characters long and consist only of alphabetic characters.
- The local part and the domain part cannot start or end with a hyphen or a period. As Example 6 shows, the same holds for each period-separated label within the domain.
- Multiple consecutive periods are not allowed in the local or domain part.
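One way the requirements above might translate into a pattern is sketched below. It is a possible approach, not a reference solution. The names EMAIL_PATTERN and is_valid_email are hypothetical, and two readings of the rules are assumptions: every domain label must start and end with an alphanumeric character (inferred from Example 6), and underscores or plus signs may begin or end the local part because the rules do not forbid it.

```python
import re

# Sketch only: one possible encoding of the listed rules.
# Assumption: each domain label starts and ends with an alphanumeric character.
EMAIL_PATTERN = re.compile(
    r"""
    (?!.*\.\.)                               # reject two consecutive periods anywhere
    [A-Za-z0-9_+]                            # local part: first char is not a period or hyphen
    (?:[A-Za-z0-9._+-]*[A-Za-z0-9_+])?       # optional remainder: last char is not a period or hyphen
    @                                        # the single separator
    (?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+   # one or more labels, each followed by a period
    [A-Za-z]{2,}                             # TLD: at least two letters
    """,
    re.VERBOSE,
)


def is_valid_email(address: str) -> bool:
    """Return True if the whole string matches the sketch pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


if __name__ == "__main__":
    print(is_valid_email("test.email+alias@example.com"))  # True
    print(is_valid_email("user@domain..com"))              # False
    print(is_valid_email("a@b@c.com"))                     # False
```

Because "@" appears in neither character class, the "exactly one @" rule is enforced by the pattern's structure rather than by a separate check.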
Expected Behavior:
- The function should correctly identify valid email formats.
- The function should correctly reject invalid email formats.
Edge Cases to Consider:
- Empty input string.
- Strings with no "@" symbol.
- Strings with multiple "@" symbols.
- Local parts or domain parts starting/ending with invalid characters.
- Domain parts without a period.
- TLDs that are too short or contain invalid characters.
- Consecutive periods in the local or domain parts.
Examples
Example 1:
Input: "test.email+alias@example.com"
Output: True
Explanation: This is a valid email address. The local part is "test.email+alias", the domain is "example.com", and the TLD "com" is valid.
Example 2:
Input: "invalid-email@"
Output: False
Explanation: The domain part is missing.
Example 3:
Input: "user@domain..com"
Output: False
Explanation: Consecutive periods are not allowed in the domain part.
Example 4:
Input: "user@domain.c"
Output: False
Explanation: The TLD "c" is too short.
Example 5:
Input: ".user@domain.com"
Output: False
Explanation: The local part starts with a period.
Example 6:
Input: "user@domain-.com"
Output: False
Explanation: The domain label "domain-" (the part before the period) ends with a hyphen.
Example 7:
Input: "user@domain.com."
Output: False
Explanation: The domain part ends with a period.
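The seven examples above can be collected into a small table-driven check that any candidate validator can be run against. This is a minimal sketch: EXAMPLE_CASES, failing_cases, and naive_validator are hypothetical names, and naive_validator is a deliberately weak stand-in (it only counts "@" symbols) used to show what a failure report looks like.

```python
from typing import Callable, List, Tuple

# (input, expected result) pairs copied from Examples 1 to 7.
EXAMPLE_CASES: List[Tuple[str, bool]] = [
    ("test.email+alias@example.com", True),
    ("invalid-email@", False),
    ("user@domain..com", False),
    ("user@domain.c", False),
    (".user@domain.com", False),
    ("user@domain-.com", False),
    ("user@domain.com.", False),
]


def failing_cases(
    validator: Callable[[str], bool],
) -> List[Tuple[str, bool, bool]]:
    """Return (input, expected, actual) for each example the validator gets wrong."""
    failures = []
    for text, expected in EXAMPLE_CASES:
        actual = validator(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


def naive_validator(text: str) -> bool:
    """Weak stand-in: accepts any string with exactly one '@'."""
    return text.count("@") == 1


if __name__ == "__main__":
    for text, expected, actual in failing_cases(naive_validator):
        print(f"{text!r}: expected {expected}, got {actual}")
```

Passing your own function in place of naive_validator should produce an empty list once it handles every example.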
Constraints
- The input will be a single string.
- The input string can be empty.
- The length of the input string will not exceed 254 characters (a common practical limit for email addresses).
- Your regex pattern should validate inputs of this length without noticeable slowdown. In particular, avoid nested quantifiers over overlapping character classes, which can cause catastrophic backtracking.
Notes
- You will need to import the re module in Python.
- Consider using re.match() or re.fullmatch() for this task, as you want to validate the entire string. re.fullmatch() is generally preferred for validating an entire string against a pattern.
- Remember that some characters have special meaning in regex and may need to be escaped (e.g., .).
- Think carefully about how to handle the "at least one period" and "at least two alphabetic characters for TLD" requirements within the domain part.
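The difference between re.match() and re.fullmatch(), and the effect of escaping a period, can be seen in a short experiment. The pattern SIMPLE below is a hypothetical, deliberately simplified stand-in used only for comparison. It is not a solution to the challenge.

```python
import re

# Hypothetical simplified pattern, used only to compare the two calls.
SIMPLE = re.compile(r"[a-z]+@[a-z]+\.[a-z]{2,}")

candidate = "user@domain.com."  # note the trailing period

# match() anchors only at the start, so trailing junk is ignored.
match_result = SIMPLE.match(candidate)
# fullmatch() requires the pattern to consume the whole string.
fullmatch_result = SIMPLE.fullmatch(candidate)

print(match_result is not None)      # True
print(fullmatch_result is not None)  # False

# Adding "$" to match() still accepts a trailing newline; fullmatch() does not.
print(re.match(r"[a-z]+$", "abc\n") is not None)     # True
print(re.fullmatch(r"[a-z]+", "abc\n") is not None)  # False

# An unescaped period matches any character; an escaped one matches only ".".
print(re.fullmatch(r"a.c", "abc") is not None)   # True
print(re.fullmatch(r"a\.c", "abc") is not None)  # False
print(re.fullmatch(r"a\.c", "a.c") is not None)  # True
```

Inside a character class such as [._+-], the period is already literal and needs no backslash.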