Rust Sanitizer: Building Secure Input Processing
In modern software development, handling user-provided input securely is paramount. Malicious input can lead to a wide range of vulnerabilities, from cross-site scripting (XSS) to SQL injection. This challenge focuses on building robust sanitizers in Rust to transform potentially unsafe input into safe, usable data.
Problem Description
Your task is to implement a Sanitizer struct in Rust that can process a given string input and return a sanitized version of it. The sanitization process should involve removing or escaping specific characters that are commonly used in attacks.
Key Requirements:
- HTML Sanitization: Remove or escape characters that could be interpreted as HTML tags or attributes. Specifically, you should:
  - Escape < to &lt;
  - Escape > to &gt;
  - Escape & to &amp;
  - Remove any characters that are not alphanumeric, whitespace, or one of the allowed punctuation marks: . , ! ? - _
- URL Sanitization: For input intended to be part of a URL, ensure that potentially problematic characters are escaped to prevent injection attacks. Specifically, you should:
  - Escape spaces to %20.
  - Escape & to %26.
  - Escape = to %3D.
  - Escape ? to %3F.
- Combined Sanitization: Implement a method that applies both HTML and URL sanitization sequentially. The order of operations matters: HTML sanitization should be applied first, followed by URL sanitization.
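The three requirements above can be sketched as follows. This is a minimal sketch, not a reference solution: the unit struct and the method names sanitize_html, sanitize_url and sanitize_combined are assumptions, since the challenge does not fix an API. It applies the listed rules literally, so characters such as apostrophes, parentheses and slashes are dropped by the HTML step.

```rust
/// Stateless sanitizer. A unit struct is an assumption: the challenge does
/// not say whether Sanitizer carries any configuration.
pub struct Sanitizer;

impl Sanitizer {
    /// Escapes <, > and &, keeps ASCII alphanumerics, whitespace and the
    /// allowed punctuation . , ! ? - _ and drops every other character.
    pub fn sanitize_html(&self, input: &str) -> String {
        let mut out = String::with_capacity(input.len());
        for c in input.chars() {
            match c {
                '<' => out.push_str("&lt;"),
                '>' => out.push_str("&gt;"),
                '&' => out.push_str("&amp;"),
                c if c.is_ascii_alphanumeric() || c.is_ascii_whitespace() => out.push(c),
                '.' | ',' | '!' | '?' | '-' | '_' => out.push(c),
                _ => {} // not on the allow-list: removed
            }
        }
        out
    }

    /// Percent-encodes the four characters named in the requirements and
    /// passes everything else through unchanged.
    pub fn sanitize_url(&self, input: &str) -> String {
        let mut out = String::with_capacity(input.len());
        for c in input.chars() {
            match c {
                ' ' => out.push_str("%20"),
                '&' => out.push_str("%26"),
                '=' => out.push_str("%3D"),
                '?' => out.push_str("%3F"),
                _ => out.push(c),
            }
        }
        out
    }

    /// HTML sanitization first, then URL sanitization on its result.
    pub fn sanitize_combined(&self, input: &str) -> String {
        self.sanitize_url(&self.sanitize_html(input))
    }
}
```

With this sketch, Sanitizer.sanitize_url("a b&c=d?e") returns "a%20b%26c%3Dd%3Fe". Composing the two steps literally also percent-encodes the & that begins each HTML entity, so sanitize_combined("<b>hi there</b>") yields "%26lt;b%26gt;hi%20there%26lt;b%26gt;".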
Expected Behavior:
The Sanitizer struct should have methods to perform these sanitization tasks.
Edge Cases:
- Empty input strings.
- Strings containing only characters to be removed or escaped.
- Strings with mixed characters requiring different sanitization rules.
Examples
Example 1: HTML Sanitization
Input String: "<script>alert('XSS')</script> This is & important."
Sanitizer Type: HTML
Output String: "alert('XSS') This is &amp; important."
Explanation: The < and > characters are escaped to &lt; and &gt; respectively. The & is escaped to &amp;. The characters outside of alphanumeric, whitespace, and allowed punctuation (like the angled brackets in the script tags) are removed.
Example 2: URL Sanitization
Input String: "search?query=my document & sort=asc"
Sanitizer Type: URL
Output String: "search%3Fquery%3Dmy%20document%20%26%20sort%3Dasc"
Explanation: Spaces are converted to %20, ? to %3F, & to %26, and = to %3D.
Example 3: Combined Sanitization
Input String: "<a href='http://example.com?id=1&name=test'>Link</a>"
Sanitizer Type: Combined (HTML then URL)
Output String: "&lt;a href='http%3A%2F%2Fexample.com%3Fid%3D1%26name%3Dtest'&gt;Link&lt;/a&gt;"
Explanation:
- HTML Sanitization:
  - < becomes &lt;
  - > becomes &gt;
  - & becomes &amp;
  The intermediate string after HTML sanitization (focusing on the parts relevant to the next step): &lt;a href='http://example.com?id=1&amp;name=test'&gt;Link&lt;/a&gt;
- URL Sanitization (applied to the original URL parts from the intermediate string):
  - The original URL string http://example.com?id=1&name=test within the HTML context needs to be considered for URL sanitization. The components that were not already escaped by HTML sanitization will now be URL-sanitized.
  - : becomes %3A
  - / becomes %2F
  - ? becomes %3F
  - = becomes %3D
  - & becomes %26
  The final output string reflects the combination of these rules applied sequentially.
Constraints
- Input strings will consist of ASCII characters.
- The length of input strings will not exceed 1024 characters.
- The sanitization operations should be performed in-place or by creating a new string, with a preference for efficiency.
- The Sanitizer should not introduce any new vulnerabilities.
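On the efficiency constraint: every escape sequence is longer than the character it replaces, so a true in-place rewrite is awkward, and building a new string is usually simpler. One possible way to avoid needless allocation is to return the input borrowed when it contains nothing to escape. The sketch below illustrates that idea for the URL rules only; the function name sanitize_url_cow and the helper needs_escape are hypothetical, not part of the challenge.

```rust
use std::borrow::Cow;

/// True for the four characters the URL rules ask to escape.
fn needs_escape(c: char) -> bool {
    matches!(c, ' ' | '&' | '=' | '?')
}

/// URL-escapes the input, borrowing it unchanged when no listed character
/// occurs (hypothetical helper, shown for the URL rules only).
pub fn sanitize_url_cow(input: &str) -> Cow<'_, str> {
    // Fast path: nothing to escape, so no allocation is needed.
    if !input.chars().any(needs_escape) {
        return Cow::Borrowed(input);
    }
    // Each escaped character grows from 1 to 3 bytes; reserve a little extra.
    let mut out = String::with_capacity(input.len() + 8);
    for c in input.chars() {
        match c {
            ' ' => out.push_str("%20"),
            '&' => out.push_str("%26"),
            '=' => out.push_str("%3D"),
            '?' => out.push_str("%3F"),
            _ => out.push(c),
        }
    }
    Cow::Owned(out)
}
```

Callers that only read the result can treat the returned Cow as a string slice; those that need ownership can call into_owned on it. With inputs capped at 1024 characters the saving is small, so this is a design option rather than a requirement.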
Notes
- Consider using Rust's string manipulation capabilities effectively.
- Think about how to handle the different character sets for HTML and URL sanitization.
- The "allowed punctuation" for HTML sanitization is a specific set: . , ! ? - _. Any other punctuation not in this list should be removed.
- For URL sanitization, remember to escape all characters that have special meaning in URLs whenever they appear in a context that requires URL encoding, even if the problem description does not list them explicitly. The examples provide the core set.
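The last note can be read as asking for a broader encoder than the four listed characters. A common approach, sketched below under that assumption, is to percent-encode every byte except the unreserved characters of RFC 3986 (letters, digits, and - . _ ~). The function name percent_encode_component is hypothetical, and this stricter rule set is an interpretation rather than something the challenge states.

```rust
/// Uppercase hex digits used for percent-encoding.
const HEX: &[u8; 16] = b"0123456789ABCDEF";

/// Percent-encodes every byte that is not an RFC 3986 unreserved character.
/// Assumption: the input is meant to be a single URL component, so reserved
/// characters such as : / ? = & are encoded as well.
pub fn percent_encode_component(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        match b {
            // Unreserved characters pass through unchanged.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            // Everything else becomes %XX with two uppercase hex digits.
            _ => {
                out.push('%');
                out.push(HEX[(b >> 4) as usize] as char);
                out.push(HEX[(b & 0x0F) as usize] as char);
            }
        }
    }
    out
}
```

Applied to the URL from Example 3, percent_encode_component("http://example.com?id=1&name=test") returns "http%3A%2F%2Fexample.com%3Fid%3D1%26name%3Dtest", and on the input of Example 2 it produces the same output as the four-character rule set.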