Implementing Rabin-Karp String Searching Algorithm in JavaScript
The Rabin-Karp algorithm is an efficient string searching algorithm that uses hashing to quickly find occurrences of a pattern within a text. It's particularly useful for scenarios where you need to search for multiple patterns within a large text, as the hashing mechanism can be reused. This challenge will guide you through implementing this algorithm in JavaScript.
Problem Description
Your task is to implement the Rabin-Karp string searching algorithm in JavaScript. The function should take two string arguments: text (the larger string to search within) and pattern (the smaller string to search for). The function should return an array of all starting indices where the pattern is found within the text.
Key Requirements:
- Hashing: Implement a rolling hash function to efficiently calculate hash values for substrings of text and for the pattern.
- Comparison: When a hash match occurs, perform a character-by-character comparison to confirm a true match and avoid false positives.
- Return Indices: Return an array containing the starting indices of all exact occurrences of the pattern in the text.
Expected Behavior:
- If the pattern is not found, return an empty array.
- If the pattern is empty, the behavior is undefined for this challenge: you may return an empty array, or [0] if the text is also empty. For simplicity, assume non-empty patterns unless the constraints say otherwise.
- If the pattern is longer than the text, it cannot be found, so return an empty array.
Edge Cases to Consider:
- Empty text or pattern.
- pattern longer than text.
- pattern and text are identical.
- Multiple overlapping occurrences of the pattern.
- Characters that might cause hash collisions (though the algorithm should handle this with the character-by-character check).
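One way to pin down the expected output for these edge cases is to compare against a brute-force search. The sketch below is not part of the challenge: naiveSearch is a hypothetical helper name, and it assumes an empty pattern yields no matches, which is only one of the behaviors the challenge allows.

```javascript
// Hypothetical reference helper (not Rabin-Karp): checks every window directly.
function naiveSearch(text, pattern) {
  const matches = [];
  // Assumption: an empty pattern returns no matches; a pattern longer
  // than the text can never match.
  if (pattern.length === 0 || pattern.length > text.length) return matches;
  for (let i = 0; i + pattern.length <= text.length; i++) {
    // startsWith with a position compares the window beginning at i.
    if (text.startsWith(pattern, i)) matches.push(i);
  }
  return matches;
}

console.log(naiveSearch("AAAAA", "AAA")); // overlapping occurrences: [0, 1, 2]
console.log(naiveSearch("abc", "abc"));   // identical strings: [0]
console.log(naiveSearch("", "a"));        // empty text: []
console.log(naiveSearch("ab", "abc"));    // pattern longer than text: []
```

Running a Rabin-Karp implementation and this helper on the same inputs and comparing the arrays is a simple way to catch off-by-one errors in the rolling window.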
Examples
Example 1:
Input:
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
Output:
[10]
Explanation:
The pattern "ABABCABAB" is found starting at index 10 in the text.
Example 2:
Input:
text = "AAAAA"
pattern = "AAA"
Output:
[0, 1, 2]
Explanation:
The pattern "AAA" is found starting at index 0, index 1, and index 2 in the text.
Example 3:
Input:
text = "THIS IS A TEST TEXT"
pattern = "TEST"
Output:
[10]
Explanation:
The pattern "TEST" is found starting at index 10 in the text.
Example 4:
Input:
text = "abcdef"
pattern = "xyz"
Output:
[]
Explanation:
The pattern "xyz" does not exist in the text.
Constraints
- The length of text will be between 0 and 10<sup>5</sup> characters.
- The length of pattern will be between 1 and 10<sup>5</sup> characters.
- Both text and pattern will consist of lowercase and uppercase English letters, and digits.
- The modulo operation for hashing should be performed using a prime number. A small prime such as 101 is convenient for tracing the algorithm by hand, but a large prime (e.g., 1,000,000,007) produces far fewer collisions.
- The base for the hash calculation is typically either the alphabet size (e.g., 256, which covers all 8-bit character codes but is not itself prime) or a small prime such as 31, a common choice when only alphanumeric characters are guaranteed.
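JavaScript numbers are exact only up to Number.MAX_SAFE_INTEGER (2^53 - 1), so the base and prime should be chosen such that intermediate products in the hash never exceed that limit. The sketch below checks a candidate pair. The helper name isSafeChoice and the example values are illustrative assumptions, not requirements of the challenge.

```javascript
// Hypothetical helper: returns true when the largest intermediate value of
// the form (prime - 1) * base + maxCharCode is still an exact integer.
function isSafeChoice(base, prime, maxCharCode) {
  const worstCase = (prime - 1) * base + maxCharCode;
  return Number.isSafeInteger(worstCase);
}

// base 256 with the prime 1,000,000,007 stays far below 2^53.
console.log(isSafeChoice(256, 1000000007, 255)); // true

// A modulus close to 2^53 would overflow as soon as it is multiplied by the base.
console.log(isSafeChoice(256, 2 ** 53 - 111, 255)); // false
```

If a larger modulus is needed, BigInt arithmetic is an alternative, at the cost of slower operations.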
Notes
- A common approach for Rabin-Karp is to use a polynomial rolling hash function. For a substring $S = s_1s_2...s_k$, the hash can be calculated as: $hash(S) = (s_1 \cdot base^{k-1} + s_2 \cdot base^{k-2} + ... + s_k \cdot base^0) \pmod{prime}$
- To efficiently calculate the hash of the next window (rolling hash), you can use the previous window's hash: $hash_{new} = ((hash_{old} - s_{old} \cdot base^{k-1}) \cdot base + s_{new}) \pmod{prime}$, where $s_{old}$ is the character leaving the window and $s_{new}$ is the character entering it. In JavaScript, the % operator keeps the sign of the dividend, so the subtraction can produce a negative value; add the prime number to the result when that happens.
- The value of $base^{k-1} \pmod{prime}$ will be needed to remove the contribution of the leading character. This can be pre-calculated.
- The character-by-character comparison is crucial to ensure correctness due to potential hash collisions.
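Putting these notes together, one possible shape for the solution is sketched below. It is a sketch under stated assumptions rather than a reference implementation: the function names rabinKarp and matchesAt, the default base of 256, and the default prime of 1,000,000,007 are all illustrative choices, and an empty pattern is assumed to return no matches.

```javascript
// Confirms a candidate window character by character (guards against collisions).
function matchesAt(text, pattern, start) {
  for (let j = 0; j < pattern.length; j++) {
    if (text[start + j] !== pattern[j]) return false;
  }
  return true;
}

// Sketch of Rabin-Karp; base and prime defaults are assumptions.
function rabinKarp(text, pattern, base = 256, prime = 1000000007) {
  const n = text.length;
  const k = pattern.length;
  const matches = [];
  // Assumption: empty pattern returns no matches; a longer pattern cannot match.
  if (k === 0 || k > n) return matches;

  // Pre-compute base^(k-1) mod prime to remove the leading character later.
  let highPower = 1;
  for (let i = 0; i < k - 1; i++) highPower = (highPower * base) % prime;

  // Hash the pattern and the first window (polynomial hash via Horner's rule).
  let patternHash = 0;
  let windowHash = 0;
  for (let i = 0; i < k; i++) {
    patternHash = (patternHash * base + pattern.charCodeAt(i)) % prime;
    windowHash = (windowHash * base + text.charCodeAt(i)) % prime;
  }

  for (let start = 0; start + k <= n; start++) {
    // Equal hashes only mark a candidate; verify before recording it.
    if (windowHash === patternHash && matchesAt(text, pattern, start)) {
      matches.push(start);
    }
    if (start + k < n) {
      // Roll the window: drop text[start], shift, append text[start + k].
      const leading = (text.charCodeAt(start) * highPower) % prime;
      // Adding prime keeps the difference non-negative before the modulo.
      windowHash =
        ((windowHash - leading + prime) * base + text.charCodeAt(start + k)) % prime;
    }
  }
  return matches;
}

console.log(rabinKarp("ABABDABACDABABCABAB", "ABABCABAB")); // [10]
console.log(rabinKarp("AAAAA", "AAA"));                     // [0, 1, 2]
console.log(rabinKarp("abcdef", "xyz"));                    // []
```

Passing a small prime such as 101 as the fourth argument makes collisions frequent, which is a convenient way to exercise the character-by-character check.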