Implementing Rabin-Karp String Searching Algorithm in JavaScript

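The Rabin-Karp algorithm is a string searching algorithm that uses hashing to find occurrences of a pattern within a text. It compares the pattern's hash against the hash of each pattern-length window of the text and inspects individual characters only when the hashes match, which gives an expected running time of O(n + m) for a text of length n and a pattern of length m. It is particularly useful when you need to search for multiple patterns of the same length within a large text, because each window hash is computed once and can be checked against every pattern hash. This challenge will guide you through implementing the algorithm in JavaScript.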

Problem Description

Your task is to implement the Rabin-Karp string searching algorithm in JavaScript. The function should take two string arguments: text (the larger string to search within) and pattern (the smaller string to search for). The function should return an array of all starting indices where the pattern is found within the text.

Key Requirements:

  1. Hashing: Implement a rolling hash function to efficiently calculate hash values for substrings of text and for the pattern.
  2. Comparison: When a hash match occurs, perform a character-by-character comparison to confirm a true match and avoid false positives.
  3. Return Indices: Return an array containing the starting indices of all exact occurrences of the pattern in the text.
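The first two requirements can be sketched as a pair of small helpers. This is only an illustration: the names polyHash and matchesAt, and the BASE and PRIME values, are assumptions made for this example and are not prescribed by the challenge. The hash here is computed from scratch for one window; the rolling update is a separate step.

```javascript
// Illustrative constants (assumed for this sketch, not mandated by the challenge).
const BASE = 256;
const PRIME = 1000000007;

// Polynomial hash of str[start .. start + length - 1], built with Horner's rule
// so that the first character receives the highest power of BASE.
function polyHash(str, start, length) {
  let hash = 0;
  for (let i = 0; i < length; i++) {
    hash = (hash * BASE + str.charCodeAt(start + i)) % PRIME;
  }
  return hash;
}

// Character-by-character confirmation, run only after the hashes agree.
function matchesAt(text, pattern, start) {
  for (let j = 0; j < pattern.length; j++) {
    if (text[start + j] !== pattern[j]) return false;
  }
  return true;
}

console.log(polyHash("AB", 0, 2));        // 65 * 256 + 66 = 16706
console.log(matchesAt("ABAB", "AB", 2));  // true
console.log(matchesAt("ABAB", "AB", 1));  // false
```

Equal hashes only suggest a match; matchesAt is what makes the reported index trustworthy.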

Expected Behavior:

  • If the pattern is not found, return an empty array.
  • An empty pattern is outside the scope of this challenge: the constraints guarantee a non-empty pattern, so the behavior is left undefined. If you choose to handle it anyway, returning an empty array is a reasonable choice.
  • If the pattern is longer than the text, it cannot be found, so return an empty array.

Edge Cases to Consider:

  • Empty text or pattern.
  • pattern longer than text.
  • pattern and text are identical.
  • Multiple overlapping occurrences of the pattern.
  • Distinct substrings whose hashes collide (the character-by-character check must reject these).
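To see why the collision case matters, the sketch below searches for two different two-letter strings that share a hash under a deliberately small modulus. The names smallHash and findCollision and the constants are assumptions for this illustration only. Because 26 × 26 = 676 strings are mapped into at most 101 hash values, at least one collision is guaranteed to exist.

```javascript
// Deliberately small modulus so collisions are easy to find (illustrative values).
const DEMO_BASE = 256;
const DEMO_PRIME = 101;

function smallHash(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = (hash * DEMO_BASE + str.charCodeAt(i)) % DEMO_PRIME;
  }
  return hash;
}

// Scan two-letter uppercase strings until two of them share a hash value.
function findCollision() {
  const seen = new Map(); // hash value -> first string that produced it
  for (let a = 65; a <= 90; a++) {
    for (let b = 65; b <= 90; b++) {
      const s = String.fromCharCode(a, b);
      const h = smallHash(s);
      if (seen.has(h)) return [seen.get(h), s];
      seen.set(h, s);
    }
  }
  return null;
}

const pair = findCollision();
console.log(pair, pair.map(smallHash)); // two different strings, same hash
```

An implementation that skipped the character-by-character check would report the second string of such a pair as a match for the first.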

Examples

Example 1:

Input:
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"

Output:
[10]

Explanation:
The pattern "ABABCABAB" is found starting at index 10 in the text.

Example 2:

Input:
text = "AAAAA"
pattern = "AAA"

Output:
[0, 1, 2]

Explanation:
The pattern "AAA" is found starting at index 0, index 1, and index 2 in the text.

Example 3:

Input:
text = "THIS IS A TEST TEXT"
pattern = "TEST"

Output:
[10]

Explanation:
The pattern "TEST" is found starting at index 10 in the text.

Example 4:

Input:
text = "abcdef"
pattern = "xyz"

Output:
[]

Explanation:
The pattern "xyz" does not exist in the text.

Constraints

  • The length of text will be between 0 and $10^5$ characters.
  • The length of pattern will be between 1 and $10^5$ characters.
  • Both text and pattern will consist of lowercase and uppercase English letters and digits. Example 3 also contains spaces, so treat the space character as valid input.
  • Hash values should be reduced modulo a prime number. A small prime such as 101 is convenient for tracing the algorithm by hand, but a large prime (e.g., 1000000007) produces far fewer collisions.
  • The base for the hash calculation does not need to be prime. Common choices are 256 (the number of possible extended-ASCII character codes) or a small prime such as 31 for simplicity if only alphanumeric characters are guaranteed.
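One JavaScript-specific consideration when choosing these values is that ordinary numbers are exact only up to Number.MAX_SAFE_INTEGER ($2^{53} - 1$). The sketch below checks whether the largest intermediate values in a typical rolling-hash update stay within that limit. The function name isSafeHashConfig is hypothetical, and the check assumes the update reduces modulo the prime after every multiplication.

```javascript
// Returns true when the worst-case intermediates of a rolling-hash update
// remain exactly representable as JavaScript numbers.
// Assumes every step is reduced modulo `prime` before the next multiplication.
function isSafeHashConfig(base, prime, maxCharCode) {
  // Largest value of (hash * base + charCode) before reduction.
  const worstShift = (prime - 1) * base + maxCharCode;
  // Largest value of (charCode * base^(k-1) mod prime) before reduction.
  const worstLead = maxCharCode * (prime - 1);
  return Math.max(worstShift, worstLead) <= Number.MAX_SAFE_INTEGER;
}

// 122 is the character code of 'z', the largest code among alphanumerics.
console.log(isSafeHashConfig(256, 101, 122));        // true
console.log(isSafeHashConfig(256, 1000000007, 122)); // true
console.log(isSafeHashConfig(256, 1e15, 122));       // false: a modulus near 10^15 overflows
```

If a larger modulus is needed, BigInt arithmetic is one option, at some cost in speed.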

Notes

  • A common approach for Rabin-Karp is to use a polynomial rolling hash function. For a substring $S = s_1s_2...s_k$, the hash can be calculated as: $hash(S) = (s_1 \cdot base^{k-1} + s_2 \cdot base^{k-2} + ... + s_k \cdot base^0) \pmod{prime}$
  • To efficiently calculate the hash of the next window (rolling hash), you can use the previous window's hash: $hash_{new} = ((hash_{old} - s_{old} \cdot base^{k-1}) \cdot base + s_{new}) \pmod{prime}$, where $s_{old}$ is the character leaving the window and $s_{new}$ is the character entering it. In JavaScript, the % operator returns a negative result when its left operand is negative, so add the prime number to the difference before reducing it.
  • The value of $base^{k-1} \pmod{prime}$ will be needed to remove the contribution of the leading character. This can be pre-calculated.
  • The character-by-character comparison is crucial to ensure correctness due to potential hash collisions.
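Putting these notes together, one possible implementation looks like the following. It is a minimal sketch rather than a reference solution: the default base of 256 and prime of 1000000007 are assumptions chosen so that all intermediate values stay within JavaScript's safe integer range, and returning an empty array for an empty pattern is one of the behaviors the challenge leaves open.

```javascript
function rabinKarp(text, pattern, base = 256, prime = 1000000007) {
  const n = text.length;
  const m = pattern.length;
  const matches = [];
  // Empty pattern (a choice made for this sketch) or pattern longer than text.
  if (m === 0 || m > n) return matches;

  // Pre-compute base^(m-1) mod prime, used to remove the leading character.
  let highPow = 1;
  for (let i = 1; i < m; i++) highPow = (highPow * base) % prime;

  // Hash the pattern and the first window of the text.
  let patternHash = 0;
  let windowHash = 0;
  for (let i = 0; i < m; i++) {
    patternHash = (patternHash * base + pattern.charCodeAt(i)) % prime;
    windowHash = (windowHash * base + text.charCodeAt(i)) % prime;
  }

  for (let i = 0; i + m <= n; i++) {
    // On a hash match, confirm character by character to rule out collisions.
    if (windowHash === patternHash) {
      let j = 0;
      while (j < m && text[i + j] === pattern[j]) j++;
      if (j === m) matches.push(i);
    }
    // Roll the hash forward: drop text[i], append text[i + m].
    if (i + m < n) {
      const lead = (text.charCodeAt(i) * highPow) % prime;
      windowHash = (windowHash - lead + prime) % prime; // + prime keeps it non-negative
      windowHash = (windowHash * base + text.charCodeAt(i + m)) % prime;
    }
  }
  return matches;
}

console.log(rabinKarp("ABABDABACDABABCABAB", "ABABCABAB")); // [ 10 ]
console.log(rabinKarp("AAAAA", "AAA"));                     // [ 0, 1, 2 ]
console.log(rabinKarp("abcdef", "xyz"));                    // []
```

Passing a small prime such as 101 as the fourth argument yields the same results, only with more hash collisions for the verification step to reject. Inputs such as "AAAAA" with pattern "AAA" make every window a true match, so verification runs at every position and the running time approaches O(n · m).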