Implement the Knuth-Morris-Pratt (KMP) String Searching Algorithm
This challenge requires you to implement the Knuth-Morris-Pratt (KMP) string searching algorithm in JavaScript. The KMP algorithm is an efficient string-searching algorithm that searches for occurrences of a "word" (pattern) within a main "text" string by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next comparison should begin, thus avoiding redundant comparisons. This makes it significantly faster than naive string searching algorithms, especially for repetitive patterns.
Problem Description
Your task is to write a JavaScript function, kmpSearch(text, pattern), that finds all occurrences of a given pattern string within a text string. The function should return an array of the starting indices in the text where the pattern is found.
Key Requirements:
- Implement KMP Logic: You must implement the core logic of the KMP algorithm, which involves two main parts:
- Preprocessing (LPS Array): Create a "longest proper prefix which is also a suffix" (LPS) array for the
pattern. This array helps in efficiently skipping characters during the search phase. - Searching: Use the LPS array to efficiently find all occurrences of the
patternin thetext.
- Preprocessing (LPS Array): Create a "longest proper prefix which is also a suffix" (LPS) array for the
- Return Indices: The function should return an array of all starting indices where the
patternmatches within thetext. - Handle No Matches: If the
patternis not found in thetext, the function should return an empty array.
Expected Behavior:
- The function should correctly identify all non-overlapping and overlapping occurrences of the
pattern. - The algorithm should be efficient, aiming for O(n + m) time complexity, where 'n' is the length of the
textand 'm' is the length of thepattern.
Edge Cases to Consider:
- Empty
textstring. - Empty
patternstring. patternlonger thantext.patternis identical totext.patterncontains repeating characters or sequences.
Examples
Example 1:
Input:
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
Output:
[10]
Explanation:
The pattern "ABABCABAB" is found starting at index 10 in the text.
Example 2:
Input:
text = "AAAAA"
pattern = "AAA"
Output:
[0, 1, 2]
Explanation:
The pattern "AAA" is found starting at indices 0, 1, and 2.
Example 3:
Input:
text = "THIS IS A TEST TEXT"
pattern = "TEST"
Output:
[10]
Explanation:
The pattern "TEST" is found starting at index 10.
Example 4: (Edge Case)
Input:
text = "ABCDEFG"
pattern = "XYZ"
Output:
[]
Explanation:
The pattern "XYZ" is not found in the text.
Example 5: (Edge Case)
Input:
text = "AAAAA"
pattern = ""
Output:
[0, 1, 2, 3, 4]
Explanation:
An empty pattern is considered to match at every position in the text, including after the last character. (Conventionally, though some might return just [0] or an empty array. For this challenge, we will assume it matches at every starting index up to text.length).
Example 6: (Edge Case)
Input:
text = ""
pattern = "A"
Output:
[]
Explanation:
The pattern cannot be found in an empty text.
Constraints
textandpatternwill consist of printable ASCII characters.- The length of
text(n) will be between 0 and 1,000,000. - The length of
pattern(m) will be between 0 and 1,000,000. - Your implementation should aim for a time complexity of O(n + m) and a space complexity of O(m) for the LPS array.
Notes
- The core of the KMP algorithm lies in the efficient computation and utilization of the LPS array.
- The LPS array
lps[i]for a patternPstores the length of the longest proper prefix ofP[0...i]that is also a suffix ofP[0...i]. - A "proper prefix" of a string is a prefix that is not equal to the string itself.
- Consider how to handle the case where the
patternis an empty string. A common convention is that an empty pattern matches at every possible starting position. - You will likely need two helper functions: one to compute the LPS array and another for the main search logic.