Hone logo
Hone
Problems

Implement the Knuth-Morris-Pratt (KMP) String Searching Algorithm

This challenge requires you to implement the Knuth-Morris-Pratt (KMP) string searching algorithm in JavaScript. The KMP algorithm is an efficient string-searching algorithm that searches for occurrences of a "word" (pattern) within a main "text" string by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next comparison should begin, thus avoiding redundant comparisons. This makes it significantly faster than naive string searching algorithms, especially for repetitive patterns.

Problem Description

Your task is to write a JavaScript function, kmpSearch(text, pattern), that finds all occurrences of a given pattern string within a text string. The function should return an array of the starting indices in the text where the pattern is found.

Key Requirements:

  1. Implement KMP Logic: You must implement the core logic of the KMP algorithm, which involves two main parts:
    • Preprocessing (LPS Array): Create a "longest proper prefix which is also a suffix" (LPS) array for the pattern. This array helps in efficiently skipping characters during the search phase.
    • Searching: Use the LPS array to efficiently find all occurrences of the pattern in the text.
  2. Return Indices: The function should return an array of all starting indices where the pattern matches within the text.
  3. Handle No Matches: If the pattern is not found in the text, the function should return an empty array.

Expected Behavior:

  • The function should correctly identify all non-overlapping and overlapping occurrences of the pattern.
  • The algorithm should be efficient, aiming for O(n + m) time complexity, where 'n' is the length of the text and 'm' is the length of the pattern.

Edge Cases to Consider:

  • Empty text string.
  • Empty pattern string.
  • pattern longer than text.
  • pattern is identical to text.
  • pattern contains repeating characters or sequences.

Examples

Example 1:

Input:
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"

Output:
[10]

Explanation:
The pattern "ABABCABAB" is found starting at index 10 in the text.

Example 2:

Input:
text = "AAAAA"
pattern = "AAA"

Output:
[0, 1, 2]

Explanation:
The pattern "AAA" is found starting at indices 0, 1, and 2.

Example 3:

Input:
text = "THIS IS A TEST TEXT"
pattern = "TEST"

Output:
[10]

Explanation:
The pattern "TEST" is found starting at index 10.

Example 4: (Edge Case)

Input:
text = "ABCDEFG"
pattern = "XYZ"

Output:
[]

Explanation:
The pattern "XYZ" is not found in the text.

Example 5: (Edge Case)

Input:
text = "AAAAA"
pattern = ""

Output:
[0, 1, 2, 3, 4]

Explanation:
An empty pattern is considered to match at every position in the text, including after the last character. (Conventionally, though some might return just [0] or an empty array. For this challenge, we will assume it matches at every starting index up to text.length).

Example 6: (Edge Case)

Input:
text = ""
pattern = "A"

Output:
[]

Explanation:
The pattern cannot be found in an empty text.

Constraints

  • text and pattern will consist of printable ASCII characters.
  • The length of text (n) will be between 0 and 1,000,000.
  • The length of pattern (m) will be between 0 and 1,000,000.
  • Your implementation should aim for a time complexity of O(n + m) and a space complexity of O(m) for the LPS array.

Notes

  • The core of the KMP algorithm lies in the efficient computation and utilization of the LPS array.
  • The LPS array lps[i] for a pattern P stores the length of the longest proper prefix of P[0...i] that is also a suffix of P[0...i].
  • A "proper prefix" of a string is a prefix that is not equal to the string itself.
  • Consider how to handle the case where the pattern is an empty string. A common convention is that an empty pattern matches at every possible starting position.
  • You will likely need two helper functions: one to compute the LPS array and another for the main search logic.
Loading editor...
javascript