Hone logo
Hone
Problems

Knuth-Morris-Pratt (KMP) String Searching Algorithm

The Knuth-Morris-Pratt (KMP) algorithm is a highly efficient string searching algorithm that avoids unnecessary comparisons by leveraging information about the pattern itself. This challenge asks you to implement the KMP algorithm in JavaScript to find all occurrences of a pattern string within a larger text string. Understanding and implementing KMP is crucial for optimizing string search operations, particularly when dealing with large datasets or repetitive patterns.

Problem Description

You are tasked with implementing the KMP algorithm to locate all occurrences of a given pattern string within a text string. The algorithm should return an array containing the starting indices of each occurrence of the pattern in the text. The core of the KMP algorithm lies in pre-computing a "longest proper prefix suffix" (LPS) array for the pattern. This LPS array helps to avoid redundant comparisons when a mismatch occurs during the search.

What needs to be achieved:

  • Implement the KMP algorithm to find all occurrences of a pattern within a text.
  • Calculate the LPS array for the pattern.
  • Return an array of starting indices where the pattern is found in the text.

Key Requirements:

  • The function should accept two strings as input: text (the string to search within) and pattern (the string to search for).
  • The function should return an array of integers representing the starting indices of all occurrences of the pattern in the text.
  • If the pattern is not found in the text, the function should return an empty array.
  • The implementation should be efficient, leveraging the LPS array to minimize comparisons.

Expected Behavior:

The algorithm should efficiently scan the text string, comparing it to the pattern. When a mismatch occurs, the LPS array should be used to determine how far to shift the pattern to the right, avoiding unnecessary comparisons.

Edge Cases to Consider:

  • Empty pattern string: Should return an array containing 0 (as an empty string is technically found at the beginning of any string).
  • Empty text string: Should return an empty array.
  • Pattern longer than text: Should return an empty array.
  • Overlapping occurrences of the pattern.
  • Pattern containing special characters.

Examples

Example 1:

Input: text = "ABABDABACDABABCABAB", pattern = "ABABCABAB"
Output: [10]
Explanation: The pattern "ABABCABAB" is found starting at index 10 in the text.

Example 2:

Input: text = "AAAAABAAABA", pattern = "AAAA"
Output: [0, 1]
Explanation: The pattern "AAAA" is found starting at indices 0 and 1 in the text.

Example 3:

Input: text = "ABC ABCDAB ABCDABCDABDE", pattern = "ABCDABD"
Output: [15]
Explanation: The pattern "ABCDABD" is found starting at index 15 in the text.

Example 4:

Input: text = "abc", pattern = ""
Output: [0]
Explanation: An empty pattern is considered to be found at the beginning of any string.

Constraints

  • text.length can be between 0 and 10<sup>6</sup>.
  • pattern.length can be between 0 and 10<sup>5</sup>.
  • Both text and pattern will contain only ASCII characters.
  • The algorithm should have a time complexity of O(m + n), where n is the length of the text and m is the length of the pattern.

Notes

  • The LPS array stores the length of the longest proper prefix of the pattern that is also a suffix of the pattern's prefixes.
  • Consider building the LPS array iteratively.
  • The KMP algorithm's efficiency comes from intelligently shifting the pattern based on the LPS array when a mismatch occurs. Don't simply shift the pattern by one character.
  • A "proper" prefix/suffix excludes the entire string itself. For example, for "ABAB", "AB" is a proper prefix, and "AB" is a proper suffix.
Loading editor...
javascript