Implement Boyer-Moore String Searching Algorithm in JavaScript
The Boyer-Moore algorithm is a highly efficient string searching algorithm known for its performance, often outperforming simpler algorithms like naive string matching, especially with larger alphabets and longer patterns. Your task is to implement this algorithm in JavaScript to find all occurrences of a given pattern within a larger text.
Problem Description
You are tasked with implementing the Boyer-Moore string searching algorithm in JavaScript. This algorithm should find all starting indices where a given pattern string occurs within a text string.
Key Requirements:
- Implement the Boyer-Moore Algorithm: This involves implementing both the "Bad Character Rule" and the "Good Suffix Rule" for efficient skipping of characters.
- Return All Occurrences: The function should return an array of all starting indices where the
patternis found in thetext. - Case Sensitivity: The search should be case-sensitive.
Expected Behavior:
The function should take two string arguments: text (the string to search within) and pattern (the string to search for). It should return an array of numbers, where each number represents a starting index of a match. If no matches are found, an empty array should be returned.
Edge Cases to Consider:
- Empty
textorpattern. patternlonger thantext.patternandtextare identical.- Multiple overlapping or non-overlapping occurrences of the
pattern.
Examples
Example 1:
Input: text = "THIS IS A TEST TEXT", pattern = "TEST"
Output: [10]
Explanation: The pattern "TEST" is found starting at index 10 in the text.
Example 2:
Input: text = "ABAAABCD", pattern = "ABC"
Output: [4]
Explanation: The pattern "ABC" is found starting at index 4.
Example 3:
Input: text = "ABABABA", pattern = "ABA"
Output: [0, 2, 4]
Explanation: The pattern "ABA" is found at indices 0, 2, and 4.
Example 4: (Edge Case)
Input: text = "AAAAA", pattern = "AA"
Output: [0, 1, 2, 3]
Explanation: The pattern "AA" is found at multiple overlapping positions.
Example 5: (Edge Case)
Input: text = "HELLO", pattern = "WORLD"
Output: []
Explanation: The pattern "WORLD" is not found in "HELLO".
Example 6: (Edge Case)
Input: text = "SHORT", pattern = "LONGERPATTERN"
Output: []
Explanation: The pattern is longer than the text.
Constraints
- The
textandpatternstrings will consist of ASCII characters. - The length of
textwill be between 0 and 1,000,000 characters, inclusive. - The length of
patternwill be between 0 and 1000 characters, inclusive. - The algorithm should aim for an average time complexity better than O(nm), where n is the length of the text and m is the length of the pattern.
Notes
The Boyer-Moore algorithm's efficiency comes from its ability to "jump" ahead in the text. It achieves this by pre-processing the pattern to create lookup tables that guide these jumps. You will likely need to implement helper functions to build these tables for both the Bad Character Rule and the Good Suffix Rule. Remember to handle the case where the pattern is empty or the text is empty gracefully.