Python Program for Boyer-Moore Algorithm
The Boyer-Moore algorithm is one of the most efficient string searching algorithms. It skips sections of the text, making it faster than naive approaches.
It uses heuristics like the bad character rule and good suffix rule to optimize searching.
1. Understanding the Problem
Given a text and a pattern, find all occurrences of the pattern in the text efficiently.
Input: text = "HERE IS A SIMPLE EXAMPLE", pattern = "EXAMPLE" Output: Found at index 17
2. Method 1: Bad Character Heuristic
def bad_char_heuristic(pattern):
bad_char = {}
for i in range(len(pattern)):
bad_char[pattern[i]] = i
return bad_char
def boyer_moore(text, pattern):
bad_char = bad_char_heuristic(pattern)
n = len(text)
m = len(pattern)
s = 0
while s <= n - m:
j = m - 1
while j >= 0 and pattern[j] == text[s + j]:
j -= 1
if j < 0:
print("Pattern found at index", s)
s += (m - bad_char.get(text[s + m], -1)) if s + m < n else 1
else:
s += max(1, j - bad_char.get(text[s + j], -1))
text = input("Enter text: ")
pattern = input("Enter pattern: ")
boyer_moore(text, pattern)
This method skips comparisons using the bad character rule.
3. Method 2: Simplified Version
def boyer_moore_simple(text, pattern):
n = len(text)
m = len(pattern)
for i in range(n - m + 1):
j = m - 1
while j >= 0 and text[i + j] == pattern[j]:
j -= 1
if j == -1:
print("Found at", i)
boyer_moore_simple("ABAAABCD", "ABC")
This version demonstrates the core idea without optimization.
4. Method 3: Return Indices
def boyer_moore_indices(text, pattern):
result = []
n = len(text)
m = len(pattern)
for i in range(n - m + 1):
if text[i:i+m] == pattern:
result.append(i)
return result
print(boyer_moore_indices("abcabcabc", "abc"))
This method returns all match positions.
5. Method 4: Multiple Pattern Search
def search_multiple(text, patterns):
for pattern in patterns:
print(f"Searching for {pattern}:")
for i in range(len(text) - len(pattern) + 1):
if text[i:i+len(pattern)] == pattern:
print("Found at", i)
search_multiple("abcabcabc", ["abc", "cab"])
Boyer-Moore can be adapted for multiple pattern searches.
6. Method 5: Handling Edge Cases
text = input().strip()
pattern = input().strip()
if not pattern:
print("Empty pattern")
elif len(pattern) > len(text):
print("No match")
else:
for i in range(len(text) - len(pattern) + 1):
if text[i:i+len(pattern)] == pattern:
print(i)
Ensures safe handling of special cases.
7. Algorithm
1. Preprocess pattern (bad character table).
2. Align pattern with text.
3. Compare from right to left.
4. Shift pattern using heuristic.
5. Repeat until match or end.
8. Common Mistakes
1. Incorrect shift calculation.
2. Not handling boundary conditions.
3. Ignoring preprocessing step.
4. Confusing with naive approach.
9. Applications
1. Text editors search.
2. DNA sequence analysis.
3. Search engines.
4. Large-scale text processing.
Conclusion
Boyer-Moore is one of the fastest string matching algorithms due to its ability to skip sections of text.
Understanding its heuristics is key to mastering advanced pattern searching.
Codecrown