Python Program for Boyer-Moore Algorithm

The Boyer-Moore algorithm is one of the most efficient string searching algorithms. It skips sections of the text, making it faster than naive approaches.

It uses heuristics like the bad character rule and good suffix rule to optimize searching.

1. Understanding the Problem

Given a text and a pattern, find all occurrences of the pattern in the text efficiently.

Input: text = "HERE IS A SIMPLE EXAMPLE", pattern = "EXAMPLE"
Output: Found at index 17

2. Method 1: Bad Character Heuristic

              
            
Python

          Bad character rule
        
def bad_char_heuristic(pattern):
    bad_char = {}
    for i in range(len(pattern)):
        bad_char[pattern[i]] = i
    return bad_char


def boyer_moore(text, pattern):
    bad_char = bad_char_heuristic(pattern)
    n = len(text)
    m = len(pattern)
    s = 0

    while s <= n - m:
        j = m - 1

        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1

        if j < 0:
            print("Pattern found at index", s)
            s += (m - bad_char.get(text[s + m], -1)) if s + m < n else 1
        else:
            s += max(1, j - bad_char.get(text[s + j], -1))

text = input("Enter text: ")
pattern = input("Enter pattern: ")
boyer_moore(text, pattern)

This method skips comparisons using the bad character rule.

3. Method 2: Simplified Version

              
            
Python

          Simplified implementation
        
def boyer_moore_simple(text, pattern):
    n = len(text)
    m = len(pattern)
    
    for i in range(n - m + 1):
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1
        if j == -1:
            print("Found at", i)

boyer_moore_simple("ABAAABCD", "ABC")

This version demonstrates the core idea without optimization.

4. Method 3: Return Indices

              
            
Python

          Return results
        
def boyer_moore_indices(text, pattern):
    result = []
    n = len(text)
    m = len(pattern)

    for i in range(n - m + 1):
        if text[i:i+m] == pattern:
            result.append(i)

    return result

print(boyer_moore_indices("abcabcabc", "abc"))

This method returns all match positions.

5. Method 4: Multiple Pattern Search

              
            
Python

          Multiple patterns
        
def search_multiple(text, patterns):
    for pattern in patterns:
        print(f"Searching for {pattern}:")
        for i in range(len(text) - len(pattern) + 1):
            if text[i:i+len(pattern)] == pattern:
                print("Found at", i)

search_multiple("abcabcabc", ["abc", "cab"])

Boyer-Moore can be adapted for multiple pattern searches.

6. Method 5: Handling Edge Cases

              
            
Python

          Edge cases
        
text = input().strip()
pattern = input().strip()

if not pattern:
    print("Empty pattern")
elif len(pattern) > len(text):
    print("No match")
else:
    for i in range(len(text) - len(pattern) + 1):
        if text[i:i+len(pattern)] == pattern:
            print(i)

Ensures safe handling of special cases.

7. Algorithm

1. Preprocess pattern (bad character table).

2. Align pattern with text.

3. Compare from right to left.

4. Shift pattern using heuristic.

5. Repeat until match or end.

8. Common Mistakes

1. Incorrect shift calculation.

2. Not handling boundary conditions.

3. Ignoring preprocessing step.

4. Confusing with naive approach.

9. Applications

1. Text editors search.

2. DNA sequence analysis.

3. Search engines.

4. Large-scale text processing.

Conclusion

Boyer-Moore is one of the fastest string matching algorithms due to its ability to skip sections of text.

Understanding its heuristics is key to mastering advanced pattern searching.