Python Program for Aho-Corasick Algorithm

The Aho-Corasick algorithm is an efficient string searching algorithm used to find multiple patterns simultaneously in a text.

It combines trie data structure with failure links to achieve fast searching.

1. Understanding the Problem

Given a text and multiple patterns, find all occurrences of each pattern in the text.

Input: text = "ahishers", patterns = ["he", "she", "his", "hers"]
Output: his at 1, she at 3, he at 4, hers at 4

2. Method 1: Build Trie

Python
Trie construction
class TrieNode:
    def __init__(self):
        self.children = {}
        self.output = []
        self.fail = None


def build_trie(patterns):
    root = TrieNode()
    for pattern in patterns:
        node = root
        for char in pattern:
            node = node.children.setdefault(char, TrieNode())
        node.output.append(pattern)
    return root

Trie stores all patterns efficiently.

3. Method 2: Build Failure Links

Python
Failure link construction
from collections import deque


def build_failure_links(root):
    queue = deque()

    for child in root.children.values():
        child.fail = root
        queue.append(child)

    while queue:
        current = queue.popleft()

        for char, child in current.children.items():
            fail = current.fail
            while fail and char not in fail.children:
                fail = fail.fail
            child.fail = fail.children[char] if fail and char in fail.children else root
            child.output += child.fail.output
            queue.append(child)

Failure links allow efficient transitions when mismatches occur.

4. Method 3: Search Patterns

Python
Search in text
def search(text, root):
    node = root
    results = []

    for i, char in enumerate(text):
        while node and char not in node.children:
            node = node.fail
        if not node:
            node = root
            continue
        node = node.children[char]

        for pattern in node.output:
            results.append((pattern, i - len(pattern) + 1))

    return results

patterns = ["he", "she", "his", "hers"]
text = "ahishers"
root = build_trie(patterns)
build_failure_links(root)
print(search(text, root))

This method finds all pattern matches in a single pass.

5. Method 4: Combined Function

Python
Complete implementation
def aho_corasick(text, patterns):
    root = build_trie(patterns)
    build_failure_links(root)
    return search(text, root)

print(aho_corasick("ahishers", ["he", "she", "his", "hers"]))

Encapsulates the full algorithm into one function.

6. Method 5: Handling Edge Cases

Python
Edge cases
text = input().strip()
patterns = input().split()

if not text or not patterns:
    print([])
else:
    root = build_trie(patterns)
    build_failure_links(root)
    print(search(text, root))

Handles empty input and invalid cases.

7. Algorithm

1. Build trie from patterns.

2. Create failure links using BFS.

3. Traverse text character by character.

4. Follow trie transitions.

5. Use failure links on mismatch.

6. Record matches.

8. Common Mistakes

1. Incorrect failure link setup.

2. Not merging output lists.

3. Infinite loops on failure traversal.

4. Ignoring edge cases.

9. Applications

1. Spam filtering.

2. Intrusion detection systems.

3. Search engines.

4. DNA sequence matching.

Conclusion

Aho-Corasick is one of the most powerful algorithms for multiple pattern matching.

By combining trie and failure links, it achieves linear time complexity for searching multiple patterns.