Python Program for Aho-Corasick Algorithm
The Aho-Corasick algorithm is an efficient string searching algorithm used to find multiple patterns simultaneously in a text.
It combines trie data structure with failure links to achieve fast searching.
1. Understanding the Problem
Given a text and multiple patterns, find all occurrences of each pattern in the text.
Input: text = "ahishers", patterns = ["he", "she", "his", "hers"] Output: his at 1, she at 3, he at 4, hers at 4
2. Method 1: Build Trie
class TrieNode:
def __init__(self):
self.children = {}
self.output = []
self.fail = None
def build_trie(patterns):
root = TrieNode()
for pattern in patterns:
node = root
for char in pattern:
node = node.children.setdefault(char, TrieNode())
node.output.append(pattern)
return root
Trie stores all patterns efficiently.
3. Method 2: Build Failure Links
from collections import deque
def build_failure_links(root):
queue = deque()
for child in root.children.values():
child.fail = root
queue.append(child)
while queue:
current = queue.popleft()
for char, child in current.children.items():
fail = current.fail
while fail and char not in fail.children:
fail = fail.fail
child.fail = fail.children[char] if fail and char in fail.children else root
child.output += child.fail.output
queue.append(child)
Failure links allow efficient transitions when mismatches occur.
4. Method 3: Search Patterns
def search(text, root):
node = root
results = []
for i, char in enumerate(text):
while node and char not in node.children:
node = node.fail
if not node:
node = root
continue
node = node.children[char]
for pattern in node.output:
results.append((pattern, i - len(pattern) + 1))
return results
patterns = ["he", "she", "his", "hers"]
text = "ahishers"
root = build_trie(patterns)
build_failure_links(root)
print(search(text, root))
This method finds all pattern matches in a single pass.
5. Method 4: Combined Function
def aho_corasick(text, patterns):
root = build_trie(patterns)
build_failure_links(root)
return search(text, root)
print(aho_corasick("ahishers", ["he", "she", "his", "hers"]))
Encapsulates the full algorithm into one function.
6. Method 5: Handling Edge Cases
text = input().strip()
patterns = input().split()
if not text or not patterns:
print([])
else:
root = build_trie(patterns)
build_failure_links(root)
print(search(text, root))
Handles empty input and invalid cases.
7. Algorithm
1. Build trie from patterns.
2. Create failure links using BFS.
3. Traverse text character by character.
4. Follow trie transitions.
5. Use failure links on mismatch.
6. Record matches.
8. Common Mistakes
1. Incorrect failure link setup.
2. Not merging output lists.
3. Infinite loops on failure traversal.
4. Ignoring edge cases.
9. Applications
1. Spam filtering.
2. Intrusion detection systems.
3. Search engines.
4. DNA sequence matching.
Conclusion
Aho-Corasick is one of the most powerful algorithms for multiple pattern matching.
By combining trie and failure links, it achieves linear time complexity for searching multiple patterns.
Codecrown