Python Program to Build Suffix Array
A suffix array is a sorted array of all suffixes of a string. It is widely used in advanced string processing and searching algorithms.
Suffix arrays are memory-efficient alternatives to suffix trees and are used in many real-world applications.
1. Understanding the Problem
Given a string, construct its suffix array (sorted indices of suffixes).
Input: banana Suffixes: banana, anana, nana, ana, na, a Sorted: a, ana, anana, banana, na, nana Output: [5, 3, 1, 0, 4, 2]
2. Method 1: Naive Approach
def build_suffix_array(s):
suffixes = [(s[i:], i) for i in range(len(s))]
suffixes.sort()
return [index for (suffix, index) in suffixes]
print(build_suffix_array("banana"))
This method generates all suffixes and sorts them directly.
3. Method 2: Using Sorted Indices
def suffix_array(s):
return sorted(range(len(s)), key=lambda i: s[i:])
print(suffix_array("banana"))
This method avoids storing full suffix strings explicitly.
4. Method 3: Using Function
def get_suffix_array(text):
suffixes = []
for i in range(len(text)):
suffixes.append((text[i:], i))
suffixes.sort()
return [i[1] for i in suffixes]
print(get_suffix_array("abracadabra"))
Encapsulates logic into a reusable function.
5. Method 4: Build LCP Array
def build_lcp(s, sa):
n = len(s)
lcp = [0] * n
for i in range(1, n):
l = 0
x, y = sa[i], sa[i-1]
while x + l < n and y + l < n and s[x+l] == s[y+l]:
l += 1
lcp[i] = l
return lcp
s = "banana"
sa = sorted(range(len(s)), key=lambda i: s[i:])
print(build_lcp(s, sa))
LCP array helps in many advanced string algorithms.
6. Method 5: Handling Edge Cases
s = input().strip()
if not s:
print([])
elif len(s) == 1:
print([0])
else:
print(sorted(range(len(s)), key=lambda i: s[i:]))
Handles empty and single-character strings safely.
7. Algorithm
1. Generate all suffixes of string.
2. Sort suffixes lexicographically.
3. Store starting indices.
4. Return suffix array.
8. Common Mistakes
1. Confusing suffix array with suffix tree.
2. High memory usage in naive method.
3. Incorrect sorting logic.
4. Ignoring edge cases.
9. Applications
1. Fast substring search.
2. Data compression (Burrows-Wheeler Transform).
3. Bioinformatics (DNA matching).
4. Pattern matching algorithms.
Conclusion
Suffix arrays are powerful tools for solving advanced string problems efficiently.
Although the naive approach is simple, optimized methods are preferred for large-scale applications.
Codecrown