Python Program to Build Suffix Array

A suffix array is a sorted array of all suffixes of a string. It is widely used in advanced string processing and searching algorithms.

Suffix arrays are memory-efficient alternatives to suffix trees and are used in many real-world applications.

1. Understanding the Problem

Given a string, construct its suffix array (sorted indices of suffixes).

Input: banana
Suffixes: banana, anana, nana, ana, na, a
Sorted: a, ana, anana, banana, na, nana
Output: [5, 3, 1, 0, 4, 2]

2. Method 1: Naive Approach

Python
Simple sorting
def build_suffix_array(s):
    suffixes = [(s[i:], i) for i in range(len(s))]
    suffixes.sort()
    return [index for (suffix, index) in suffixes]

print(build_suffix_array("banana"))

This method generates all suffixes and sorts them directly.

3. Method 2: Using Sorted Indices

Python
Sort indices by suffix
def suffix_array(s):
    return sorted(range(len(s)), key=lambda i: s[i:])

print(suffix_array("banana"))

This method avoids storing full suffix strings explicitly.

4. Method 3: Using Function

Python
Reusable function
def get_suffix_array(text):
    suffixes = []
    for i in range(len(text)):
        suffixes.append((text[i:], i))
    suffixes.sort()
    return [i[1] for i in suffixes]

print(get_suffix_array("abracadabra"))

Encapsulates logic into a reusable function.

5. Method 4: Build LCP Array

Python
Longest Common Prefix
def build_lcp(s, sa):
    n = len(s)
    lcp = [0] * n
    
    for i in range(1, n):
        l = 0
        x, y = sa[i], sa[i-1]
        while x + l < n and y + l < n and s[x+l] == s[y+l]:
            l += 1
        lcp[i] = l
    
    return lcp

s = "banana"
sa = sorted(range(len(s)), key=lambda i: s[i:])
print(build_lcp(s, sa))

LCP array helps in many advanced string algorithms.

6. Method 5: Handling Edge Cases

Python
Edge cases
s = input().strip()

if not s:
    print([])
elif len(s) == 1:
    print([0])
else:
    print(sorted(range(len(s)), key=lambda i: s[i:]))

Handles empty and single-character strings safely.

7. Algorithm

1. Generate all suffixes of string.

2. Sort suffixes lexicographically.

3. Store starting indices.

4. Return suffix array.

8. Common Mistakes

1. Confusing suffix array with suffix tree.

2. High memory usage in naive method.

3. Incorrect sorting logic.

4. Ignoring edge cases.

9. Applications

1. Fast substring search.

2. Data compression (Burrows-Wheeler Transform).

3. Bioinformatics (DNA matching).

4. Pattern matching algorithms.

Conclusion

Suffix arrays are powerful tools for solving advanced string problems efficiently.

Although the naive approach is simple, optimized methods are preferred for large-scale applications.