0

I am looking at example suffix arrays and longest common prefixes, but I do not understand the criteria for how the the suffix array is sorted. I am looking at the example on wikipedia where they use banana$ Can someone please explain how a suffix array is sorted? My first instinct was to sort by length, but that is clearly not the case here.

(Here is the example they used http://en.wikipedia.org/wiki/Suffix_array)

These suffixes can be sorted:

Suffix  i
$       7
a$      6
ana$    4
anana$  2
banana$ 1
na$     5
nana$   3
3
  • 3
    It's just sorted lexicographically (= "dictionary order") Commented Aug 15, 2014 at 5:39
  • There's a python code in that page. Commented Aug 15, 2014 at 5:51
  • I summed up a simple algorithm to do it in another answer some time ago, in case you are interested. Commented Aug 15, 2014 at 5:55

3 Answers 3

0

Here is my code template on acm/icpc, using DA algorithm

/* 
rank[0...7]: 4 6 8 1 2 3 5 7 
string:      a a b a a a a b 
------------------------------------------- 
 sa[1] = 3 : a a a a b         height[1] = 0 
 sa[2] = 4 : a a a b           height[2] = 3 
 sa[3] = 5 : a a b             height[3] = 2 
 sa[4] = 0 : a a b a a a a b   height[4] = 3 
 sa[5] = 6 : a b               height[5] = 1 
 sa[6] = 1 : a b a a a a b     height[6] = 2 
 sa[7] = 7 : b                 height[7] = 0 
 sa[8] = 2 : b a a a a b       height[8] = 1 
*/ 

const int MAXN = 200010;
int r[MAXN],sa[MAXN];
int ua[MAXN],ub[MAXN],uv[MAXN],us[MAXN];
int cmp(int *r,int a,int b,int l)
{return r[a] == r[b] && r[a+l]==r[b+l];}


void da(int *r,int *sa,int n,int m)   
{                                     
    int i,j,p,*x = ua,*y = ub,*t;
    for(i=0; i<m; i++)  us[i] = 0;
    for(i=0; i<n; i++)  us[x[i] = r[i]]++;
    for(i=1; i<m; i++)  us[i] += us[i-1];
    for(i=n-1; i>=0; i--)   sa[--us[x[i]]] = i;
    for(j=1,p=1; p<n; j<<=1,m=p)
    {
        for(p=0,i=n-j; i<n; i++)    y[p++] = i;
        for(i=0; i<n; i++)  if(sa[i]>=j) y[p++]=sa[i]-j;
        for(i=0; i<n; i++)  uv[i] = x[y[i]];
        for(i=0; i<m; i++)  us[i] = 0;
        for(i=0; i<n; i++)  us[uv[i]]++;
        for(i=1; i<m; i++)  us[i]+=us[i-1];
        for(i=n-1; i>=0; i--) sa[--us[uv[i]]] = y[i];
        for(t=x,x=y,y=t,p=1,x[sa[0]]=0,i=1; i<n; i++)
           x[sa[i]] = cmp(y,sa[i-1],sa[i],j)?p-1:p++;
    }
}

int rank[MAXN], height[MAXN];
void calh(int *r, int *sa, int n)
{
    int i, j, k = 0;
    for(i=1; i<=n; ++i) rank[sa[i]] = i;
    for(i=0; i<n; height[rank[i++]] = k)
        for(k?k--:0,j=sa[rank[i]-1]; r[i+k]==r[j+k];k++);
}

the time above is O(nlogn)

There is also time in O(n) You can refer this article: http://www.cs.cmu.edu/~guyb/realworld/papersS04/KaSa03.pdf

Sign up to request clarification or add additional context in comments.

Comments

0

I also struggled to understand suffix array stuff then I found this intuition behind suffix array and how suffix array with LCP works.

Hope it helps! If you need additional explanation, I will provide here.

Comments

0

There is a tutorial on suffix array at http://discuss.codechef.com/questions/21385/a-tutorial-on-suffix-arrays

This tutorial contains things like what is a suffix array? How is it constructed? What is the most efficient way to construct suffix array ? What are its application?

hopefully you will get What you are seeking for..

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.