How to find repeating sequence of characters in a given array?

Question

My problem is to find the repeating sequence of characters in a given array; simply put, to identify the pattern in which the characters appear.

   .---.---.---.---.---.---.---.---.---.---.---.---.---.---.
1: | J | A | M | E | S | O | N | J | A | M | E | S | O | N |
   '---'---'---'---'---'---'---'---'---'---'---'---'---'---'

   .---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.
2: | R | O | N | R | O | N | R | O | N | R | O | N | R | O | N |
   '---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'

   .---.---.---.---.---.---.---.---.---.---.---.---.
3: | S | H | A | M | I | L | S | H | A | M | I | L |
   '---'---'---'---'---'---'---'---'---'---'---'---'

   .---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.
4: | C | A | R | P | E | N | T | E | R | C | A | R | P | E | N | T | E | R |
   '---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'

Example

Given the previous data, the result should be:

"JAMESON"
"RON"
"SHAMIL"
"CARPENTER"

Question

How to deal with this problem efficiently?

Removed the [aptitude] tag because it's used primarily to refer to the APT client. Also, should this be tagged [language-agnostic] instead of [java] and [c]? – BoltClock, Sep 9, 2010 at 9:41
Does the array contain exactly the repeated text, or is it larger? Does the repeated text start in the first cell, or can it start anywhere in the array? – barjak, Sep 9, 2010 at 9:52
Pay attention to cases like BARBARABARBARABARBARA (repeating BARBARA, not BAR). – pmg, Sep 9, 2010 at 9:59
You can look at the Knuth-Morris-Pratt string matching algorithm, which basically detects character matches. – Dead Programmer, Sep 9, 2010 at 10:08

Oliver Charlesworth · Accepted Answer · 2010-09-09 13:02:14Z

26

Tongue-in-cheek O(N log N) solution

Perform an FFT on your string (treating characters as numeric values). Every peak in the resulting graph corresponds to a substring periodicity.
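A minimal sketch of the FFT idea, assuming "peak" means a lag k at which the string, shifted by k, agrees with itself at every overlapping position. Because characters are categorical, this version is an adaptation: instead of raw character codes it autocorrelates one 0/1 indicator vector per distinct character and sums the results, which counts, for each lag k, how many positions satisfy s[i] == s[i + k]. Lag k is a period exactly when that count equals n - k, which is also why periodicities show up as peaks. The tiny radix-2 FFT and all function names are illustrative; in practice you would use a library FFT. Cost is roughly O(σ N log N) for σ distinct characters.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    invert=True flips the sign of the exponent but does NOT divide by len(a);
    the caller does that.
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def match_counts(s):
    """counts[k] = number of positions i with s[i] == s[i + k], for 0 <= k < len(s)."""
    n = len(s)
    size = 1
    while size < 2 * n:  # zero-pad so circular correlation equals linear correlation
        size *= 2
    totals = [0.0] * size
    for ch in set(s):
        # One-hot indicator of where ch occurs, padded with zeros.
        x = [1.0 if c == ch else 0.0 for c in s] + [0.0] * (size - n)
        spectrum = fft(x)
        # |X|^2 in the frequency domain is the autocorrelation in the time domain.
        power = [abs(v) ** 2 for v in spectrum]
        corr = fft(power, invert=True)
        for k in range(size):
            totals[k] += corr[k].real / size
    return [round(totals[k]) for k in range(n)]


def repeating_unit_fft(s):
    """Shortest prefix that tiles s exactly, or None if s is not a repetition."""
    n = len(s)
    counts = match_counts(s)
    for k in range(1, n // 2 + 1):
        # Lag k is a period when all n - k overlapping pairs match.
        if n % k == 0 and counts[k] == n - k:
            return s[:k]
    return None
```

For example, repeating_unit_fft("RONRONRONRONRON") gives "RON".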

answered Sep 9, 2010 at 13:02

Oliver Charlesworth



3 Comments

Dan Over a year ago

As you mention the FFT, it got me thinking about using a cross-correlation (whichever method is used) to find matches of the substring in the sequence. Normally my caveman brute-force approach, if I couldn't use an off-the-shelf regex library, would be the "walk the sequence, try to match" approach. But your answer got me thinking -- I wonder if/when a cross correlation would be more efficient. Probably depends on the length of the pattern, the length of the sequence to search, etc... but anyway, your answer got me thinking ("out of the box" as Jonathan said). Thanks.

Oliver Charlesworth Over a year ago

Cross-correlation can be computed via a Fourier transform (see en.wikipedia.org/wiki/Convolution_theorem), so the two approaches are effectively the same thing. For anything other than quite small values of N, the FFT-based computation will be more efficient.

Philipp Over a year ago

Could you explain why every peak corresponds to a substring periodicity? Unfortunately, I cannot fully grasp the idea. I only know FFT for frequency analysis in music (there, multiple frequencies overlay at the same time; but when analyzing text we've got a straight sequence of characters). How does this match up?

Péter Török · Accepted Answer · 2010-09-09 10:02:50Z

19

For your examples, my first approach would be to

1. Get the first character of the array (for your last example, that would be C).
2. Get the index of the next appearance of that character in the array (e.g. 9).
3. If it is found, search for the next appearance of the substring between the two appearances of the character (in this case CARPENTER).
4. If it is found, you're done (and the result is this substring).
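A minimal Python sketch of the four steps above, taking "the next appearance of the substring" literally (any later occurrence). The function name is hypothetical, and like the steps themselves it assumes the first character does not recur inside the repeated word.

```python
def find_repeated_word(text):
    """Basic version: assumes the first character does not recur inside the word."""
    if not text:
        return None
    first = text[0]                     # step 1: first character
    nxt = text.find(first, 1)           # step 2: its next appearance
    if nxt == -1:
        return None
    candidate = text[:nxt]              # step 3: substring between the two appearances
    if text.find(candidate, nxt) != -1: # ...does it appear again?
        return candidate                # step 4: done
    return None
```

For the last example, find_repeated_word("CARPENTERCARPENTER") finds the next C at index 9 and returns "CARPENTER".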

Of course, this works only for a very limited subset of possible arrays, where the same word is repeated over and over again, starting from the beginning, without stray characters in between, and its first character is not repeated within the word. But all your examples fall into this category - and I prefer the simplest solution which could possibly work :-)

If the repeated word contains the first character multiple times (e.g. CACTUS), the algorithm can be extended to look for subsequent occurrences of that character too, not only the first one (so that it finds the whole repeated word, not only a substring of it).
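One way to sketch that extension in Python, under an added assumption not stated in the answer: each later occurrence of the first character is tried as the end of the word, and a candidate is accepted only if it tiles the whole array exactly. The name is hypothetical. Because this sketch stops at the first (shortest) candidate that tiles, it returns RON for the second example.

```python
def find_repeated_word_ext(text):
    """Try every later occurrence of the first character as the end of the word."""
    n = len(text)
    if n == 0:
        return None
    first = text[0]
    j = text.find(first, 1)
    while j != -1:
        candidate = text[:j]
        # Stricter than the basic version: the candidate must tile the whole array.
        if n % j == 0 and candidate * (n // j) == text:
            return candidate
        j = text.find(first, j + 1)
    return None
```

With CACTUSCACTUS, the candidate CA (ending at the second C) is rejected because it does not tile the array, and CACTUS is returned.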

Note that this extended algorithm would give a different result for your second example, namely RONRON instead of RON.

edited Sep 9, 2010 at 10:02

answered Sep 9, 2010 at 9:47

Péter Török


4 Comments

Eyal Schneider Over a year ago

+1 for simplicity and linear time solution. However, I understood the problem differently. I guess that the question should specify whether characters can repeat in the pattern, and whether we are looking for the largest or smallest pattern that repeats itself.

Jonathan Over a year ago

Assuming the word never repeats its first letter is pretty much throwing up your hands and going home.

Péter Török Over a year ago

@Jonathan, in case you haven't noticed, I actually describe how to deal with that case :-)

Péter Török Over a year ago

@sagivo, care to explain why you think so?

Marcelo Cantos · Accepted Answer · 2010-09-09 09:51:08Z

6

In Python, you can leverage regexes thus:

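import re

def recurrence(text):
    # Try unit lengths 1 .. len(text) // 2; the regex requires that unit to
    # repeat (at least twice in total) all the way to the end of the text.
    for i in range(1, len(text) // 2 + 1):
        m = re.match(r'^(.{%d})\1+$' % i, text)
        if m:
            return m.group(1)
    return None

recurrence('abcabc')  # Returns 'abc'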

I'm not sure how this would translate to Java or C. (That's one of the reasons I like Python, I guess. :-)

answered Sep 9, 2010 at 9:51

Marcelo Cantos


Comments

fastcodejava · Accepted Answer · 2010-09-09 09:58:57Z

2

First, write a method that checks whether the substring sub repeats to fill the whole container string:

boolean findSubRepeating(String sub, String container);

Then keep calling this method with increasingly long substrings taken from the start of the container: first a 1-character substring, then 2 characters, and so on, up to container.length() / 2.
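The answer leaves the body of findSubRepeating unspecified. A minimal Python sketch, assuming it should return true when the container is exactly the substring repeated two or more times end to end; both function names are hypothetical translations of the Java signature.

```python
def find_sub_repeating(sub, container):
    """Hypothetical body for findSubRepeating: True if container is sub repeated 2+ times."""
    if not sub or len(container) % len(sub) != 0:
        return False
    times = len(container) // len(sub)
    return times >= 2 and sub * times == container


def shortest_repeating_prefix(container):
    # Try a 1-character substring, then 2 characters, ..., up to half the container.
    for length in range(1, len(container) // 2 + 1):
        sub = container[:length]
        if find_sub_repeating(sub, container):
            return sub
    return None
```

Because lengths are tried in increasing order, the first hit is the shortest repeating unit, so BARBARABARBARABARBARA yields BARBARA rather than anything longer.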

answered Sep 9, 2010 at 9:58

fastcodejava


Comments

Erich Kitzmueller · Accepted Answer · 2010-09-09 09:49:55Z

1

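In Java:

static String findPeriod(String str) {
    int len = str.length();
    for (int i = 1; i <= len; i++) {
        // Only lengths that divide the total length can tile the string.
        if (len % i == 0) {
            String unit = str.substring(0, i);
            if (str.equals(unit.repeat(len / i))) {
                return unit;
            }
        }
    }
    return str; // only reached for the empty string
}

Note: when this answer was first written, Java's String had no repeat method ("abc".repeat(2) = "abcabc" was invented for brevity); String.repeat(int) with exactly that behavior was added in Java 11.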

answered Sep 9, 2010 at 9:49

Erich Kitzmueller


Comments

Asha · Accepted Answer · 2010-09-09 10:02:04Z

1

Using C++:

#include <iostream>
#include <set>
#include <string>
using namespace std;

// Splits the string into fragments of the given size
// and returns the set of distinct fragments.
set<string> split(const string& s, int frag)
{
    set<string> uni;
    int len = s.length();
    for (int i = 0; i < len; i += frag)
    {
        uni.insert(s.substr(i, frag));
    }

    return uni;
}

int main()
{
    string out;
    string s = "carpentercarpenter";
    int len = s.length();

    // Optimistic approach: hope the string is just 2 repetitions.
    // If that fails, try shorter fragment sizes. Because the scan goes from
    // long to short, this finds the longest repeating unit
    // (e.g. "ronron" rather than "ron" for "ronronronron").
    for (int i = len / 2; i >= 1; --i)
    {
        set<string> uni = split(s, i);
        if (uni.size() == 1)
        {
            out = *uni.begin();
            break;
        }
    }

    cout << out;
    return 0;
}

edited Sep 9, 2010 at 10:02

answered Sep 9, 2010 at 9:56

Asha


Comments

Eyal Schneider · Accepted Answer · 2010-09-09 11:25:41Z

1

The first idea that comes to my mind is trying all repeating sequences whose lengths divide length(S) = N. There are at most N/2 such lengths and checking each one takes O(N), so this gives an O(N^2) bound. (Since N has at most 2√N divisors, the bound can be tightened to O(N√N).)

But I'm sure it can be improved...
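One known way to get this down to O(N), which also follows up the Knuth-Morris-Pratt suggestion in the question comments, uses the KMP prefix (failure) function. If pi is the length of the longest proper prefix of s that is also a suffix of s, then n - pi is the smallest period of s, and s is an exact repetition precisely when that period is smaller than n and divides n. A sketch in Python; the function names are illustrative.

```python
def prefix_function(s):
    """pi[i] = length of the longest proper prefix of s[:i+1] that is also its suffix."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        # Fall back through shorter borders until the next character can extend one.
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def repeating_unit_kmp(s):
    """Shortest unit that tiles s at least twice, or None."""
    n = len(s)
    if n == 0:
        return None
    p = n - prefix_function(s)[-1]  # smallest period of s
    if p < n and n % p == 0:
        return s[:p]
    return None
```

This handles pmg's example correctly: for BARBARABARBARABARBARA the longest border is 14 characters, so the period is 21 - 14 = 7 and the unit is BARBARA.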

edited Sep 9, 2010 at 11:25

answered Sep 9, 2010 at 10:22

Eyal Schneider


Comments

Rogan Dawes · Accepted Answer · 2017-06-24 14:07:30Z

Here is a more general solution to the problem. It finds repeating subsequences within a sequence (of anything), where the subsequences do not have to start at the beginning, nor immediately follow each other.

Given a sequence b[0..n-1] containing the data in question, and a threshold t >= 1 being the minimum subsequence length to find:

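#include <stdio.h>

/* Report every repeated run of length >= t in b[0..n-1] and the longest one. */
void find_repeats(const unsigned char *b, size_t n, size_t t)
{
  size_t l_max = 0, i_max = 0, j_max = 0;
  for (size_t i = 0; i + 2 * t <= n; i++) {
    for (size_t j = i + t; j + t <= n; j++) {
      size_t l = 0;
      /* Extend the match, never letting the first run reach the second. */
      while (i + l < j && j + l < n && b[i + l] == b[j + l])
        l++;
      if (l >= t) {
        printf("Sequence of length %zu found at %zu and %zu\n", l, i, j);
        if (l > l_max) {
          l_max = l;
          i_max = i;
          j_max = j;
        }
      }
    }
  }
  if (l_max > 0 && l_max >= t)
    printf("Longest repeated sequence found at %zu and %zu (%zu long)\n",
           i_max, j_max, l_max);
}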

Basically:

1. Start at the beginning of the data and iterate until within 2*t of the end (there is no way to fit two distinct, non-overlapping subsequences of length t into less than 2*t of space).
2. For the second subsequence, start at least t positions beyond where the first one begins.
3. Then reset the length l of the discovered subsequence to 0 and check whether b[i+l] and b[j+l] are equal. As long as they are, increment l. When they differ, you have reached the end of the common subsequence. If it is at least t long, print the result.

BurnsBA · Accepted Answer · 2020-11-19 20:16:46Z

Just figured this out myself and wrote some code for this (written in C#) with a lot of comments. Hope this helps someone:

// Check whether the string contains a repeating sequence.
public static bool ContainsRepeatingSequence(string str)
{
    if (string.IsNullOrEmpty(str)) return false;

    for (int i=0; i<str.Length; i++)
    {
        // Every iteration, cut down the string from i to the end.
        string toCheck = str.Substring(i);

        // Set N equal to half the length of the substring. At most, we have to compare half the string to half the string. If the string length is odd, the last character will not be checked against, but it will be checked in the next iteration.
        int N = toCheck.Length / 2;

        // Check strings of all lengths from 1 to N against the subsequent string of length 1 to N.
        for (int j=1; j<=N; j++)
        {
            // Compare characters 0..j-1 against characters j..2j-1.
            if (toCheck.Substring(0, j) == toCheck.Substring(j, j)) return true;
        }
    }

    return false;
}

Feel free to ask any questions if it's unclear why it works.

user411313 · Accepted Answer · 2010-09-09 16:20:16Z

0

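And here is a concrete working example in C:

#include <stdio.h>
#include <string.h>

/* Find the greatest repeated substring of s whose two copies do not overlap.
   Returns a pointer to its first occurrence (or NULL) and stores its length in *l. */
const char *fgrs(const char *s, size_t *l)
{
  const char *r = NULL;
  *l = 0;
  for( const char *a = s; *a; ++a )
  {
    /* Start from the last later occurrence of the current character.
       (The original broke out of the outer loop here when there was none,
       which skipped every later starting position.) */
    const char *e = strrchr(a+1, *a);
    while( e )
    {
      size_t t = 1;
      /* Extend the match, stopping before the first copy runs into the second. */
      for( ; &a[t] != e && a[t] == e[t]; ++t );
      if( t > *l )
        *l = t, r = a;
      /* Step back to the previous occurrence of *a between a and e. */
      while( --e != a && *e != *a );
      if( e == a )
        e = NULL;
    }
  }
  return r;
}

int main(void)
{
  const char *tests[] = { "BARBARABARBARABARBARA", "0123456789", "1111", "11111" };
  for( size_t i = 0; i < sizeof tests / sizeof tests[0]; ++i )
  {
    size_t t;
    const char *p = fgrs(tests[i], &t);
    while( t-- ) putchar(*p++);
    putchar('\n');
  }
  return 0;
}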

answered Sep 9, 2010 at 16:20

user411313


1 Comment

pmg Over a year ago

oops p=fgrs("BARBARABARBARAB-RBARA", &t);

user85421 · Accepted Answer · 2010-09-16 15:22:53Z

0

Not sure how you define "efficiently". For an easy, quick-to-write implementation you could do this in Java:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Compiled once and reused on every call.
    private static final Pattern REPEAT = Pattern.compile("(.+?)\\1+");

    private static String findSequence(String text) {
        Matcher matcher = REPEAT.matcher(text);
        return matcher.matches() ? matcher.group(1) : null;
    }

It tries to find the shortest string (.+?) that must be repeated at least once more (\1+) to match the entire input text.
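For comparison, a sketch of the same single-pattern approach in Python, using re.fullmatch in place of Java's Matcher.matches(). The re.DOTALL flag is an addition (the Java version does not set it) so that "." also matches newlines; the function name is illustrative.

```python
import re

# Shortest group (.+?) that, repeated one or more further times (\1+),
# covers the entire input.
REPEAT = re.compile(r"(.+?)\1+", re.DOTALL)


def find_sequence(text):
    m = REPEAT.fullmatch(text)
    return m.group(1) if m else None
```

The lazy group handles pmg's case from the question comments: BAR is tried before BARBARA, but BAR followed by one more BAR cannot be extended to cover the whole input, so the engine keeps lengthening the group and settles on BARBARA.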

edited Sep 16, 2010 at 15:22

answered Sep 16, 2010 at 15:16

user85421


Comments

Muhammad Dyas Yaskur · Accepted Answer · 2020-02-02 09:13:35Z

This is a solution I came up with using a queue; it passed all the test cases of a similar problem on Codeforces (problem 745A).

#include <iostream>
#include <queue>
#include <string>
using namespace std;

int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    string s;
    cin >> s;
    queue<char> qu;
    qu.push(s[0]);
    bool flag = true;
    int ind = -1;

    // If s is exactly two copies of its first half, append one more copy.
    string s1 = s.substr(0, s.size() / 2);
    string s2 = s.substr(s.size() / 2);
    if (s1 == s2)
    {
        s += s1;
    }

    // Walk through s, popping the front whenever the current character matches it.
    for (size_t i = 1; i < s.size(); i++)
    {
        if (qu.front() == s[i]) { qu.pop(); }
        qu.push(s[i]);
    }
    int cycle = qu.size();

    // Verify that the queue contents match the start of s.
    while (!qu.empty())
    {
        if (s[++ind] != qu.front()) { flag = false; break; }
        qu.pop();
    }
    if (flag) cout << cycle; else cout << s.size();
    return 0;
}

Taufeeq · Accepted Answer · 2023-08-16 13:23:47Z

0

def pattern(y):  # y = sequence (list or string)
    a = y[0]
    s = 0  # start
    e = 0  # end
    for i in range(len(y)):
        if y[i] == a:
            # For this i, keep the longest prefix of y that reappears starting at i.
            for j in range(len(y)):
                if y[:j] == y[i:i + j]:
                    s, e = i, i + j
    if e == len(y) - 1 and s == 0:
        return "No repeating sequence found"
    else:
        return s, e, e - s  # period e - s

You can use this code. It works in two cases: (1) the whole sequence is periodic, or (2) a sequence is repeated two or more times within the sequence. It returns the start and end points of the repeating sequence, if one is found.

edited Aug 16, 2023 at 13:23

answered Aug 16, 2023 at 13:17

Taufeeq


Comments

manolowar · Accepted Answer · 2010-09-16 14:20:07Z

-1

I'd convert the array to a String object and use regex

answered Sep 16, 2010 at 14:20

manolowar


1 Comment

Intoxicated Penguin Over a year ago

This is NOT an answer! Edit: I see this is old! I hope you have bettered yourself in this stretch of time!

Daniel · Accepted Answer · 2013-08-22 23:10:29Z

-1

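Put all your characters in an array, e.g. a[] holding count elements.

int i = 0, j = 0;
/* Find the smallest shift j + 1 at which the array matches itself. */
while (i + j + 1 < count)
{
    if (a[i] == a[i + j + 1])
        ++i;          /* still matching at this shift */
    else
    {
        ++j;          /* mismatch: try the next shift */
        i = 0;
    }
}

Then j + 1 is the length of the repeating unit, and count / (j + 1) is the repeat count when it divides evenly (otherwise the array is not an exact repetition). The loop condition keeps i + j + 1 inside the array, and it is the simple solution.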

edited Aug 22, 2013 at 23:10

Daniel


answered Aug 22, 2013 at 22:53

user2617898

