Find the number of repeated subarray in an array

Question

There is an array indexed from 0-n (i.e. size=n) containing elements from 0-m where m < n (assume m is 100 or 1000 times less than n i.e. m is much less than n), so many of the elements or sub array must be repeated and we have to find the number of such repeated sub array of size 1 or more than 1. For example, consider an array
1 2 4 1 2 4 1 5 6 3 5, here
1 is repeated 2 times
2 is repeated 1 time
4 is repeated 1 time
5 is repeated 1 time
1 2 is repeated 1 time
2 4 is repeated 1 time
4 1 is repeated 1 time
1 2 4 is repeated 1 time
2 4 1 is repeated 1 time
1 2 4 1 is repeated 1 time
So the final answer is sum of these i.e. 11
Can anybody please suggest some data structure or an efficient algorithm to solve this. I was thinking of hashing m and noting the index value where it occurs but didn't formalize it.
Assume n,m < 10^5

I can give you a "pattern": Floor[12412415635 / (10 ^ (10 - n))] mod 10. It works, but would that be an acceptable pattern for you? Probably not. You need to specify what exactly you consider a "pattern". Unless you can give a specification, you can't implement this. — phant0m
– phant0m, Commented Jul 14, 2013 at 10:50
@phant0m I have edited the question . Hope it is clear now. What I meant is counting repeated sub array — justice league
– justice league, Commented Jul 14, 2013 at 10:56
You can probably use a suffix array for this, but I'm not sure what the algorithm would be exactly. — user555045
– user555045, Commented Jul 14, 2013 at 11:03
"So the final answer is sum of these i.e. 8" Why 8? 241 is also repeated once. So it would be a sum of 9. Or am I wrong? — Fabian Bigler
– Fabian Bigler, Commented Jul 14, 2013 at 11:07

Dave · Accepted Answer · 2013-07-14 14:14:21Z

4

Create a hash that has integers as keys, and extendible lists (linked lists, e.g.) of integers as values.
Read through your input list. Treat each input element being scanned as a key; append the index of the following element to the associated list of values.
Now, your repeated 1-element count is n - |keys|
Next, go through your keys. For each key, treat the indexed elements of the original list as a new input list. Create a new hash and continue as in step 1. Note that when we store indices we continue to use those of the original list.

Example: 1 2 4 1 2 4 1 5 6 3 5 (assume we're zero-indexed). Here, n=11.

1 -> [1, 4, 7]
2 -> [2, 5]
3 -> [10]
4 -> [3, 6]
5 -> [8, nil]
6 -> [9]

The number of repeated 1-elts is 11 - 6 = 5.

Now, we go through the keys. We start with 1. The index list is [1, 4, 7] which corresponds with elements [2, 2, 5].

2 -> [2, 5]
5 -> [8]

This picks up 3 - 2 = 1 duplicate 2-element list.

etc...

Running time: Linear in the input + output

answered Jul 14, 2013 at 14:14

Dave

9,1603 gold badges30 silver badges44 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

lsk Over a year ago

This is an innovative approach. For the non-1 elements, if the indices are not next to each other, then they are not part of a pattern! Very nice!

TooTone · Accepted Answer · 2013-07-14 16:39:13Z

I used an approach based on suffix trees, from Skienna's book on the design of algorithms. Suffix trees are a special kind of trie tree, where you insert every suffix of the given string into the trie tree. Trie trees are a way of storing multiple words in a single tree: the root is the null string, and each edge adds a character. I have used a very simple implementation as a proof of concept, below.

#include <iostream>
#include <string>
using std::string;
using std::cout;

class Node
{
    Node* m_descendants[10];
    int m_count;
public:
    Node()
    {
        for(int i=0; i<10; ++i)  { m_descendants[i] = nullptr; }
        m_count = 0;
    }
    void Add(const char* str) {
        // Increment number of times we've reached this node
        m_count++;

        const int firstDigit = str[0] - '0';
        if (!(0 <= firstDigit && firstDigit <= 9)) return; // recursion base case 

        // Access descendant node correponding to current digit
        Node*& descendant = m_descendants[firstDigit];
        if (descendant == 0) { // create descendant if necessary
            descendant = new Node;
        }

        // Recurse on rest of string
        descendant->Add(str +1);
    }
    void Print(const string& str, int countAdj=0) const // debug function
    {
        for(int nextDigit=0; nextDigit<10; ++nextDigit) {
            const Node* node = m_descendants[nextDigit];
            if (node) {
                const int adjustedCount = node->Count() - countAdj;
                if (adjustedCount > 0) {
                    char c = '0' + nextDigit;
                    string strWithC = str;
                    strWithC += c;
                    cout << strWithC << ": " << adjustedCount << "\n";
                    node->Print(strWithC, countAdj);
                }
            }
        }
    }
    int SumRepeated() const
    {
        int sumRepeated = 0;
        for(int nextDigit=0; nextDigit<10; ++nextDigit) {
            const Node* node = m_descendants[nextDigit];
            if (node) {
                const int repeatCount = node->Count() -1;
                if (repeatCount > 0) {
                    sumRepeated += repeatCount;
                    sumRepeated += node->SumRepeated();
                }
            }
        }
        return sumRepeated;
    }
    int Count() const { return m_count; }
};

int main(int argc, const char* argv[])
{
    // Construct
    Node* const root = new Node;
    for(const char* str = "12412415635"; *str; ++str) {
        root->Add(str);
    }

    // Print
    root->Print(string(), 1);

    cout << "Sum repeated is " << root->SumRepeated() << "\n";

    return 0; // (nodes are leaked)
}

Output

1: 2
12: 1
124: 1
1241: 1
2: 1
24: 1
241: 1
4: 1
41: 1
5: 1
Sum repeated is 11

Note there is one additional repeated substring here, i.e. 1241.

As I said, this is proof of concept, so, for example, you'd want to use a dictionary instead of an array to store descendants, with larger m. Also, this implementation is not optimal even in terms of the overall algorithm: it's O(n^2) in time and space. You can optimize by collapsing paths with no branching to get O(n) space, and use clever construction algorithms to get O(n) time. Also, as pointed out in one of the other answers, you don't need to process sub-strings that are up to half the length of the original.

גלעד ברקן · Accepted Answer · 2013-07-15 02:33:59Z

Here's something in JavaScript: (output is more or less immediate and JavaScript is one of the slowest, I think):

<script>

function f (s, m) {
  var i = 0, n = s.length, c = 0, H = {}, Hi = {}
  while (i < n-1) {
    for (var j=i+1; j<=Math.min (i + 1 + m,n - 1); j++) {
      if (s[i] == s[j]) {
        var i_= i, j_= j, tmp = ''
        while (s[i_] == s[j_] && i_ < j) {
          tmp += String (s[i_])
          i_++
          j_++
        }
        var k = tmp.length - 1
        while (k >= 0) {
          if (!H[tmp]) {
            H[tmp] = 1
            Hi[tmp] = i
            c++
          }
          else if (i == Hi[tmp]) {
            H[tmp]++
            c++
          }
          tmp = tmp.substr(0,k--)
        }
      }
    }
    i++
  }
  var t1 = ''
  for (var i in H) t1 += i + ", "
  return [t1,c]
}

var a1 = [1,2,4,1,2,4,1,5,6,3,5],
    a2 = []

for (var i=0;i<100000;i++) a2.push(Math.floor(Math.random()*10))

console.log(a1 +"\n"+ f(a1,10))
console.log(f(a2,10))

</script>

Output:

1,2,4,1,2,4,1,5,6,3,5
1, 2, 4, 5, 12, 24, 41, 124, 241, ,10

["0...6358, 3582, 038, 4304, 2068, 095, 
 9252, 37866, 3786, 7866, 7893, 8701, 0669, ", 771]

Collectives™ on Stack Overflow

Find the number of repeated subarray in an array

3 Answers 3

1 Comment

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related