7

There is an array indexed from 0-n (i.e. size=n) containing elements from 0-m where m < n (assume m is 100 or 1000 times less than n i.e. m is much less than n), so many of the elements or sub array must be repeated and we have to find the number of such repeated sub array of size 1 or more than 1. For example, consider an array
1 2 4 1 2 4 1 5 6 3 5, here
1 is repeated 2 times
2 is repeated 1 time
4 is repeated 1 time
5 is repeated 1 time
1 2 is repeated 1 time
2 4 is repeated 1 time
4 1 is repeated 1 time
1 2 4 is repeated 1 time
2 4 1 is repeated 1 time
1 2 4 1 is repeated 1 time
So the final answer is sum of these i.e. 11
Can anybody please suggest some data structure or an efficient algorithm to solve this. I was thinking of hashing m and noting the index value where it occurs but didn't formalize it.
Assume n,m < 10^5

10
  • I can give you a "pattern": Floor[12412415635 / (10 ^ (10 - n))] mod 10. It works, but would that be an acceptable pattern for you? Probably not. You need to specify what exactly you consider a "pattern". Unless you can give a specification, you can't implement this. Commented Jul 14, 2013 at 10:50
  • @phant0m I have edited the question . Hope it is clear now. What I meant is counting repeated sub array Commented Jul 14, 2013 at 10:56
  • You can probably use a suffix array for this, but I'm not sure what the algorithm would be exactly. Commented Jul 14, 2013 at 11:03
  • "So the final answer is sum of these i.e. 8" Why 8? 241 is also repeated once. So it would be a sum of 9. Or am I wrong? Commented Jul 14, 2013 at 11:07
  • 1
    @FabianBigler Oops I missed that :) Commented Jul 14, 2013 at 11:08

3 Answers 3

4
  1. Create a hash that has integers as keys, and extendible lists (linked lists, e.g.) of integers as values.

  2. Read through your input list. Treat each input element being scanned as a key; append the index of the following element to the associated list of values.

  3. Now, your repeated 1-element count is n - |keys|

  4. Next, go through your keys. For each key, treat the indexed elements of the original list as a new input list. Create a new hash and continue as in step 1. Note that when we store indices we continue to use those of the original list.

Example: 1 2 4 1 2 4 1 5 6 3 5 (assume we're zero-indexed). Here, n=11.

  • 1 -> [1, 4, 7]
  • 2 -> [2, 5]
  • 3 -> [10]
  • 4 -> [3, 6]
  • 5 -> [8, nil]
  • 6 -> [9]

The number of repeated 1-elts is 11 - 6 = 5.

Now, we go through the keys. We start with 1. The index list is [1, 4, 7] which corresponds with elements [2, 2, 5].

  • 2 -> [2, 5]
  • 5 -> [8]

This picks up 3 - 2 = 1 duplicate 2-element list.

etc...

Running time: Linear in the input + output

Sign up to request clarification or add additional context in comments.

1 Comment

This is an innovative approach. For the non-1 elements, if the indices are not next to each other, then they are not part of a pattern! Very nice!
2

I used an approach based on suffix trees, from Skienna's book on the design of algorithms. Suffix trees are a special kind of trie tree, where you insert every suffix of the given string into the trie tree. Trie trees are a way of storing multiple words in a single tree: the root is the null string, and each edge adds a character. I have used a very simple implementation as a proof of concept, below.

#include <iostream>
#include <string>
using std::string;
using std::cout;

class Node
{
    Node* m_descendants[10];
    int m_count;
public:
    Node()
    {
        for(int i=0; i<10; ++i)  { m_descendants[i] = nullptr; }
        m_count = 0;
    }
    void Add(const char* str) {
        // Increment number of times we've reached this node
        m_count++;

        const int firstDigit = str[0] - '0';
        if (!(0 <= firstDigit && firstDigit <= 9)) return; // recursion base case 

        // Access descendant node correponding to current digit
        Node*& descendant = m_descendants[firstDigit];
        if (descendant == 0) { // create descendant if necessary
            descendant = new Node;
        }

        // Recurse on rest of string
        descendant->Add(str +1);
    }
    void Print(const string& str, int countAdj=0) const // debug function
    {
        for(int nextDigit=0; nextDigit<10; ++nextDigit) {
            const Node* node = m_descendants[nextDigit];
            if (node) {
                const int adjustedCount = node->Count() - countAdj;
                if (adjustedCount > 0) {
                    char c = '0' + nextDigit;
                    string strWithC = str;
                    strWithC += c;
                    cout << strWithC << ": " << adjustedCount << "\n";
                    node->Print(strWithC, countAdj);
                }
            }
        }
    }
    int SumRepeated() const
    {
        int sumRepeated = 0;
        for(int nextDigit=0; nextDigit<10; ++nextDigit) {
            const Node* node = m_descendants[nextDigit];
            if (node) {
                const int repeatCount = node->Count() -1;
                if (repeatCount > 0) {
                    sumRepeated += repeatCount;
                    sumRepeated += node->SumRepeated();
                }
            }
        }
        return sumRepeated;
    }
    int Count() const { return m_count; }
};

int main(int argc, const char* argv[])
{
    // Construct
    Node* const root = new Node;
    for(const char* str = "12412415635"; *str; ++str) {
        root->Add(str);
    }

    // Print
    root->Print(string(), 1);

    cout << "Sum repeated is " << root->SumRepeated() << "\n";

    return 0; // (nodes are leaked)
}

Output

1: 2
12: 1
124: 1
1241: 1
2: 1
24: 1
241: 1
4: 1
41: 1
5: 1
Sum repeated is 11

Note there is one additional repeated substring here, i.e. 1241.

As I said, this is proof of concept, so, for example, you'd want to use a dictionary instead of an array to store descendants, with larger m. Also, this implementation is not optimal even in terms of the overall algorithm: it's O(n^2) in time and space. You can optimize by collapsing paths with no branching to get O(n) space, and use clever construction algorithms to get O(n) time. Also, as pointed out in one of the other answers, you don't need to process sub-strings that are up to half the length of the original.

Comments

0

Here's something in JavaScript: (output is more or less immediate and JavaScript is one of the slowest, I think):

<script>

function f (s, m) {
  var i = 0, n = s.length, c = 0, H = {}, Hi = {}
  while (i < n-1) {
    for (var j=i+1; j<=Math.min (i + 1 + m,n - 1); j++) {
      if (s[i] == s[j]) {
        var i_= i, j_= j, tmp = ''
        while (s[i_] == s[j_] && i_ < j) {
          tmp += String (s[i_])
          i_++
          j_++
        }
        var k = tmp.length - 1
        while (k >= 0) {
          if (!H[tmp]) {
            H[tmp] = 1
            Hi[tmp] = i
            c++
          }
          else if (i == Hi[tmp]) {
            H[tmp]++
            c++
          }
          tmp = tmp.substr(0,k--)
        }
      }
    }
    i++
  }
  var t1 = ''
  for (var i in H) t1 += i + ", "
  return [t1,c]
}

var a1 = [1,2,4,1,2,4,1,5,6,3,5],
    a2 = []

for (var i=0;i<100000;i++) a2.push(Math.floor(Math.random()*10))

console.log(a1 +"\n"+ f(a1,10))
console.log(f(a2,10))

</script>

Output:

1,2,4,1,2,4,1,5,6,3,5
1, 2, 4, 5, 12, 24, 41, 124, 241, ,10

["0...6358, 3582, 038, 4304, 2068, 095, 
 9252, 37866, 3786, 7866, 7893, 8701, 0669, ", 771]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.