0

How can I find duplicates in array when there is more than one duplicated element?

When the array is only one duplicated element (for example: 1, 2, 3, 4, 4, 4, 5, 6, 7) then it is very easy:

int duplicate(int* a, int s)
{ 
    int x = a[0];
    for(int i = 1; i < s; ++i)
    {
        x = x ^ a[i];
    }
    for(int i = 0; i < a[s]; ++i)
    {
        x = x ^ i;
    }
    return x;
}

But if the input array contains more than one duplicated element (for example: 1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 7), the above won't work. How can we solve this problem in O(n) time?

5
  • 1
    yes, but interesting case and when the array is not sorted. Commented Dec 9, 2013 at 8:52
  • 3
    What on earth are you dong here? What are all these XOR operations for? And why the weird syntax (*(a+s-1) instead of a[s-1])? Commented Dec 9, 2013 at 8:52
  • 4
    If you are allowed to use extra space, you can build a hash-map of value to count and then check which counts are > 1. Commented Dec 9, 2013 at 8:53
  • 1
    What does function duplicate() is supposed to return? currently, you're returning this value ({XOR:a[i]} XOR 1..a[s-1]) which doesn't make sense to me Commented Dec 9, 2013 at 9:13
  • Khaled A Khunaifer --- certainly an element that is a duplicate, and it is written in the code. Commented Dec 9, 2013 at 9:30

2 Answers 2

1

If space is no concern or the maximal number is quite low, you can simple use a kind of a bit-array and mark all already occurred numbers by setting the bit at the position of the number.

It'a a kind of HashSet with trivial (identity) hash-function. Tests and set cost O(1) time.

Sign up to request clarification or add additional context in comments.

Comments

1

Using a set is one of the possible generic solutions. Example in c++:

template <typename T>
void filter_duplicates(T* arr, int length) {
    std::unordered_set<T> set;
    for (int i = 0; i < length; ++i) {
        if (set.count(arr[i]) > 0) {
            // then it's a duplicate
        }
        set.insert(arr[i]);
    }
    // the set contains all the items, unduplicated
}

As unordered_set is implemented as a hash table, insertion and lookup are of amortized constant complexity. As a set can only contain unique keys, this effectively de-duplicates the items. We could finally convert back the set to an array. We could also use a map to count the occurrences.

If array elements are integers and that the maximum possible value is known, and fairly low, then the set can be replaced by a simple array either 1. of boolean or 2. of integer if we want to count the number of occurrences.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.