
I know there are similar questions, but none this specific.

Input: an n-element array of unsorted elements with values from 1 to (n-1). Exactly one of the values is a duplicate (e.g. n=5, tab[n] = {3,4,2,4,1}).

Task: find the duplicate with the best possible complexity.

I wrote this algorithm:

#include <iostream>

int main() {
    int tab[] = { 1,6,7,8,9,4,2,2,3,5 };
    int arrSize = sizeof(tab) / sizeof(tab[0]);

    // mark each value v by adding arrSize to tab[v];
    // tab[i] % arrSize still recovers the original value after marking
    for (int i = 0; i < arrSize; i++) {
        tab[tab[i] % arrSize] += arrSize;
    }

    // the duplicate's slot was incremented twice
    for (int i = 0; i < arrSize; i++) {
        if (tab[i] >= arrSize * 2) {
            std::cout << i;
            break;
        }
    }
}
but I don't think it has the best possible complexity. Do you know a better method/algorithm? I can use any C++ library, but I don't have any ideas.

Is it possible to get better complexity than O(n)?

  • What makes you think it can be done in less than O(N)? Even if you pass the array once and stop once you find the duplicate, it's on average O(N/2), which is the same as O(N) Commented Aug 8, 2022 at 19:53
  • Why do you think it's not "best complexity"? (Also, best in terms of what? Memory? Time? Big-O notation?) Commented Aug 8, 2022 at 19:54
  • @amit Time Complexity. Commented Aug 8, 2022 at 19:57
  • @463035818_is_not_a_number I don't know if it can be done faster than O(n), but I want to find the fastest algorithm. The complexity is the same even if there is more than one duplicate, which is why I thought it could be done better. Commented Aug 8, 2022 at 20:00
  • Don't get too hung up on big-O notation. It is about asymptotic complexity. The "fastest algorithm" is not necessarily the "algorithm with the best asymptotic complexity". Though it's not possible to do something that requires inspecting N elements in less than O(N). Commented Aug 8, 2022 at 20:02

3 Answers


In terms of big-O notation, you cannot beat O(n) (same as your solution here). But you can get better constants and a simpler algorithm by using the fact that the sum of the elements 1, ..., n-1 is well known.

int sum = 0;
for (int x : tab) {
    sum += x;
}

// the values 1..n-1 sum to n*(n-1)/2; the surplus is the duplicate
int duplicate = sum - n * (n - 1) / 2;

The constants here will be significantly better: each array element is accessed exactly once, in order, which is much more cache friendly and efficient on modern architectures.

(Note, this solution ignores integer overflow, but it's easy to account for by giving sum twice as many bits as the array's elements.)
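
For instance, a minimal overflow-safe sketch (assuming int elements and that long long is wide enough to hold n times the maximum element, as on typical platforms):

long long sum = 0;  // wider accumulator so the running sum cannot overflow
for (int x : tab) {
    sum += x;
}
int duplicate = static_cast<int>(sum - static_cast<long long>(n) * (n - 1) / 2);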


1 Comment

@Vishrant There are none. All elements appear once, except one that appears twice. (There are n elements, the range is 1 to n-1, and only one value appears more than once.)

Adding the classic answer because it was requested. It is based on the idea that xor-ing a number with itself gives 0. So if you xor all numbers from 1 to n - 1 together with all numbers in the array, every value cancels out except the duplicate.

int duplicate = arr[0];
for (int i = 1; i < n; i++) {
    // arr[i] and the index i (both in 1..n-1) cancel pairwise;
    // only the duplicated value survives
    duplicate = duplicate ^ arr[i] ^ i;
}
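
A minimal self-contained sketch of the same idea (using the example array from the question; the printed value is the duplicate):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> arr{3, 4, 2, 4, 1};  // n = 5, values 1..4, 4 appears twice
    int duplicate = arr[0];
    for (std::size_t i = 1; i < arr.size(); ++i) {
        duplicate ^= arr[i] ^ static_cast<int>(i);
    }
    std::cout << duplicate << "\n";  // prints 4
}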

5 Comments

Note for those who come later: this doesn't detect arbitrary duplicates. It only correctly identifies the duplicate when there is exactly one, and it cannot give the duplicate's location.
Yes, if the input is "wrong" the answer will be wrong too. Although you can scan the array for the duplicate afterwards if you want the position, and it is still O(n). In my opinion the cleanest solution is just putting the numbers in a hash set, with some extra checks for the validity of the input.
Btw, a similar problem is pretty crazy: numbers in [1, n] with an array of size n + 1, find one (of the possibly multiple) duplicates in O(1) space and O(n) time without modifying the input: stackoverflow.com/questions/48841439/…
You can save half the work here since the xor of the numbers from 1 to k is either k, 1, k+1, or 0, depending on whether k mod 4 is 0, 1, 2, or 3 (see the sketch after this thread).
@Dave thanks for the info, I was actually wondering if we can save some xor operations. The only drawback is that the code gets slightly more complicated.
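
For reference, a minimal sketch of that closed form (the helper name xorUpTo is made up for illustration):

// xor of all integers 1..k in O(1), using the k mod 4 pattern described above
int xorUpTo(int k) {
    switch (k % 4) {
        case 0: return k;
        case 1: return 1;
        case 2: return k + 1;
        default: return 0;  // k % 4 == 3
    }
}

// usage: duplicate = xorUpTo(n - 1) ^ (xor of all array elements)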

Don't focus too much on asymptotic complexity. In practice the fastest algorithm is not necessarily the one with the lowest asymptotic complexity, because big-O ignores constant factors: O(huge_constant * N) == O(N) == O(tiny_constant * N).

You cannot inspect N values in less than O(N). However, you do not always need a full pass through the array; you can stop as soon as you find the duplicate:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> vals{1,2,4,6,5,3,2};
    std::vector<bool> present(vals.size());  // present[v]: was value v seen already?
    for (const auto& e : vals) {
        if (present[e]) {  // seen before -> e is the duplicate
            std::cout << "duplicate is " << e << "\n";
            break;
        }
        present[e] = true;
    }
}

In the lucky case the duplicate appears near the front of the array and the loop stops after inspecting only a few elements. In the worst case the whole vector has to be scanned. On average it is again O(N) time complexity. Further, it uses O(N) additional memory, while yours uses no additional memory. Again: complexity alone cannot tell you which algorithm is faster (especially not for a fixed input size).

No matter how hard you try, you won't beat O(N): in whatever order you traverse the elements (remembering the ones already seen), the best and worst cases stay the same. Either the duplicate is among the first two elements you inspect, or it is the last one, and on average you inspect O(N) of them.

1 Comment

Standard caveat about std::vector<bool>: It's optimized for space and can get a bit weird. Weird enough that it gets its own documentation page. The odder aspects shouldn't hurt here, but the time spent on the bit packing needs to be considered.
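
A variant that sidesteps std::vector<bool> entirely is the hash-set approach suggested in the comments on the xor answer. A minimal sketch (not part of the original answer):

#include <iostream>
#include <unordered_set>
#include <vector>

int main() {
    std::vector<int> vals{1,2,4,6,5,3,2};
    std::unordered_set<int> seen;
    for (int e : vals) {
        if (!seen.insert(e).second) {  // insert() reports whether e was already present
            std::cout << "duplicate is " << e << "\n";
            break;
        }
    }
}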
