
I know there are similar questions, but none this specific.

Input: an n-element array of unsorted elements with values from 1 to (n-1). Exactly one of the values is a duplicate (e.g. n=5, tab[n] = {3,4,2,4,1}).

Task: find the duplicate with the best possible complexity.

I wrote this algorithm:

#include <iostream>

int main() {
    int tab[] = { 1,6,7,8,9,4,2,2,3,5 };
    int arrSize = sizeof(tab) / sizeof(tab[0]);

    // mark each value v by adding arrSize to tab[v];
    // tab[i] % arrSize still recovers the original value after marking
    for (int i = 0; i < arrSize; i++) {
        tab[tab[i] % arrSize] += arrSize;
    }

    // the duplicate's slot was incremented twice
    for (int i = 0; i < arrSize; i++) {
        if (tab[i] >= arrSize * 2) {
            std::cout << i;
            break;
        }
    }
}
but I don't think it has the best possible complexity. Do you know a better method/algorithm? I can use any C++ library, but I don't have any ideas.

Is it possible to get better complexity than O(n)?

  • What makes you think it can be done in less than O(N)? Even if you pass the array once and stop once you find the duplicate, it's on average O(N/2), which is the same as O(N) Commented Aug 8, 2022 at 19:53
  • Why do you think it's not "best complexity"? (Also, best in terms of what? Memory? Time? Big-O notation?) Commented Aug 8, 2022 at 19:54
  • @amit Time Complexity. Commented Aug 8, 2022 at 19:57
  • @463035818_is_not_a_number I don't know if it can be done faster than O(n), but I want to find the fastest algorithm. The complexity is the same even if there is more than one duplicate, which is why I thought it could be done better. Commented Aug 8, 2022 at 20:00
  • Don't get too hung up on big-O notation. It is about asymptotic complexity. The "fastest algorithm" is not necessarily the "algorithm with the best asymptotic complexity". Though it's not possible to do something that requires inspecting N elements in less than O(N). Commented Aug 8, 2022 at 20:02

3 Answers


In terms of big-O notation, you cannot beat O(n) (same as your solution here). But you can get better constants and a simpler algorithm by using the fact that the sum of the elements 1, ..., n-1 is well known.

int sum = 0;
for (int x : tab) {
    sum += x;
}

// the values 1..n-1 sum to n*(n-1)/2; the surplus is the duplicate
int duplicate = sum - n * (n - 1) / 2;

The constants here will be significantly better: each array element is accessed exactly once, in order, which is much more cache friendly and efficient on modern architectures.

(Note, this solution ignores integer overflow, but it's easy to account for by giving sum twice as many bits as the array's elements.)
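
For instance, a minimal overflow-safe sketch (assuming int elements and that long long is wide enough to hold n times the maximum element, as on typical platforms):

long long sum = 0;  // wider accumulator so the running sum cannot overflow
for (int x : tab) {
    sum += x;
}
int duplicate = static_cast<int>(sum - static_cast<long long>(n) * (n - 1) / 2);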


1 Comment

@Vishrant There are none. All elements appear once, except one that appears twice. (There are n elements, the range is 1 to n-1, and only one value appears more than once.)

Adding the classic answer because it was requested. It is based on the idea that xor-ing a number with itself gives 0. So if you xor all numbers from 1 to n - 1 together with all numbers in the array, every value cancels out except the duplicate.

int duplicate = arr[0];
for (int i = 1; i < n; i++) {
    // arr[i] and the index i (both in 1..n-1) cancel pairwise;
    // only the duplicated value survives
    duplicate = duplicate ^ arr[i] ^ i;
}
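
A minimal self-contained sketch of the same idea (using the example array from the question; the printed value is the duplicate):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> arr{3, 4, 2, 4, 1};  // n = 5, values 1..4, 4 appears twice
    int duplicate = arr[0];
    for (std::size_t i = 1; i < arr.size(); ++i) {
        duplicate ^= arr[i] ^ static_cast<int>(i);
    }
    std::cout << duplicate << "\n";  // prints 4
}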

5 Comments

Note for those who come later: this doesn't detect arbitrary duplicates. It only correctly identifies the duplicate when there is exactly one, and it cannot give the duplicate's location.
Yes, if the input is "wrong" the answer will be wrong too. Although you can scan the array for the duplicate afterwards if you want the position, and it is still O(n). In my opinion the cleanest solution is just putting the numbers in a hash set, with some extra checks for the validity of the input.
Btw, a similar problem is pretty crazy: numbers in [1, n] with an array of size n + 1, find one (of the possibly multiple) duplicates in O(1) space and O(n) time without modifying the input: stackoverflow.com/questions/48841439/…
You can save half the work here since the xor of the numbers from 1 to k is either k, 1, k+1, or 0, depending on whether k mod 4 is 0, 1, 2, or 3 (see the sketch after this thread).
@Dave thanks for the info, I was actually wondering if we can save some xor operations. The only drawback is that the code gets slightly more complicated.
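
For reference, a minimal sketch of that closed form (the helper name xorUpTo is made up for illustration):

// xor of all integers 1..k in O(1), using the k mod 4 pattern described above
int xorUpTo(int k) {
    switch (k % 4) {
        case 0: return k;
        case 1: return 1;
        case 2: return k + 1;
        default: return 0;  // k % 4 == 3
    }
}

// usage: duplicate = xorUpTo(n - 1) ^ (xor of all array elements)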

Don't focus too much on asymptotic complexity. In practice the fastest algorithm is not necessarily the one with the lowest asymptotic complexity, because big-O ignores constant factors: O(huge_constant * N) == O(N) == O(tiny_constant * N).

You cannot inspect N values in less than O(N). However, you do not always need a full pass through the array; you can stop as soon as you find the duplicate:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> vals{1,2,4,6,5,3,2};
    std::vector<bool> present(vals.size());  // present[v]: was value v seen already?
    for (const auto& e : vals) {
        if (present[e]) {  // seen before -> e is the duplicate
            std::cout << "duplicate is " << e << "\n";
            break;
        }
        present[e] = true;
    }
}

In the lucky case the duplicate appears near the front of the array and the loop stops after inspecting only a few elements. In the worst case the whole vector has to be scanned. On average it is again O(N) time complexity. Further, it uses O(N) additional memory, while yours uses no additional memory. Again: complexity alone cannot tell you which algorithm is faster (especially not for a fixed input size).

No matter how hard you try, you won't beat O(N): in whatever order you traverse the elements (remembering the ones already seen), the best and worst cases stay the same. Either the duplicate is among the first two elements you inspect, or it is the last one, and on average you inspect O(N) of them.

1 Comment

Standard caveat about std::vector<bool>: It's optimized for space and can get a bit weird. Weird enough that it gets its own documentation page. The odder aspects shouldn't hurt here, but the time spent on the bit packing needs to be considered.
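
A variant that sidesteps std::vector<bool> entirely is the hash-set approach suggested in the comments on the xor answer. A minimal sketch (not part of the original answer):

#include <iostream>
#include <unordered_set>
#include <vector>

int main() {
    std::vector<int> vals{1,2,4,6,5,3,2};
    std::unordered_set<int> seen;
    for (int e : vals) {
        if (!seen.insert(e).second) {  // insert() reports whether e was already present
            std::cout << "duplicate is " << e << "\n";
            break;
        }
    }
}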
