Using an array and moving duplicates to end

Question

I got this question in an interview, and at the end I was told there was a more efficient way to do it, but I still have not been able to figure it out. You pass into a function an array of integers and an integer for the size of the array. The array contains many numbers, some of which repeat, for example 1,7,4,8,2,6,8,3,7,9,10. You want to take that array and return an array where all the repeated numbers are put at the end, so the above array would become 1,7,4,8,2,6,3,9,10,8,7. The specific numbers are not important, and I could not use a buffer array. I was going to use a BST, but the order of the numbers must be maintained (except for the duplicate numbers). I could not figure out how to use a hash table, so I ended up using a double for loop (O(n^2), horrible, I know). How would I do this more efficiently in C++? I'm not looking for code, just an idea of how to do it better.

possible duplicate of How to remove duplicates from an array – Hans Passant, Commented Oct 18, 2011 at 20:05
There are no bounds on the numbers in the array, and they can also be negative. Order does matter, so I cannot sort, and I could not figure out how to use a hash table. If you can figure out a hash table, please let me know. – Aaron, Commented Oct 18, 2011 at 20:08
@HansPassant Doesn't look like a duplicate. He needs to maintain the relative order of elements. – Branko Dimitrijevic, Commented Oct 18, 2011 at 20:11
Is recursion allowed? (This makes it possible to "store" an unknown quantity of data.) – BatchyX, Commented Oct 18, 2011 at 20:37

NPE · Accepted Answer · 2011-10-18 20:21:13Z

8

In what follows:

arr is the input array;
seen is a hash set of numbers already encountered;
l is the index where the next unique element will be placed;
r is the index of the next element to be considered.

Since you're not looking for code, here is a pseudo-code solution (which happens to be valid Python):

arr = [1,7,4,8,2,6,8,3,7,9,10]
seen = set()
l = 0
r = 0
while True:
  # advance `r` to the next not-yet-seen number
  while r < len(arr) and arr[r] in seen:
    r += 1
  if r == len(arr): break
  # add the number to the set
  seen.add(arr[r])
  # swap arr[l] with arr[r]
  arr[l], arr[r] = arr[r], arr[l]
  # advance `l`
  l += 1
print(arr)

On your test case, this produces

[1, 7, 4, 8, 2, 6, 3, 9, 10, 8, 7]
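In C++, the role of Python's set() can be played by std::unordered_set<int> (expected O(1) per insert) or std::set<int> (O(log n) per insert). A minimal sketch of the same two-pointer idea, assuming the input is a std::vector<int> (the function name move_dupes_in_place is hypothetical): insert() returns a pair whose .second member is true only when the value was not already present. Advancing r on every iteration is equivalent to the inner while loop above, because after a swap arr[r] always holds a value that is already in the set.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Moves every first occurrence to the front, in original order;
// the repeats end up after them. Expected O(n) time.
void move_dupes_in_place(std::vector<int>& arr)
{
    std::unordered_set<int> seen;  // numbers already encountered
    std::size_t l = 0;             // index where the next unique element goes
    for (std::size_t r = 0; r < arr.size(); ++r) {
        if (seen.insert(arr[r]).second) {  // true: arr[r] was not seen before
            std::swap(arr[l], arr[r]);     // place it right after the previous unique one
            ++l;
        }
        // otherwise arr[r] is a repeat and simply stays behind l
    }
}
```

On the example input this gives 1,7,4,8,2,6,3,9,10,8,7, the same as the Python version.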

answered Oct 18, 2011 at 20:21

NPE


2 Comments

Aaron Over a year ago

I get most of it. What is set() in C++?

Michał Bentkowski Over a year ago

+1 for "pseudo-code solution (which happens to be valid Python)".

tune2fs · Accepted Answer · 2011-10-18 20:11:25Z

2

I would use an additional map, where the key is an integer value from the array and the value is a count initialized to 0. I would go through the array and increment the count for each occurrence of a value. At the end I would go through the array again. When an integer from the array has a count of one in the map, I would not change anything. When it has a count of 2 or more, I would swap the integer from the array with the last one.

With a tree-based map, this should result in a runtime of O(n log n).

answered Oct 18, 2011 at 20:11

tune2fs


1 Comment

Mooing Duck Over a year ago

and if the array is [-2e9, 2e9]?

SomeoneRandom · Accepted Answer · 2011-10-18 20:17:47Z

2

The way I would do this would be to create an array twice the size of the original and create a set of integers.

Then loop through the original array and add each element to the set: if it already exists, add it to the second half of the new array; otherwise, add it to the first half.

In the end you would get an array that looks like this (using your example):

1,7,4,8,2,6,3,9,10,-,-,8,7,-,-,-,-,-,-,-,-,-

Then I would loop through the original array again and set each position to the next non-blank entry of the new array (blanks could be 0 or whatever marker you decide on).

That would turn the original array into your solution.

This ends up being O(n) if the set is a hash set with constant-time insertion (O(n log n) with a tree-based std::set), which is about as efficient as I can think of.

Edit: since you cannot use another array, when you find a value that is already in the set, you can move every value after it forward by one and set the last value equal to the number you just checked. This would in effect do the same thing, but with a lot more operations (O(n^2) moves in the worst case).
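The in-place variant from the edit can be sketched as follows, assuming a std::vector<int> input and std::unordered_set<int> as the set (the name move_dupes_by_shifting is hypothetical). std::rotate performs the "move every value after it forward one and put it last" step, which is where the worst-case O(n^2) cost comes from, but it keeps both the unique values and the duplicates in their original relative order.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// In-place, order-preserving: each repeat is rotated to the very end.
void move_dupes_by_shifting(std::vector<int>& v)
{
    std::unordered_set<int> seen;
    std::size_t end = v.size();  // [end, v.size()) holds repeats already moved
    std::size_t i = 0;
    while (i < end) {
        if (seen.insert(v[i]).second) {
            ++i;  // first occurrence: leave it where it is
        } else {
            // shift v[i+1 .. size) forward by one and put v[i] last
            std::rotate(v.begin() + i, v.begin() + i + 1, v.end());
            --end;  // do not advance i: an unexamined value now sits at v[i]
        }
    }
}
```

On the example input this yields 1,7,4,8,2,6,3,9,10,8,7.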

edited Oct 18, 2011 at 20:17

answered Oct 18, 2011 at 20:10

SomeoneRandom


4 Comments

SomeoneRandom Over a year ago

You can use a hash table but not an array? Odd.

SomeoneRandom Over a year ago

@Mooing Duck, in C++ std::set::insert returns a pair whose bool member is false when you add an element that already exists, since sets cannot hold duplicates.

Mooing Duck Over a year ago

I misread. A set works fine. Note that you have blanks in your array, so you'll have to use pointers to ints.

SomeoneRandom Over a year ago

That is true. I imagined the blanks could be filled in with something of his choosing, but since he can't use an extra array anyway, he would have to swap positions instead.

dreamlax · Accepted Answer · 2011-10-18 20:23:48Z

I have been out of touch for a while, but I'd probably start with something like this and see how it scales with larger input. I know you didn't ask for code, but in some cases it's easier to understand than an explanation.

Edit: Sorry, I missed the requirement that you cannot use a buffer array.

#include <set>
#include <vector>

// returns a new vector with the dupes at the end
std::vector<int> move_dupes_to_end(const std::vector<int>& input)
{
    std::set<int> counter;     // values seen so far
    std::vector<int> result;   // first occurrences, in original order
    std::vector<int> repeats;  // later occurrences, in original order

    for (std::vector<int>::const_iterator i = input.begin(); i != input.end(); ++i)
    {
        if (counter.insert(*i).second)  // true if *i was not already in the set
            result.push_back(*i);
        else
            repeats.push_back(*i);
    }

    result.insert(result.end(), repeats.begin(), repeats.end());

    return result;
}

Naszta · Accepted Answer · 2011-10-18 20:32:49Z

2

#include <algorithm>
#include <cstddef>

// array and size are your array and its size
template <typename T>
T * sort_then_unique( T * array, std::size_t size )
{
                                                   // Complexity:
    std::sort( array, array + size );              // n * log(n) and could be threaded
                                                   // (if merge sort)
    T * last = std::unique( array, array + size ); // n, but the elements after the last
                                                   // unique element are left unspecified
    return last;
}

Check sort and unique.

edited Oct 18, 2011 at 20:32

answered Oct 18, 2011 at 20:17

Naszta


3 Comments

Chad Over a year ago

Very nice, but doesn't preserve ordering.

Naszta Over a year ago

@Chad: without the sort, unique would not remove all the duplicates. MSDN (and the standard) says: Removes duplicate elements that are adjacent to each other in a specified range.

Chad Over a year ago

I know, I liked your solution, but it doesn't meet the OP's requirement to preserve ordering.

Mooing Duck · Accepted Answer · 2011-10-18 20:50:28Z

#include <unordered_set>
#include <utility>

void remove_dup(int* data, int count) {
    int* L=data; //place to put next unique number
    std::unordered_set<int> found(count); //keep track of what's been seen
    for(int* cur=data; cur<data+count; ++cur) {
        if(found.insert(*cur).second) { //first time we've seen it
            std::swap(*cur,*L++);       //put it next in the unique list
        }                               //repeats are left behind L
    }
    //[data,L) now holds the unique numbers in original order,
    //[L,data+count) holds the repeats
}

http://ideone.com/3choA
Not that I would turn in code this poorly commented. Also note that unordered_set probably uses its own array internally, bigger than data. (This has been rewritten based on aix's answer, to be much faster.)

Mr. Llama · Accepted Answer · 2011-10-18 21:01:17Z

2

If you know the bounds on the integer values, B, and the size of the integer array, SZ, then you can do something like the following:

Create an array of booleans seen_before with B elements, initialized to 0.
Create a result array result of integers with SZ elements.
Create two integers, one for front_pos = 0, one for back_pos = SZ - 1.
Iterate across the original list:
- Set an integer variable val to the value of the current element
- If seen_before[val] is set to 1, put the number at result[back_pos] then decrement back_pos
- If seen_before[val] is not set to 1, put the number at result[front_pos] then increment front_pos and set seen_before[val] to 1.

Once you finish iterating across the original list, all the unique numbers will be at the front of result, while the duplicate numbers will be at the back. The fun part is that the entire process is done in one pass. Note that this only works if you know the bounds of the values appearing in the original array.

Edit: It was pointed out that there are no bounds on the integers, so instead of initializing seen_before as an array with B elements, initialize it as a map<int, bool>, then continue as usual. That should give you O(n log n) performance.
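A sketch of the bounded version, assuming every value lies in a known closed range [lo, hi]; lo and hi are hypothetical parameters standing in for B, and val - lo is used as the index so negative values also work. Like the steps above, it fills a separate result array, so it does not meet the no-buffer-array constraint, and because back_pos counts down, the duplicates come out at the back in reverse order of appearance.

```cpp
#include <cstddef>
#include <vector>

// One pass over input; assumes lo <= val <= hi for every val.
std::vector<int> split_bounded(const std::vector<int>& input, int lo, int hi)
{
    std::vector<bool> seen_before(static_cast<std::size_t>(hi - lo) + 1, false);
    std::vector<int> result(input.size());
    std::size_t front_pos = 0;
    std::size_t back_pos = input.size();  // one past the next free slot at the back
    for (int val : input) {
        std::size_t idx = static_cast<std::size_t>(val - lo);
        if (seen_before[idx]) {
            result[--back_pos] = val;   // duplicate: fill from the back
        } else {
            result[front_pos++] = val;  // first occurrence: fill from the front
            seen_before[idx] = true;
        }
    }
    return result;
}
```

With lo = 1 and hi = 10, the example input gives 1,7,4,8,2,6,3,9,10,7,8.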

edited Oct 18, 2011 at 21:01

answered Oct 18, 2011 at 20:14

Mr. Llama


1 Comment

Mooing Duck Over a year ago

He pretty clearly said there's no (<2bil) bounds on the integers.

Pankaj Khattar · Accepted Answer · 2013-07-13 18:51:06Z

This can be done by iterating over the array and marking the index of the first duplicate. Later, swap the value at that marked index with the next unique value, then increment the marked index for the next swap.

Java Implementation:

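import java.util.HashSet;

public class MoveDuplicates {
    public static void solve() {
        Integer[] arr = new Integer[] { 1, 7, 4, 8, 2, 6, 8, 3, 7, 9, 10 };
        final HashSet<Integer> seen = new HashSet<Integer>();
        int l = -1; // marked index: first slot holding a duplicate

        for (int i = 0; i < arr.length; i++) {
            if (seen.contains(arr[i])) {
                if (l == -1) {
                    l = i;
                }
                continue;
            }
            seen.add(arr[i]); // record the unique value before it is moved
            if (l > -1) {
                // swap the unique value into the marked slot
                final int temp = arr[i];
                arr[i] = arr[l];
                arr[l] = temp;
                l++;
            }
        }

        StringBuilder sb = new StringBuilder();
        for (Integer x : arr) {
            sb.append(x).append(' ');
        }
        System.out.println(sb.toString().trim());
    }

    public static void main(String[] args) {
        solve();
    }
}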

The output is 1 7 4 8 2 6 3 9 10 8 7

Chad · Accepted Answer · 2011-10-18 20:24:46Z

0

It's ugly, but it meets the requirements of moving the duplicates to the end in place (no buffer array)

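#include <cstddef>
#include <set>
#include <utility>

// warning, some light C++11
void dup2end(int* arr, std::size_t cnt)
{
   std::set<int> k;
   auto max = arr + cnt;   // one past the last slot not yet holding a moved duplicate
   auto curr = arr;

   while(curr < max)
   {
      auto res = k.insert(*curr);

      // first time encountered
      if(res.second)
      {
         ++curr;
      }
      else
      {
         // duplicate: swap it into the last open slot and shrink the range;
         // curr is not advanced because a new value was swapped in
         --max;
         std::swap(*curr, *max);
      }
   }
}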

answered Oct 18, 2011 at 20:24

Chad


sunjerry · Accepted Answer · 2014-04-15 15:27:15Z

#include <vector>

void move_duplicates_to_end(std::vector<int> &A) {
    if(A.empty()) return;
    int i = 0, tail = A.size()-1;
    while(i <= tail) {
        bool is_first = true;    // check if the current number is its first occurrence
        for(int k=0; k<i; k++) { // always compare with numbers before A[i]
            if(A[k] == A[i]) {
                is_first = false;
                break;
            }
        }
        if(is_first == true) i++;
        else {
            int tmp = A[i]; // swap with tail
            A[i] = A[tail];
            A[tail] = tmp;
            tail--;
        }
    }
}

If the input array is {1,7,4,8,2,6,8,3,7,9,10}, the output is {1,7,4,8,2,6,10,3,9,7,8}. Compared with your expected result {1,7,4,8,2,6,3,9,10,8,7}, the first six elements are the same, while the rest differ, because I swap each duplicate with the tail of the array. As you mentioned, the order of the duplicates can be arbitrary; note, however, that swapping with the tail also moves 10 ahead of 3 and 9, so the relative order of the unique numbers is not fully preserved either.
