Given an array of integers, find the first integer that is unique

Question

Given an array of integers, find the first integer that is unique.

my solution: use std::map

put integer (number as key, its index as value) to it one by one (O(n^2 lgn)), if have duplicate, remove the entry from the map (O(lg n)), after putting all numbers into the map, iterate the map and find the key with smallest index O(n).

O(n^2 lgn) because map needs to do sorting.

It is not efficient.

other better solutions?

When you say "first integer that is unique" do you mean the first one encountered or the smallest one or any at random? — Mark Ransom
– Mark Ransom, Commented Oct 26, 2011 at 18:54
Adding all the integers to a map is O(n log(n)), not O(n^2 log(n)). Each insertion is O(log(n)), and there are n of them. — Mike Seymour
– Mike Seymour, Commented Oct 26, 2011 at 19:04
does the new solution matchup to your question or did i miss something? — Srikar Appalaraju
– Srikar Appalaraju, Commented Feb 7, 2013 at 1:44

Michael Goldshteyn · Accepted Answer · 2012-01-17 18:55:18Z

11

I believe that the following would be the optimal solution, at least based on time / space complexity:

Step 1: Store the integers in a hash map, which holds the integer as a key and the count of the number of times it appears as the value. This is generally an O(n) operation and the insertion / updating of elements in the hash table should be constant time, on the average. If an integer is found to appear more than twice, you really don't have to increment the usage count further (if you don't want to).

Step 2: Perform a second pass over the integers. Look each up in the hash map and the first one with an appearance count of one is the one you were looking for (i.e., the first single appearing integer). This is also O(n), making the entire process O(n).

Some possible optimizations for special cases:

Optimization A: It may be possible to use a simple array instead of a hash table. This guarantees O(1) even in the worst case for counting the number of occurrences of a particular integer as well as the lookup of its appearance count. Also, this enhances real time performance, since the hash algorithm does not need to be executed. There may be a hit due to potentially poorer locality of reference (i.e., a larger sparse table vs. the hash table implementation with a reasonable load factor). However, this would be for very special cases of integer orderings and may be mitigated by the hash table's hash function producing pseudorandom bucket placements based on the incoming integers (i.e., poor locality of reference to begin with).

Each byte in the array would represent the count (up to 255) for the integer represented by the index of that byte. This would only be possible if the difference between the lowest integer and the highest (i.e., the cardinality of the domain of valid integers) was small enough such that this array would fit into memory. The index in the array of a particular integer would be its value minus the smallest integer present in the data set.

For example on modern hardware with a 64-bit OS, it is quite conceivable that a 4GB array can be allocated which can handle the entire domain of 32-bit integers. Even larger arrays are conceivable with sufficient memory.

The smallest and largest integers would have to be known before processing, or another linear pass through the data using the minmax algorithm to find out this information would be required.

Optimization B: You could optimize Optimization A further, by using at most 2 bits per integer (One bit indicates presence and the other indicates multiplicity). This would allow for the representation of four integers per byte, extending the array implementation to handle a larger domain of integers for a given amount of available memory. More bit games could be played here to compress the representation further, but they would only support special cases of data coming in and therefore cannot be recommended for the still mostly general case.

edited Jan 17, 2012 at 18:55

answered Oct 26, 2011 at 18:46

Michael Goldshteyn

74.9k25 gold badges136 silver badges183 bronze badges

Sign up to request clarification or add additional context in comments.

13 Comments

Karoly Horvath Over a year ago

hash is average O(1), but only if you set it up properly.

Michael Goldshteyn Over a year ago

@yi_H, since you are given an array of integers per the problem definition, meaning you know the size of the problem set, you can presize the hash table properly to have a good final load factor.

user1002288 Over a year ago

@Michael, how to make sure that the number you find is the "first" single int in the given array ? Your solution ignores index. thanks

Michael Goldshteyn Over a year ago

@user1002288, you're missing Step 2 which identifies the "first" such number!

Ben Voigt Over a year ago

@user1002288: So when you insert, you store a tuple (index of elemented inserted, count := 0), and then later you update the count portion of the tuple. And then iterate the hash. This doesn't change the algorithmic complexity. Or, as Michael suggested, you can iterate the whole array again.

|

Srikar Appalaraju · Accepted Answer · 2013-02-07 01:43:42Z

1

All this for no reason. Just using 2 for-loops & a variable would give you a simple O(n^2) algo.

If you are taking all the trouble of using a hash map, then it might as well be what @Micheal Goldshteyn suggests

UPDATE: I know this question is 1 year old. But was looking through the questions I answered and came across this. Thought there is a better solution than using a hashtable.

When we say unique, we will have a pattern. Eg: [5, 5, 66, 66, 7, 1, 1, 77]. In this lets have moving window of 3. first consider (5,5,66). we can easily estab. that there is duplicate here. So move the window by 1 element so we get (5,66,66). Same here. move to next (66,66,7). Again dups here. next (66,7,1). No dups here! take the middle element as this has to be the first unique in the set. The left element belongs to the dup so could 1. Hence 7 is the first unique element.

space: O(1) time: O(n) * O(m^2) = O(n) * 9 ≈ O(n)

edited Feb 7, 2013 at 1:43

answered Oct 26, 2011 at 18:50

Srikar Appalaraju

74.1k55 gold badges221 silver badges265 bronze badges

7 Comments

Martin Kristiansen Over a year ago

I like this one, int simple and will most likely be extremely fast.

Michael Goldshteyn Over a year ago

O(N^2) is extremely fast compared to O(N)?

Dave Over a year ago

No, but the benefits of increased locality, and low coefficients can outweigh the cost of complexity. Especially if the inner loop is replaced with memmem(), this simple solution has the potential to be fast.

Dennis Zickefoose Over a year ago

It is also in place, specifically not requiring any heap allocations. Unless I knew I were going to be using very large data sets, I would be inclined to start here.

Srikar Appalaraju Over a year ago

Yes that was my first thought. Start with easy to implement solutions with descent performance & then once the entire program is done then optimise it like crazy...

|

Daniel · Accepted Answer · 2011-10-26 18:47:40Z

0

Inserting to a map is O(log n) not O(n log n) so inserting n keys will be n log n. also its better to use set.

answered Oct 26, 2011 at 18:47

Daniel

31.8k19 gold badges87 silver badges145 bronze badges

4 Comments

Michael Goldshteyn Over a year ago

Since when is O(N log N) better than O(N)?

Daniel Over a year ago

@MichaelGoldshteyn: its better than O(n^2 log n) the OP thought it to be, not O(n)

user1002288 Over a year ago

@Dani, each time you insert map, the keys will be sorted O(n lg n), so if you insert m times, it is O(m n lgn), n is the current size the map, m is the total numbers that need to be inserted.

Daniel Over a year ago

@user1002288: no they will not be sorted. the map is actually a self balancing RB tree that does binary search for insertion, it doesn't just throws it in and sorts.

Dave · Accepted Answer · 2011-10-27 00:07:21Z

0

Although it's O(n^2), the following has small coefficients, isn't too bad on the cache, and uses memmem() which is fast.

 for(int x=0;x<len-1;x++)
     if(memmem(&array[x+1], sizeof(int)*(len-(x+1)), array[x], sizeof(int))==NULL &&
        memmem(&array[x+1], sizeof(int)*(x-1), array[x], sizeof(int))==NULL)
            return array[x];

edited Oct 27, 2011 at 0:07

answered Oct 26, 2011 at 18:53

Dave

11.2k3 gold badges34 silver badges55 bronze badges

5 Comments

Dennis Zickefoose Over a year ago

I believe memmem is part of the POSIX standard, but neither C nor C++ have formally adopted it.

Dennis Zickefoose Over a year ago

This also has a slight bug. Consider the input [0 0 2]... Your first step checks [0 2] for a 0, which it finds. The second step will check [2] for 0, but not find it, making array[1] look like the answer

Dave Over a year ago

What second step? When memmem sees the 0 in [0 2] it returns non-null, and I return array[0]

Dennis Zickefoose Over a year ago

Oh, so it does. Then its a major bug... you want the first unique element, and array[0] is clearly non-unique.

Dave Over a year ago

Aah! I see the bug. *fix fix*

user3612419 · Accepted Answer · 2014-05-07 13:37:53Z

0

public static string firstUnique(int[] input)  
{
    int size = input.Length;
    bool[] dupIndex = new bool[size];
    for (int i = 0; i < size; ++i) 
    {
        if (dupIndex[i]) 
        {
            continue;
        } 
        else if (i == size - 1) 
        {
            return input[i].ToString();
        }
        for (int j = i + 1; j < size; ++j) 
        {
            if (input[i]==input[j]) 
            {
                dupIndex[j] = true;
                break;
            } 
            else if (j == size - 1)
            {
                return input[i].ToString();
            }
        }
    }
    return "No unique element";
}

answered May 7, 2014 at 13:37

user3612419

1

Comments

Ashutosh · Accepted Answer · 2014-08-16 15:17:09Z

@user3612419

  Solution given you is good with some what close to O(N*N2) but further optimization in same code is possible I just added two-3 lines that you missed.



public static string firstUnique(int[] input)  
{
  int size = input.Length;
  bool[] dupIndex = new bool[size];
  for (int i = 0; i < size; ++i) 
  {
     if (dupIndex[i]) 
    {
        continue;
    } 
    else if (i == size - 1) 
    {
        return input[i].ToString();
    }
    for (int j = i + 1; j < size; ++j) 
    {
  if(dupIndex[j]==true)
  {
 continue;
 }
        if (input[i]==input[j]) 
        {
            dupIndex[j] = true;
    dupIndex[i] = true;
            break;
        } 
        else if (j == size - 1)
        {
            return input[i].ToString();
         }
     } 
  }
return "No unique element";
 }

Collectives™ on Stack Overflow

Given an array of integers, find the first integer that is unique

6 Answers 6

13 Comments

7 Comments

4 Comments

5 Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

13 Comments

7 Comments

4 Comments

5 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related