3

Given an array of integers, find the first integer that is unique.

My solution: use std::map

Put the integers into the map one by one (number as key, its index as value), which is O(n^2 log n) in total; if there is a duplicate, remove the entry from the map (O(log n)). After putting all the numbers into the map, iterate over it and find the key with the smallest index, O(n).

O(n^2 log n) because the map needs to keep its keys sorted.
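For concreteness, here is a minimal C++ sketch of this approach. The companion `removed` set is my own addition (the question leaves this detail implicit): without it, a value occurring three or more times would be re-inserted after its entry was erased.

    #include <map>
    #include <optional>
    #include <set>
    #include <vector>

    // Map each value to the index of its first occurrence; erase on duplicate.
    std::optional<int> firstUniqueViaMap(const std::vector<int>& a)
    {
        std::map<int, size_t> firstIndex;
        std::set<int> removed;                    // values already erased once
        for (size_t i = 0; i < a.size(); ++i) {
            if (removed.count(a[i])) continue;    // seen 2+ times already
            auto [it, inserted] = firstIndex.emplace(a[i], i);
            if (!inserted) {                      // duplicate: drop it for good
                firstIndex.erase(it);
                removed.insert(a[i]);
            }
        }
        // Scan the surviving entries for the smallest first-occurrence index.
        std::optional<int> best;
        size_t bestIdx = a.size();
        for (const auto& [val, idx] : firstIndex)
            if (idx < bestIdx) { bestIdx = idx; best = val; }
        return best;
    }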

It is not efficient.

Are there other, better solutions?

3 Comments
  • 1
When you say "first integer that is unique", do you mean the first one encountered, the smallest one, or any at random? Commented Oct 26, 2011 at 18:54
  • 3
    Adding all the integers to a map is O(n log(n)), not O(n^2 log(n)). Each insertion is O(log(n)), and there are n of them. Commented Oct 26, 2011 at 19:04
Does the new solution match your question, or did I miss something? Commented Feb 7, 2013 at 1:44

6 Answers

11

I believe that the following would be the optimal solution, at least based on time / space complexity:

Step 1: Store the integers in a hash map, which holds the integer as a key and the count of the number of times it appears as the value. This is generally an O(n) operation and the insertion / updating of elements in the hash table should be constant time, on the average. If an integer is found to appear more than twice, you really don't have to increment the usage count further (if you don't want to).

Step 2: Perform a second pass over the integers. Look each up in the hash map and the first one with an appearance count of one is the one you were looking for (i.e., the first single appearing integer). This is also O(n), making the entire process O(n).
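A minimal C++ sketch of the two passes (the function name and the use of `std::unordered_map` are my choices, not the answer's):

    #include <optional>
    #include <unordered_map>
    #include <vector>

    std::optional<int> firstUnique(const std::vector<int>& a)
    {
        std::unordered_map<int, int> counts;
        counts.reserve(a.size());       // presize for a good load factor
        for (int x : a)                 // pass 1: count occurrences, O(n) expected
            ++counts[x];
        for (int x : a)                 // pass 2: first value seen exactly once
            if (counts[x] == 1)
                return x;
        return std::nullopt;            // no unique element
    }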

Some possible optimizations for special cases:

Optimization A: It may be possible to use a simple array instead of a hash table. This guarantees O(1) even in the worst case for counting the number of occurrences of a particular integer as well as the lookup of its appearance count. Also, this enhances real time performance, since the hash algorithm does not need to be executed. There may be a hit due to potentially poorer locality of reference (i.e., a larger sparse table vs. the hash table implementation with a reasonable load factor). However, this would be for very special cases of integer orderings and may be mitigated by the hash table's hash function producing pseudorandom bucket placements based on the incoming integers (i.e., poor locality of reference to begin with).

Each byte in the array would represent the count (up to 255) for the integer represented by the index of that byte. This would only be possible if the difference between the lowest integer and the highest (i.e., the cardinality of the domain of valid integers) was small enough such that this array would fit into memory. The index in the array of a particular integer would be its value minus the smallest integer present in the data set.

For example on modern hardware with a 64-bit OS, it is quite conceivable that a 4GB array can be allocated which can handle the entire domain of 32-bit integers. Even larger arrays are conceivable with sufficient memory.

The smallest and largest integers would have to be known before processing, or another linear pass through the data using the minmax algorithm to find out this information would be required.
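Here is a hedged sketch of Optimization A in C++, including the extra min/max pass; it assumes the value range actually fits in memory, and all names are illustrative:

    #include <algorithm>
    #include <cstdint>
    #include <optional>
    #include <vector>

    std::optional<int> firstUniqueCounted(const std::vector<int>& a)
    {
        if (a.empty()) return std::nullopt;
        auto [lo, hi] = std::minmax_element(a.begin(), a.end());
        // Offset of a value in the count table (64-bit math avoids overflow).
        auto idx = [&](int x) { return static_cast<size_t>(static_cast<int64_t>(x) - *lo); };
        std::vector<uint8_t> count(idx(*hi) + 1, 0);
        for (int x : a)
            if (count[idx(x)] < 255)    // saturate; counts beyond 1 don't matter
                ++count[idx(x)];
        for (int x : a)
            if (count[idx(x)] == 1)
                return x;
        return std::nullopt;
    }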

Optimization B: You could optimize Optimization A further, by using at most 2 bits per integer (One bit indicates presence and the other indicates multiplicity). This would allow for the representation of four integers per byte, extending the array implementation to handle a larger domain of integers for a given amount of available memory. More bit games could be played here to compress the representation further, but they would only support special cases of data coming in and therefore cannot be recommended for the still mostly general case.
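A sketch of Optimization B's packing, four 2-bit counters per byte (00 = unseen, 01 = seen once, 10 = seen more than once); the helper type is hypothetical, shown only to illustrate the bit layout:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct TwoBitCounts {
        std::vector<uint8_t> bytes;
        explicit TwoBitCounts(size_t n) : bytes((n + 3) / 4, 0) {}
        unsigned get(size_t i) const { return (bytes[i / 4] >> (2 * (i % 4))) & 3u; }
        void bump(size_t i) {           // saturating increment, capped at 2
            unsigned v = get(i);
            if (v < 2)
                bytes[i / 4] = static_cast<uint8_t>(
                    (bytes[i / 4] & ~(3u << (2 * (i % 4)))) | ((v + 1) << (2 * (i % 4))));
        }
    };

Usage mirrors Optimization A: call bump(value - min) for each element in the counting pass, then return the first element whose counter reads exactly 1.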


13 Comments

hash is average O(1), but only if you set it up properly.
@yi_H, since you are given an array of integers per the problem definition, meaning you know the size of the problem set, you can presize the hash table properly to have a good final load factor.
@Michael, how do you make sure that the number you find is the "first" single int in the given array? Your solution ignores the index. Thanks
@user1002288, you're missing Step 2 which identifies the "first" such number!
@user1002288: So when you insert, you store a tuple (index of element inserted, count := 0), and then later you update the count portion of the tuple. And then iterate the hash. This doesn't change the algorithmic complexity. Or, as Michael suggested, you can iterate the whole array again.
1

All this for no reason. Just using 2 for-loops & a variable would give you a simple O(n^2) algo.

If you are taking all the trouble of using a hash map, then it might as well be what @Michael Goldshteyn suggests.

UPDATE: I know this question is a year old, but I was looking through the questions I answered and came across this one. I think there is a better solution than using a hashtable.

When we say unique, we will have a pattern. E.g.: [5, 5, 66, 66, 7, 1, 1, 77]. Let's use a moving window of 3. First consider (5, 5, 66): we can easily establish that there is a duplicate here, so move the window by one element to get (5, 66, 66). Same here; move to the next, (66, 66, 7). Duplicates again; next is (66, 7, 1). No duplicates here! Take the middle element, as it has to be the first unique in the set: the left element (66) belongs to a duplicate pair, and the right element (1) could too. Hence 7 is the first unique element.

Space: O(1). Time: O(n) * O(m^2) = O(n) * 9 ≈ O(n), with window size m = 3.
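Hedging a bit: the window test only works when duplicates always sit in adjacent runs, as in this answer's example. Under that assumption it reduces to comparing each element with its immediate neighbors, which also handles the array boundaries; a minimal C++ sketch:

    #include <optional>
    #include <vector>

    // Assumes duplicates always occur in adjacent runs, per the answer's example.
    std::optional<int> firstUniqueAdjacent(const std::vector<int>& a)
    {
        const size_t n = a.size();
        for (size_t i = 0; i < n; ++i) {
            bool eqPrev = (i > 0)     && a[i] == a[i - 1];
            bool eqNext = (i + 1 < n) && a[i] == a[i + 1];
            if (!eqPrev && !eqNext)
                return a[i];            // e.g. returns 7 for the example above
        }
        return std::nullopt;
    }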

7 Comments

I like this one, it's simple and will most likely be extremely fast.
O(N^2) is extremely fast compared to O(N)?
No, but the benefits of increased locality, and low coefficients can outweigh the cost of complexity. Especially if the inner loop is replaced with memmem(), this simple solution has the potential to be fast.
It is also in place, specifically not requiring any heap allocations. Unless I knew I were going to be using very large data sets, I would be inclined to start here.
Yes, that was my first thought. Start with easy-to-implement solutions with decent performance, and then once the entire program is done, optimize it like crazy...
0

Inserting into a map is O(log n), not O(n log n), so inserting n keys will be O(n log n). Also, it's better to use a set.

4 Comments

Since when is O(N log N) better than O(N)?
@MichaelGoldshteyn: its better than O(n^2 log n) the OP thought it to be, not O(n)
@Dani, each time you insert into the map, the keys will be sorted, O(n log n); so if you insert m times, it is O(m n log n), where n is the current size of the map and m is the total number of values to be inserted.
@user1002288: No, they will not be sorted. The map is actually a self-balancing red-black tree that does a binary search for insertion; it doesn't just throw the element in and sort.
0

Although it's O(n^2), the following has small coefficients, isn't too bad on the cache, and uses memmem() which is fast.

 /* memmem() is a GNU extension: with glibc, define _GNU_SOURCE before <string.h> */
 for(int x=0;x<len;x++)
     if(memmem(&array[x+1], sizeof(int)*(len-x-1), &array[x], sizeof(int))==NULL &&
        memmem(array, sizeof(int)*x, &array[x], sizeof(int))==NULL)
            return array[x];
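One caveat the comments below don't raise: memmem() searches at byte granularity, so a needle can in principle match at an offset that straddles two adjacent ints; rejecting matches whose offset from the haystack start is not a multiple of sizeof(int) would rule that out.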

5 Comments

I believe memmem is part of the POSIX standard, but neither C nor C++ has formally adopted it.
This also has a slight bug. Consider the input [0 0 2]... Your first step checks [0 2] for a 0, which it finds. The second step will check [2] for 0, but not find it, making array[1] look like the answer
What second step? When memmem sees the 0 in [0 2] it returns non-null, and I return array[0]
Oh, so it does. Then it's a major bug... you want the first unique element, and array[0] is clearly non-unique.
Aah! I see the bug. *fix fix*
0
public static string firstUnique(int[] input)
{
    int size = input.Length;
    bool[] dupIndex = new bool[size];
    for (int i = 0; i < size; ++i)
    {
        if (dupIndex[i])
        {
            continue;
        }
        // Mark every later occurrence of input[i]. Breaking on the first
        // match would leave later occurrences unmarked; e.g. for
        // [2, 2, 2, 1] the third 2 would wrongly be reported as unique.
        bool isDuplicated = false;
        for (int j = i + 1; j < size; ++j)
        {
            if (input[i] == input[j])
            {
                dupIndex[j] = true;
                isDuplicated = true;
            }
        }
        if (!isDuplicated)
        {
            return input[i].ToString();
        }
    }
    return "No unique element";
}


0

@user3612419

The solution you gave is good, at roughly O(N^2), but further optimization of the same code is possible; I just added the two or three lines that you missed.



public static string firstUnique(int[] input)
{
    int size = input.Length;
    bool[] dupIndex = new bool[size];
    for (int i = 0; i < size; ++i)
    {
        if (dupIndex[i])
        {
            continue;
        }
        for (int j = i + 1; j < size; ++j)
        {
            if (dupIndex[j])
            {
                // Already known to duplicate an earlier value; skip it.
                continue;
            }
            if (input[i] == input[j])
            {
                dupIndex[j] = true;
                dupIndex[i] = true;
                // No break here: every later occurrence must be marked,
                // or a value appearing 3+ times can slip through.
            }
        }
        if (!dupIndex[i])
        {
            return input[i].ToString();
        }
    }
    return "No unique element";
}

