Count distinct values in an array - C++

Question

I'm trying to teach myself (re-learn) C++ and doing problems from books and tests online to get some practice. I came across this problem which has left me a little confused. How would I best go about it?

I have to write a function

class Solution { public int distinct (int [] A); }

that returns the number of distinct values in the array A. I can assume that the array holds between 0 and 100,000 elements, and that the elements are all integers in the range -1,000,000 to +1,000,000. Any ideas? I was thinking of looping through and counting up for each value, but that's probably really inefficient, right? Thanks in advance.

Are you sure you have to write it in C++? That line you posted isn't valid C++. Looks like either C# or Java.
– Seth Carnegie, Commented Oct 30, 2011 at 22:15
Sorry, you're right. It is in Java. I should have made that clear (sleep deprived brain!). The question is still the same though, I want to do it in C++. How do I go about it?
– Peter, Commented Oct 30, 2011 at 22:19
I think your proposed solution makes sense. If you are just looping through some numbers it really is not going to take long. I assume you are going to just use a counter and increment it each time you encounter the number, right?
– ihtkwot, Commented Oct 30, 2011 at 22:22
The fastest way is to use www.google.com with instant searching, with the appropriate words, which will lead you to this question: stackoverflow.com/questions/7136279/…
– FailedDev, Commented Oct 30, 2011 at 22:26

Jerry Coffin · Accepted Answer · 2011-10-30 22:26:20Z

6

Your solution is reasonably efficient in time (in fact, about as efficient as possible in terms of time complexity), but not in space: to count the values, you need an array sized to the range of the possible values, so to count the instances in your array of 100,000 items you need an auxiliary array of ~2,000,000 items (covering the range from -1,000,000 to 1,000,000).

You have a couple of ways to avoid/reduce that. One is to just store one bit for each possible input, and set the bit when you see that input. This has the same basic complexity, but reduces the space for the count to the minimum necessary (i.e., you don't really care how many times any input has occurred, only whether it occurred or not). In C++, the obvious way to do this would be std::vector<bool>. While often maligned, in this case, vector<bool> does exactly what you're looking for.
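As a rough illustration of the one-bit-per-value idea, here is a minimal sketch. The function name count_distinct_bitmap and the constants kMinValue and kMaxValue are made up for this example, and the bounds are taken from the question; the sketch assumes every input really does lie within them.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed bounds, taken from the question: values lie in [-1,000,000, 1,000,000].
constexpr int kMinValue = -1000000;
constexpr int kMaxValue = 1000000;

// Hypothetical helper: one flag per possible value, set when the value is seen.
// Inputs outside the assumed bounds would index out of range, so validate first
// if the data is not trusted.
std::size_t count_distinct_bitmap(const std::vector<int>& values)
{
    // 2,000,001 flags; vector<bool> typically packs them into bits.
    std::vector<bool> seen(static_cast<std::size_t>(kMaxValue - kMinValue) + 1, false);

    for (int v : values) {
        // Shift by kMinValue so that -1,000,000 maps to index 0.
        seen[static_cast<std::size_t>(v - kMinValue)] = true;
    }

    // The number of set flags is the number of distinct values.
    return static_cast<std::size_t>(std::count(seen.begin(), seen.end(), true));
}
```

Counting the set flags at the end costs one extra pass over the whole range; incrementing a counter the first time each flag is set would avoid that pass.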

Another possibility would be to use a sparse mapping from the input numbers to the count/bit. Especially when your range is much larger than the number of inputs, this could save quite a bit of space (the space taken will be proportional to the number of inputs, not the range). In C++, the obvious way to do this would be std::set<int>. To maintain the same expected complexity (O(N) instead of O(N log N)), you'd want to use an unordered_set instead.
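A minimal sketch of the sparse approach, assuming a C++11 compiler and an input held in a std::vector<int>; the function name count_distinct_hashed is made up for this example.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: the hash set keeps one entry per distinct value,
// so its final size is the answer. Expected O(N) time, O(N) extra space.
std::size_t count_distinct_hashed(const std::vector<int>& values)
{
    std::unordered_set<int> unique_values(values.begin(), values.end());
    return unique_values.size();
}
```

Swapping std::unordered_set for std::set gives the ordered variant, at O(N log N) time.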

Another possibility is to sort the inputs, then eliminate duplicates. This generally keeps auxiliary storage to a minimum, but generally requires slightly longer to execute (O(N log N) instead of O(N)). For this, you'd probably use std::vector, std::sort, and std::unique.

answered Oct 30, 2011 at 22:26

Jerry Coffin


5 Comments

Seth Carnegie Over a year ago

Isn't vector<bool> deprecated?

Seth Carnegie Over a year ago

Also you could use unordered_map, as yet another option.

Jerry Coffin Over a year ago

@SethCarnegie: At least AFAIK, no, it's not officially deprecated. I believe the wording has been changed in C++11, to just make bit-wise storage suggested instead of required though.

paxdiablo Over a year ago

Vector-bool is maligned because it doesn't act like a proper vector in all cases, none of which matter for setting and testing specific elements of it. And being deprecated just means it may not be in the next standard (C++20 at the earliest) and may not work in compilers that decide to pedantically enforce the rule with no possibility of relaxing for backwards compatibility. No, I think we're quite safe with vector-bool for a while yet :-)

sehe Over a year ago

I implemented the 'inplace sort, then unique' version now too in my answer

sehe · Accepted Answer · 2011-10-30 23:28:32Z

5

Edit Updated: included a space-optimized algorithm as well just for fun

You can use a std::set to contain the unique values. Just copy the array elements into a set (any way you like), and count the number of unique elements from the set afterwards.

Here is a rather succinct bit of code that doesn't require you to even specify the size of the array (though normally in C++ you'd be using a std::vector anyway):

See it live on http://ideone.com/rpWGS (which contains test data and output)

#include <algorithm>  // std::sort, std::unique
#include <cstddef>    // std::size_t
#include <set>

class Solution
{
   public:

     // using std::set (max O(n) additional storage)
     template<std::size_t N>
         static std::size_t distinct (int (&a)[N])
     {
         return std::set<int>(a, a+N).size();
     }

     // using std::unique (in-place mutation; no additional storage)
     template<std::size_t N>
         static std::size_t distinct_optim(int (&a)[N])
     {
         std::sort(a, a+N);                  // duplicates become adjacent
         int* newend = std::unique(a, a+N);  // end of the de-duplicated prefix
         return newend - a;
     }

};

edited Oct 30, 2011 at 23:28

answered Oct 30, 2011 at 22:24

sehe


8 Comments

Kiril Kirov Over a year ago

This doesn't look like an explanatory and easy answer for such question of a beginner, IMHO.

sehe Over a year ago

@KirilKirov: I clarified the answer a bit now, and there is a live demo here ideone.com/rpWGS

paxdiablo Over a year ago

There's probably a school of thought that this won't teach much C++ programming :-) However, it's still a good answer since part of learning a language is knowing what it provides as part of the baseline, so you don't have to re-implement it.

Seth Carnegie Over a year ago

I didn't look very hard, but does the test data on IdeOne have duplicates? If not, you might want to add some, to show it works even then.

sehe Over a year ago

@SethCarnegie: yes it does. The output explicitly shows that by printing 136 DEBUG: size of array: 158 elements, so: 22 were duplicates <grin/>


twerdster · Accepted Answer · 2011-10-30 22:18:32Z

2

Sort the array A. Then go through the sorted array and count the number of times the difference between two consecutive numbers is non zero. Make sure you take care of the edges of the array and cases where the array is of size 1.
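One way the sort-and-compare idea could look, as a minimal sketch. The function name count_distinct_transitions is made up for this example, and the input is taken by value so the caller's data is left unsorted (an assumption; sorting in place would save the copy).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: sort a copy, then count positions where the value changes.
std::size_t count_distinct_transitions(std::vector<int> values)
{
    // Edge case: an empty array has no distinct values.
    if (values.empty()) {
        return 0;
    }

    std::sort(values.begin(), values.end());

    // The first element always starts a run, so begin at 1. This also
    // covers the size-1 case, where the loop body never executes.
    std::size_t distinct = 1;
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] != values[i - 1]) {
            ++distinct;
        }
    }
    return distinct;
}
```

The comparison uses != rather than subtracting neighbours, which sidesteps any overflow concern if the value range were ever widened.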

answered Oct 30, 2011 at 22:18

twerdster


Vlad · Accepted Answer · 2011-10-30 22:20:33Z

1

I can think of two options:

1) Sort the vector using quick sort or merge sort, and then iterate over the sorted vector, counting up each time you encounter a value different from the previous value.

2) Set up a std::vector<bool> of size 2,000,001 (one slot for each value from -1,000,000 to 1,000,000, indexed by value + 1,000,000) and put in true values as you iterate over your array. Afterwards you count the number of true values. I say vector<bool> because it's optimized for efficient storage, i.e. it probably stores 8 bools in a byte.

answered Oct 30, 2011 at 22:20

Vlad


2 Comments

Seth Carnegie Over a year ago

vector<bool> is deprecated.

Vlad Over a year ago

Apparently, since a while ago, but I wasn't aware of it until Seth pointed it out: link

paxdiablo · Accepted Answer · 2011-10-30 22:31:12Z

To get the number of distinct values in an array, I can see two possibilities.

The first is to sort them and then count the number of transitions (adding one). For example, the following list:

1 1 1 1 2 2 3 4 4 5
       ^   ^ ^   ^

has four transitions, hence five distinct values.

The other possibility is to set up an array of "booleans" indicating whether a number had been seen before, pseudocode such as (in your case):

def countDistinct (array):
    def notSeenYet[-1,000,000..1,000,000] as all true
    count = 0
    for each value in array:
        if notSeenYet[value]:
            notSeenYet[value] = false
            count = count + 1
    return count

The first requires a sort which would be at best O(n log n) time complexity. This is unlikely to be a serious problem for 100,000 elements but you may not want the array modified in any way (which would require a copy, O(n) space complexity).

The second is O(n) time complexity and constant storage for your case. Two million boolean values may be of concern, depending on your environment but, if it's available, that would be better, assuming that time is your main concern (and it usually is).
