0

I'm trying to teach myself (re-learn) C++ and doing problems from books and tests online to get some practice. I came across this problem which has left me a little confused. How would I best go about it?

I have to write a function

class Solution { public int distinct (int [] A); }

that returns the number of distinct values in the array A. I can assume that the array holds between 0 and 100,000 elements, and that each element is an integer in the range -1,000,000 to 1,000,000. Any ideas? I was thinking of looping through and counting up for each value, but that's probably really inefficient, right? Thanks in advance.

4
  • 9
    Are you sure you have to write it in C++? That line you posted isn't valid C++. Looks like either C# or Java. Commented Oct 30, 2011 at 22:15
  • Sorry, you're right. It is in Java. I should have made that clear (sleep deprived brain!). The question is still the same though, I want to do it in C++. How do I go about it? Commented Oct 30, 2011 at 22:19
  • I think your proposed solution makes sense. If you are just looping through some numbers it really is not going to take long. I assume you are going to just use a counter and increment it each time you encounter the number right? Commented Oct 30, 2011 at 22:22
  • The fastest way is to use www.google.com with instant searching, with the appropriate words which will lead you to this question : stackoverflow.com/questions/7136279/… Commented Oct 30, 2011 at 22:26

5 Answers

6

Your solution is reasonably efficient in time (in fact, about as efficient as possible in terms of time complexity), but not in space: to count the values, you need an array sized to the range of the possible values, so to count the instances in your array of 100,000 items you need an auxiliary array of ~2,000,000 items (covering the range from -1,000,000 to 1,000,000).

You have a couple of ways to avoid/reduce that. One is to just store one bit for each possible input, and set the bit when you see that input. This has the same basic complexity, but reduces the space for the count to the minimum necessary (i.e., you don't really care how many times any input has occurred, only whether it occurred or not). In C++, the obvious way to do this would be std::vector<bool>. While often maligned, in this case, vector<bool> does exactly what you're looking for.
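A minimal sketch of the one-bit-per-value idea, assuming the input arrives as a std::vector<int> and that every value really is within -1,000,000 to 1,000,000 as the question states. The function name count_distinct_bits and the two constants are hypothetical, chosen only for this example; out-of-range input is not checked.

```cpp
#include <cstddef>
#include <vector>

// Assumption taken from the question: all values lie in [kMinValue, kMaxValue].
constexpr int kMinValue = -1000000;
constexpr int kMaxValue = 1000000;

// Hypothetical helper: counts distinct values using one flag per possible value.
std::size_t count_distinct_bits(const std::vector<int>& values)
{
    // One flag per possible value; index 0 corresponds to kMinValue.
    std::vector<bool> seen(static_cast<std::size_t>(kMaxValue - kMinValue) + 1, false);

    std::size_t distinct = 0;
    for (int v : values) {
        // Shift the value so that negative inputs map to valid indices.
        const std::size_t index = static_cast<std::size_t>(v - kMinValue);
        if (!seen[index]) {
            seen[index] = true;
            ++distinct;
        }
    }
    return distinct;
}
```

With 2,000,001 flags, this should take roughly 250 KB on implementations that pack vector<bool> into bits, regardless of how many inputs there are.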

Another possibility would be to use a sparse mapping from the input numbers to the count/bit. Especially when your range is much larger than the number of inputs, this could save quite a bit of space (the space taken will be proportional to the number of inputs, not the range). In C++, the obvious way to do this would be std::set<int>. To maintain the same expected complexity (O(N) instead of O(N log N)), you'd want to use an unordered_set instead.
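As a rough sketch of the sparse approach, assuming a C++11 compiler and input held in a std::vector<int> (the name count_distinct_hashed is made up for this example):

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts distinct values with a hash set.
// Expected O(N) time; space grows with the number of distinct inputs,
// not with the range of possible values.
std::size_t count_distinct_hashed(const std::vector<int>& values)
{
    // Inserting a value that is already present is a no-op,
    // so the set ends up holding each value exactly once.
    std::unordered_set<int> seen(values.begin(), values.end());
    return seen.size();
}
```

Swapping std::unordered_set for std::set (and including <set>) gives the ordered variant, at O(N log N).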

Another possibility is to sort the inputs, then eliminate duplicates. This generally keeps auxiliary storage to a minimum, but generally requires slightly longer to execute (O(N log N) instead of O(N)). For this, you'd probably use std::vector, std::sort, and std::unique.
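A possible sketch of the sort-then-unique approach, again assuming a std::vector<int> as input. The name count_distinct_sorted is hypothetical. Note that this version takes its argument by value so the caller's data stays untouched, which costs an O(N) copy; take a non-const reference instead if sorting the caller's vector in place is acceptable.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts distinct values by sorting and removing
// adjacent duplicates. The parameter is a copy, so the caller's vector
// is not modified.
std::size_t count_distinct_sorted(std::vector<int> values)
{
    std::sort(values.begin(), values.end());

    // std::unique moves one representative of each run of equal values to
    // the front and returns an iterator just past the last unique element.
    const auto new_end = std::unique(values.begin(), values.end());

    return static_cast<std::size_t>(new_end - values.begin());
}
```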


5 Comments

Isn't vector<bool> deprecated?
Also you could use unordered_map, as yet another option.
@SethCarnegie: At least AFAIK, no, it's not officially deprecated. I believe the wording has been changed in C++11, to just make bit-wise storage suggested instead of required though.
Vector-bool is maligned because it doesn't act like a proper vector in all cases, none of which matter for setting and testing specific elements of it. And being deprecated just means it may not be in the next standard (C++20 at the earliest) and may not work in compilers that decide to pedantically enforce the rule with no possibility of relaxing for backwards compatibility. No, I think we're quite safe with vector-bool for a while yet :-)
I implemented the 'inplace sort, then unique' version now too in my answer
5

Edit: updated to include a space-optimized algorithm as well, just for fun.

You can use a std::set to contain the unique values. Just copy the array elements into a set (any way you like), and count the number of unique elements from the set afterwards.

Here is a rather succinct bit of code that doesn't require you to even specify the size of the array (though, normally in C++ you'd be using a std::vector anyway):

See it live on http://ideone.com/rpWGS (which contains test data and output)

#include <algorithm> // std::sort, std::unique
#include <cstddef>   // size_t
#include <set>

class Solution 
{ 
   public: 

     // using std::set (max O(n) additional storage)
     template<size_t N>
         static size_t distinct (int (&a)[N])
     {
         return std::set<int>(a, a+N).size();
     }

     // using std::unique (inplace mutation; no additional storage)
     template<size_t N> 
         static size_t distinct_optim(int (&a)[N])
     {
         std::sort(a, a+N);
         int* newend = std::unique(a, a+N);
         return newend - a; 
     }

};

8 Comments

This doesn't look like an explanatory and easy answer for such question of a beginner, IMHO.
@KirilKirov: I clarified the answer a bit now, and there is a live demo here ideone.com/rpWGS
There's probably a school of thought that this won't teach much C++ programming :-) However, it's still a good answer since part of learning a language is knowing what it provides as part of the baseline, so you don't have to re-implement it.
I didn't look very hard, but does the test data on IdeOne have duplicates? If not, you might want to add some, to show it works even then.
@SethCarnegie: yes it does. The output explicitly shows that by printing 136 DEBUG: size of array: 158 elements, so: 22 were duplicates <grin/>
2

Sort the array A. Then go through the sorted array and count the number of times the difference between two consecutive numbers is non zero. Make sure you take care of the edges of the array and cases where the array is of size 1.
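One way this could look in C++, as a sketch that assumes the input is a std::vector<int> and uses a hypothetical function name, count_distinct_transitions. It handles the empty and single-element cases explicitly, and works on a copy so the caller's data is left alone:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts a copy of the input, then counts the places
// where two consecutive elements differ.
std::size_t count_distinct_transitions(std::vector<int> values)
{
    if (values.empty()) {
        return 0; // no elements, so no distinct values
    }

    std::sort(values.begin(), values.end());

    // The first element always contributes one distinct value; every
    // later change between neighbours contributes one more.
    std::size_t distinct = 1;
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] != values[i - 1]) {
            ++distinct;
        }
    }
    return distinct;
}
```

Comparing neighbours with != rather than subtracting them avoids any risk of integer overflow for inputs with a wider range than the one in the question.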

Comments

1

I can think of two options:

1) sort the vector using quick sort or merge sort, and then iterate over the sorted vector, counting up each time you encounter a value different from the previous value.

2) set up a std::vector<bool> with one entry per possible value (2,000,001 entries for the range -1,000,000 to 1,000,000, indexing each value by value + 1,000,000) and set the matching entry to true as you iterate over your array. Afterwards, count the number of true values. I say vector<bool> because it's optimized for efficient storage, i.e. it probably stores 8 bools in a byte.

2 Comments

vector<bool> is deprecated.
Apparently, since a while ago, but I wasn't aware of it until Seth pointed it out: link
1

To get the number of distinct values in an array, I can see two possibilities.

The first is to sort them and then count the number of transitions (adding one). For example, the following list:

1 1 1 1 2 2 3 4 4 5
       ^   ^ ^   ^

has four transitions, hence five distinct values.

The other possibility is to set up an array of "booleans" indicating whether a number had been seen before, pseudocode such as (in your case):

def countDistinct (array):
    def notSeenYet[-1,000,000..1,000,000] as all true
    count = 0
    for each value in array:
        if notSeenYet[value]:
            notSeenYet[value] = false
            count = count + 1
    return count

The first requires a sort which would be at best O(n log n) time complexity. This is unlikely to be a serious problem for 100,000 elements but you may not want the array modified in any way (which would require a copy, O(n) space complexity).

The second is O(n) time complexity and constant storage for your case. Two million boolean values may be of concern, depending on your environment but, if it's available, that would be better, assuming that time is your main concern (and it usually is).

Comments
