13

I need to specialize the hash function for unordered_map so I can use int arrays as keys. The array values are usually 0 or 1, e.g. int array = {0, 1, 0, 1}, but technically not bounded.

Can someone recommend a good hash function in this case? Alternatively, I can always convert the int array into a string and avoid specialization. But I am concerned about performance since I may have several million of these arrays.

6
  • 2
    Use or mimic Boost's "range hash". It's built up by repeatedly calling hash_combine, which is also in Boost and should really be in the standard. Commented Aug 22, 2011 at 14:02
  • If you do have several million of those arrays, I suggest new algorithms/data structures... Commented Aug 22, 2011 at 14:03
  • @Blindy What data structures would you suggest? Commented Aug 22, 2011 at 14:05
  • @Kerreck,boost.org/doc/libs/1_35_0/doc/html/boost/… says it is not appropriate for unordered containers. Does this not apply in my case? Commented Aug 22, 2011 at 14:10
  • 1
    @gewizz: that is sloppy wording. It is not appropriate to get a deterministic hash of an unordered container as a whole [the ordering could depend on the load factor and number of reallocations done]. However, of course it is appropriate to use as the element hash function to an unordered container Commented Aug 22, 2011 at 14:29

2 Answers 2

7

C++ TR1 contains a hash template function.

If you don't have that yet, you can use Boost Hash.

Idea for a handy helper:

#include <boost/functional/hash.hpp>

template <typename T, int N>
    static std::size_t hasharray(const T (&arr)[N])
{
     return boost::hash_range(arr, arr+N);
}

This would be (roughly?) equivalent to

 size_t seed = 0;
 for (const T* it=arr; it!=(arr+N); ++it)
     boost::hash_combine(seed, *it);
 return seed;

Don't forget to implement proper equality comparison operations if you're using this hash for lookup

Sign up to request clarification or add additional context in comments.

2 Comments

I think it should be std::size_t N because std::size_t is guaranteed to be able to represent the size of the largest possible array, while int might overflow (depending on the system). Plus, it doesn't need to be a signed type.
@outofthecave fair points. However, unsigned is contagious and that makes it unwieldy for offsets (they can be negative, and N - 10 will just wrap around if N<10. Surprise!). Also, arrays statically typed at larger than 2³¹ elements? Those are rare. And you'd not often be hashing them, if you had them.
7

Try to use lookup8 hash function. This function is VERY fast and good.

int key[100];
int key_size=10;
for (int i=0;i<key_size;i++) key[i]=i; //fill key with sample data
ub8 hash=hash((ub8*)key, sizeof(key[0])*key_size, 0);

UPD: Or use better function. - t1ha

4 Comments

Usually hash functions are written in plain c. You can create C++ wrapper for it.
Usually, hash functions are written in the language at hand.
You always write function like crc32, sha, md5 or use existing well tested and hi-performance implementations? :)
The function is too complex to be used at compile time

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.