I am making a hash table and I need to make a hash function that is not only dependent on the size of the string key, as the periodic table elements have only 1 to 3 characters. How can I make a hash function that gives me an index perhaps based of the bytes of each char of the string?
2 Answers
Pretty much every hash function on strings hashes the characters; it's extremely rare to see strings hashed purely by their lengths.
One simple family of hash functions is shift-add-XOR, which as the name implies uses a combination of bitshifts, additions, and XORs to obtain a hash function from a string. It's easy to implement and gives a reasonably good distribution of keys.
That said, if you are guaranteed that you're just using periodic table symbols, you might want to try to find a perfect hash function for the elements. This is a hash function custom-built for the set of data that you're using and never has any collisions. Tools like gperf can be used to create such functions.
Hope this helps!
2 Comments
The simplest solution is to use an existing one, like FNV. Be careful, however—some very widespread hash functions perform poorly when given a lot of very short strings (the one java.lang.String uses, for example). For a generic hash function, I generally use something like:
size_t
hash( std::string const& value )
{
size_t result = 2166136261;
for ( std::string::const_iterator current = value.begin();
current != value.end();
++ current ) {
result = 127 * result + static_cast< unsigned char >( *current );
}
return result;
}
On machines with slow multiplication, this is slightly faster than FNV, and I've yet to find a case where the distribution was significantly poorer.
You mention that the maximum string length is three, however. In this case, you can probably use an even simpler technique:
size_t
hash( std::string const& value )
{
union {
size_t results;
char input[ sizeof( size_t ) ];
} working = 0;
assert( value.size() <= sizeof( size_t ) );
value.copy( working.input, sizeof( size_t ) );
return working.results;
}
Both of these guarantee unique hash values for all strings of
printable ASCII characters with a length of less than sizeof(
size_t ).
0would work,rand()probably not :)size_t hash( std::string const& ) { return 0; }is a hash function. A very, very bad one, but still, it will work with all of the standard unordered containers.