Hash Function not strictly dependent on size of string key

Question

I am making a hash table and need a hash function that does not depend only on the length of the string key, since periodic table element symbols have only 1 to 3 characters. How can I write a hash function that gives me an index based on, say, the bytes of each character in the string?

A hash function for strings of characters must clearly depend on the values of those characters, otherwise it's not a hash function at all.
– Alexey Frunze, Commented Apr 9, 2013 at 16:22
Wait a minute, what hash function do you have now? If you Google for string hash functions, they'll all take the bytes of the string into account. I've never seen one that didn't look at the contents of the string.
– John Kugelman, Commented Apr 9, 2013 at 16:22
If you're hashing the elements of the periodic table (or other fixed input), you should be able to come up with a perfect hash easily by using a tool like gperf.
– Nik Bougalis, Commented Apr 9, 2013 at 16:25
@AlexeyFrunze Otherwise, it's not a good hash function. size_t hash( std::string const& ) { return 0; } is a hash function. A very, very bad one, but still, it will work with all of the standard unordered containers.
– James Kanze, Commented Apr 9, 2013 at 16:43

templatetypedef · Accepted Answer · 2013-04-09 16:23:49Z


Pretty much every hash function on strings hashes the characters; it's extremely rare to see strings hashed purely by their lengths.

One simple family of hash functions is shift-add-XOR, which, as the name implies, uses a combination of bit shifts, additions, and XORs to compute a hash value from a string. It's easy to implement and gives a reasonably good distribution of keys.
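As a rough illustration of the shift-add-XOR idea, here is a minimal sketch. The shift amounts (5 and 2), the zero seed, and the names sax_hash and bucket_index are assumptions chosen for the example, not something the answer prescribes.

```cpp
#include <cstddef>
#include <string>

// Shift-add-XOR string hash: every character is folded into the running
// value with a left shift, a right shift, an addition, and an XOR.
// The shift amounts and the seed of 0 are common choices, not requirements.
std::size_t sax_hash(const std::string& key)
{
    std::size_t h = 0;
    for (unsigned char c : key) {
        h ^= (h << 5) + (h >> 2) + c;
    }
    return h;
}

// Reduce the hash value to a slot in a table of table_size slots.
// table_size is a hypothetical parameter and must be nonzero.
std::size_t bucket_index(const std::string& key, std::size_t table_size)
{
    return sax_hash(key) % table_size;
}
```

Because every character contributes to the result, symbols of the same length such as "Na" and "aN" end up with different hash values.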

That said, if you are guaranteed that you're just using periodic table symbols, you might want to try to find a perfect hash function for the elements. This is a hash function custom-built for the set of data that you're using, so it never has any collisions on that set. Tools like gperf can be used to create such functions.
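To make "never has any collisions" concrete, the following sketch checks whether a candidate hash function is perfect for a given key set and table size. It is not how gperf works internally. The name is_perfect_for and its parameters are hypothetical, and table_size is assumed to be nonzero.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Returns true if `hash` sends every key to a distinct slot in a table of
// `table_size` slots, i.e. if it is a perfect hash for this key set.
template <typename Hash>
bool is_perfect_for(const std::vector<std::string>& keys,
                    std::size_t table_size,
                    Hash hash)
{
    std::unordered_set<std::size_t> used;
    for (const std::string& key : keys) {
        std::size_t slot = hash(key) % table_size;
        if (!used.insert(slot).second) {
            return false;  // two keys landed in the same slot
        }
    }
    return true;
}
```

A length-only hash fails this check as soon as two symbols share a length, which is the problem described in the question.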

Hope this helps!


2 Comments

SirRupertIII Over a year ago

Awesome thanks, I obviously need help on understanding this. This helps!

James Kanze Over a year ago

You might want to look into FNV hashing as well.

James Kanze · Accepted Answer · 2013-04-09 17:06:52Z

The simplest solution is to use an existing one, like FNV. Be careful, however—some very widespread hash functions perform poorly when given a lot of very short strings (the one java.lang.String uses, for example). For a generic hash function, I generally use something like:
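For reference, a minimal sketch of the 32-bit FNV-1a variant is shown here. The function name fnv1a_32 is made up for the example; the offset basis and prime are the published 32-bit FNV constants.

```cpp
#include <cstdint>
#include <string>

// 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
std::uint32_t fnv1a_32(const std::string& key)
{
    std::uint32_t h = 2166136261u;   // 32-bit FNV offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;              // 32-bit FNV prime
    }
    return h;
}
```

The result still has to be reduced to a table index, for example with a modulo by the table size.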

size_t
hash( std::string const& value )
{
    size_t result = 2166136261;
    for ( std::string::const_iterator current = value.begin();
            current != value.end();
            ++ current ) {
        result = 127 * result + static_cast< unsigned char >( *current );
    }
    return result;
}

On machines with slow multiplication, this is slightly faster than FNV, and I've yet to find a case where the distribution was significantly poorer.
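The speed claim presumably rests on 127 being 2^7 - 1, so the multiplication can be replaced by a shift and a subtraction. The sketch below shows both forms side by side; the names hash127_mul and hash127_shift are assumptions for the example. In practice a compiler will often make this substitution itself.

```cpp
#include <cstddef>
#include <string>

// Multiply-by-127 hash, written with an explicit multiplication.
std::size_t hash127_mul(const std::string& value)
{
    std::size_t result = 2166136261u;
    for (unsigned char c : value) {
        result = 127 * result + c;
    }
    return result;
}

// Same hash, using 127 * x == (x << 7) - x. Unsigned arithmetic wraps
// modulo 2^N, so both forms agree even when the value overflows.
std::size_t hash127_shift(const std::string& value)
{
    std::size_t result = 2166136261u;
    for (unsigned char c : value) {
        result = ((result << 7) - result) + c;
    }
    return result;
}
```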

You mention that the maximum string length is three, however. In this case, you can probably use an even simpler technique:

size_t
hash( std::string const& value )
{
    union {
        size_t results;
        char input[ sizeof( size_t ) ];
    } working = { 0 };  // zero all bytes so short strings are padded with '\0'
    assert( value.size() <= sizeof( size_t ) );
    value.copy( working.input, sizeof( size_t ) );
    return working.results;
}

Both of these guarantee unique hash values for all strings of printable ASCII characters with a length of less than sizeof( size_t ).
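The same packing idea can also be written with std::memcpy instead of a union. This is a sketch under a few assumptions: the name pack_symbol is hypothetical, and the keys contain no embedded '\0' characters. Reading a union member other than the one last written is undefined behavior in standard C++, even though many compilers accept it, so memcpy is the more portable route.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copy the bytes of a short string into a zero-initialized size_t.
// Distinct strings of at most sizeof(size_t) characters give distinct
// values; the numeric value itself depends on the machine's byte order.
std::size_t pack_symbol(const std::string& symbol)
{
    assert(symbol.size() <= sizeof(std::size_t));
    std::size_t packed = 0;
    std::memcpy(&packed, symbol.data(), symbol.size());
    return packed;
}
```

The packed value is unique per symbol, but it still has to be mapped to a table index, and a plain modulo by the table size can reintroduce collisions.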
