I am making a hash table and need a hash function that does not depend only on the length of the string key, since periodic table symbols have only 1 to 3 characters. How can I write a hash function that computes an index based on, say, the bytes of each character in the string?

  • A hash function for strings of characters must clearly depend on the values of those characters, otherwise it's not a hash function at all. Commented Apr 9, 2013 at 16:22
  • Wait a minute, what hash function do you have now? If you Google for string hash functions they'll all take the bytes of the string into account. I've never seen one that didn't look at the contents of the string. Commented Apr 9, 2013 at 16:22
  • If you're hashing the elements of the periodic table (or other fixed input), you should be able to come up with a perfect hash easily by using a tool like gperf. Commented Apr 9, 2013 at 16:25
  • @AlexeyFrunze 0 would work, rand() probably not :) Commented Apr 9, 2013 at 16:38
  • @AlexeyFrunze Otherwise, it's not a good hash function. size_t hash( std::string const& ) { return 0; } is a hash function. A very, very bad one, but still, it will work with all of the standard unordered containers. Commented Apr 9, 2013 at 16:43

2 Answers

Pretty much every hash function on strings hashes the characters; it's extremely rare to see strings hashed purely by their lengths.

One simple family of hash functions is shift-add-XOR, which as the name implies uses a combination of bitshifts, additions, and XORs to obtain a hash function from a string. It's easy to implement and gives a reasonably good distribution of keys.
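
As a rough illustration of the shift-add-XOR idea, here is a minimal sketch. The shift amounts (5 and 2) are one commonly seen choice and an assumption here, and the names sax_hash and bucket_index are hypothetical, not taken from the answer.

```cpp
#include <cstddef>
#include <string>

// Shift-add-XOR sketch: each step XORs the running hash with the sum of
// two shifted copies of itself and the next byte of the key.
// The shift amounts 5 and 2 are an assumed, commonly used choice.
std::size_t sax_hash(const std::string& key)
{
    std::size_t h = 0;
    for (std::string::size_type i = 0; i < key.size(); ++i) {
        h ^= (h << 5) + (h >> 2) + static_cast<unsigned char>(key[i]);
    }
    return h;
}

// Hypothetical helper: reduce the hash to a bucket index.
// table_size must be non-zero.
std::size_t bucket_index(const std::string& key, std::size_t table_size)
{
    return sax_hash(key) % table_size;
}
```

A table with N buckets would then store the element for "He" at bucket_index("He", N), with some collision strategy on top.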

That said, if you are guaranteed that you're just using periodic table symbols, you might want to try to find a perfect hash function for the elements. This is a hash function custom-built for the set of data that you're using and never has any collisions. Tools like gperf can be used to create such functions.
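
To make "never has any collisions" concrete, the sketch below checks whether a candidate hash is collision-free over a fixed key set for a given table size. It does not show how gperf works; candidate_hash is a placeholder multiplicative hash and is_collision_free is a hypothetical helper name.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Placeholder hash used only for the check below; any string hash
// could be substituted here.
std::size_t candidate_hash(const std::string& key)
{
    std::size_t h = 0;
    for (std::string::size_type i = 0; i < key.size(); ++i) {
        h = 31 * h + static_cast<unsigned char>(key[i]);
    }
    return h;
}

// Returns true if every key lands in a distinct slot, i.e. the hash is
// perfect for this particular key set and table size.
// table_size must be non-zero.
bool is_collision_free(const std::vector<std::string>& keys,
                       std::size_t table_size)
{
    std::set<std::size_t> used;
    for (std::vector<std::string>::size_type i = 0; i < keys.size(); ++i) {
        std::size_t slot = candidate_hash(keys[i]) % table_size;
        if (!used.insert(slot).second) {
            return false;  // slot already taken by an earlier key
        }
    }
    return true;
}
```

Running such a check over all element symbols is a quick way to see whether a hand-picked hash and table size happen to be perfect before reaching for a generator.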

Hope this helps!


2 Comments

Awesome thanks, I obviously need help on understanding this. This helps!
You might want to look into FNV hashing as well.

The simplest solution is to use an existing one, like FNV. Be careful, however—some very widespread hash functions perform poorly when given a lot of very short strings (the one java.lang.String uses, for example). For a generic hash function, I generally use something like:

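#include <stddef.h>
#include <string>

size_t
hash( std::string const& value )
{
    // Start from the 32-bit FNV offset basis, then fold in one byte per step.
    size_t result = 2166136261u;
    for ( std::string::const_iterator current = value.begin();
            current != value.end();
            ++ current ) {
        // 127 is 2^7 - 1, so the multiply can be done as a shift and a subtract.
        result = 127 * result + static_cast< unsigned char >( *current );
    }
    return result;
}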

On machines with slow multiplication, this is slightly faster than FNV, and I've yet to find a case where the distribution was significantly poorer.
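
For comparison, here is a minimal sketch of 32-bit FNV-1a, one variant of the FNV family. The constants are the published 32-bit offset basis and prime; the function name fnv1a_32 is just an illustrative choice.

```cpp
#include <cstdint>
#include <string>

// 32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime.
std::uint32_t fnv1a_32(const std::string& key)
{
    std::uint32_t h = 2166136261u;                   // FNV offset basis
    for (std::string::size_type i = 0; i < key.size(); ++i) {
        h ^= static_cast<unsigned char>(key[i]);     // mix in the next byte
        h *= 16777619u;                              // FNV prime, wraps mod 2^32
    }
    return h;
}
```

The structure is the same as a multiply-and-add loop: only the multiplier and the way each byte is combined differ.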

You mention that the maximum string length is three, however. In this case, you can probably use an even simpler technique:

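#include <assert.h>
#include <stddef.h>
#include <string>

size_t
hash( std::string const& value )
{
    // Zero every byte first so that unused trailing bytes are well defined.
    union {
        size_t results;
        char input[ sizeof( size_t ) ];
    } working = { 0 };
    assert( value.size() <= sizeof( size_t ) );
    // Copy the characters over the bytes of the integer.
    value.copy( working.input, sizeof( size_t ) );
    // Reading results after writing input relies on union type punning,
    // which common compilers support but the C++ standard does not guarantee.
    return working.results;
}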

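The second function guarantees unique hash values for all strings of at most sizeof( size_t ) characters that contain no NUL characters. The first does the same for printable ASCII strings as long as the arithmetic does not overflow size_t, which with a 64-bit size_t covers the one- to three-character keys here.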
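
If relying on the byte layout of size_t is a concern, the same idea can be sketched with shifts instead. This is an alternative under the assumption that keys are at most three characters with no NUL bytes; pack_symbol is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Packs up to three characters into one integer, one byte per character.
// Distinct symbols give distinct values as long as no character is NUL,
// and the result does not depend on the machine's byte order.
std::uint32_t pack_symbol(const std::string& symbol)
{
    assert(symbol.size() <= 3);
    std::uint32_t packed = 0;
    for (std::string::size_type i = 0; i < symbol.size(); ++i) {
        packed = (packed << 8) | static_cast<unsigned char>(symbol[i]);
    }
    return packed;
}
```

The packed value can be reduced modulo the table size to get an index. The packed values are unique, but the reduced indices may still collide.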
