1

I am looking for a hash-algorithm, to create as close to a unique hash of a string (max len = 255) as possible, that produces a long integer (DWORD).

I realize that 26^255 >> 2^32, but also know that the number of words in the English language is far less than 2^32.

The strings I need to 'hash' would be mostly single words or some simple construct using two or three words.


The answer:

One of the FNV variants should meet your requirements. They're fast, and produce fairly evenly distributed outputs. (Answered by Arachnid)


5 Answers 5

2

See here for a previous iteration of this question (and the answer).

Sign up to request clarification or add additional context in comments.

1 Comment

Yes, that's what I'm looking for. I did search but the keywords I used does not appear in the link you gave. I added a comment there to add more relevant keywords.
1

One technique is to use a well-known hash algorithm (say, MD5 or SHA-1) and use only the first 32 bits of the result.

Be aware that the risk of hash collisions increases faster than you might expect. For information on this, read about the Birthday Paradox.

Comments

1

Ronny Pfannschmidt did a test with common english words yesterday and hasn't encountered any collisions for the 10000 words he tested in the Python string hash function. I haven't tested it myself, but that algorithm is very simple and fast, and seems to be optimized for common words.

Here the implementation:

static long
string_hash(PyStringObject *a)
{
    register Py_ssize_t len;
    register unsigned char *p;
    register long x;

    if (a->ob_shash != -1)
        return a->ob_shash;
    len = Py_SIZE(a);
    p = (unsigned char *) a->ob_sval;
    x = *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= Py_SIZE(a);
    if (x == -1)
        x = -2;
    a->ob_shash = x;
    return x;
}

2 Comments

Can we have a link to the article/post you're referencing?
Was a discussion on IRC, no link. sorry
0

H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize – 1))] % hashsize

MSDN article on HashCodes

Comments

0

Java's String.hash() can be easily viewed here, its algorithm is

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.