4

We have the following code:

int i = 1;
Console.WriteLine(i.GetHashCode());  // outputs => 1

This make sense and the same happen whit all integral types in C# except sbyte and short. That is:

sbyte i = 1;
Console.WriteLine(i.GetHashCode());   //  outputs => 257

Why is this?

7
  • 7
    Why not? what's wrong with 257? Commented Sep 19, 2012 at 20:05
  • @ZdeslavVojkovic "Why is it not 1, as with int.GetHashCode?" :) Commented Sep 19, 2012 at 20:08
  • Because it is different type. Is there any reason why it should be 1? Commented Sep 19, 2012 at 20:10
  • @ZdeslavVojkovic "But it is also an integer type. So why is the GetHashCode implementation not uniform?" (Looking at it another way, why is int.GetHashCode so boring?) Commented Sep 19, 2012 at 20:11
  • 1
    @ZdeslavVojkovic yes we know that the result of GetHashCode is unspecified. But I don't see why (int)this ^ ((int)this << 8) is better than (int)this. So why do the extra work? (I know it's negligible compared to the cost of the remainder operator) Commented Sep 19, 2012 at 20:21

3 Answers 3

4

Because the source of that method (SByte.GetHashCode) is

public override int GetHashCode()
{
    return (int)this ^ ((int)this << 8);
}

As for why, well someone at Microsoft knows that..

Sign up to request clarification or add additional context in comments.

12 Comments

It would distribute the values in the sbyte domain more throughout the integer range (actually, only to ~32k) .. not sure about C#, but Java HashMap (etc), already use a re-distribution function ..
I think the .net hashtables simply mask away the sign, and then calculate the remainder. So I don't really see what this implementation gains.
@CodesInChaos Well, if pst is correct it would result in the items in the hash table being more evenly distributed (if there is a small range of numbers and a large hash table size) which means less collisions which means better efficiency.
@Servy At least for positive numbers, multiplying with 257 has no effect on the bin distribution at all. 257 is prime, and thus co-prime to any table size except 257. The only interesting effect this has is how it maps negative numbers to positive numbers. But I don't know if that's an advantage over masking away the sign.
What I also find curious is that, suppose this helps, why isn't it done for the unsigned byte as well?
|
1

Yes it's all about values distribution. As the GetHashCode method return type is int for the type sbyte the values are going to be distributed in intervals of 257. For this same reason for the long type will be colisions.

Comments

0

The reason is that it is probably done to avoid clustering of hash values.

As GetHashCode documentation says:

For the best performance, a hash function must generate a random distribution for all input. Providing a good hash function on a class can significantly affect the performance of adding those objects to a hash table. In a hash table with a good implementation of a hash function, searching for an element takes constant time (for example, an O(1) operation).

Also, as this excellent article explains:

Guideline: the distribution of hash codes must be "random" By a "random distribution" I mean that if there are commonalities in the objects being hashed, there should not be similar commonalities in the hash codes produced. Suppose for example you are hashing an object that represents the latitude and longitude of a point. A set of such locations is highly likely to be "clustered"; odds are good that your set of locations is, say, mostly houses in the same city, or mostly valves in the same oil field, or whatever. If clustered data produces clustered hash values then that might decrease the number of buckets used and cause a performance problem when the bucket gets really big.

3 Comments

But can you explain why it improves the distribution?
I am not a mathematician, so I wouldn't be that courageous :). However, I suspect that it is related to the implementation of HashTable and Dictionary classes, default bucket settings, etc.
I would also expect it to behave better if you have an sbyte field in your class and use its hash as a part of your class hash function (the often use h = h * 23 + field.GetHashCode() idiom).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.