Why do "int" and "sbyte" GetHashCode functions generate different values?

Question

We have the following code:

int i = 1;
Console.WriteLine(i.GetHashCode());  // outputs => 1

This make sense and the same happen whit all integral types in C# except sbyte and short. That is:

sbyte i = 1;
Console.WriteLine(i.GetHashCode());   //  outputs => 257

Why is this?

@ZdeslavVojkovic "Why is it not 1, as with int.GetHashCode?" :) — user166390
– user166390, Commented Sep 19, 2012 at 20:08
Because it is different type. Is there any reason why it should be 1? — Zdeslav Vojkovic
– Zdeslav Vojkovic, Commented Sep 19, 2012 at 20:10
@ZdeslavVojkovic "But it is also an integer type. So why is the GetHashCode implementation not uniform?" (Looking at it another way, why is int.GetHashCode so boring?) — user166390
– user166390, Commented Sep 19, 2012 at 20:11
@ZdeslavVojkovic yes we know that the result of GetHashCode is unspecified. But I don't see why (int)this ^ ((int)this << 8) is better than (int)this. So why do the extra work? (I know it's negligible compared to the cost of the remainder operator) — CodesInChaos
– CodesInChaos, Commented Sep 19, 2012 at 20:21

user555045 · Accepted Answer · 2015-09-30 15:32:11Z

4

Because the source of that method (SByte.GetHashCode) is

public override int GetHashCode()
{
    return (int)this ^ ((int)this << 8);
}

As for why, well someone at Microsoft knows that..

edited Sep 30, 2015 at 15:32

answered Sep 19, 2012 at 20:04

user555045

65.8k6 gold badges95 silver badges181 bronze badges

Sign up to request clarification or add additional context in comments.

12 Comments

user166390 Over a year ago

It would distribute the values in the sbyte domain more throughout the integer range (actually, only to ~32k) .. not sure about C#, but Java HashMap (etc), already use a re-distribution function ..

CodesInChaos Over a year ago

I think the .net hashtables simply mask away the sign, and then calculate the remainder. So I don't really see what this implementation gains.

Servy Over a year ago

@CodesInChaos Well, if pst is correct it would result in the items in the hash table being more evenly distributed (if there is a small range of numbers and a large hash table size) which means less collisions which means better efficiency.

CodesInChaos Over a year ago

@Servy At least for positive numbers, multiplying with 257 has no effect on the bin distribution at all. 257 is prime, and thus co-prime to any table size except 257. The only interesting effect this has is how it maps negative numbers to positive numbers. But I don't know if that's an advantage over masking away the sign.

user555045 Over a year ago

What I also find curious is that, suppose this helps, why isn't it done for the unsigned byte as well?

|

Eric Javier Hernandez Saura · Accepted Answer · 2012-09-19 20:43:32Z

1

Yes it's all about values distribution. As the GetHashCode method return type is int for the type sbyte the values are going to be distributed in intervals of 257. For this same reason for the long type will be colisions.

answered Sep 19, 2012 at 20:43

Eric Javier Hernandez Saura

4,0113 gold badges32 silver badges38 bronze badges

Comments

Zdeslav Vojkovic · Accepted Answer · 2012-09-19 20:53:49Z

0

The reason is that it is probably done to avoid clustering of hash values.

As GetHashCode documentation says:

For the best performance, a hash function must generate a random distribution for all input. Providing a good hash function on a class can significantly affect the performance of adding those objects to a hash table. In a hash table with a good implementation of a hash function, searching for an element takes constant time (for example, an O(1) operation).

Also, as this excellent article explains:

Guideline: the distribution of hash codes must be "random" By a "random distribution" I mean that if there are commonalities in the objects being hashed, there should not be similar commonalities in the hash codes produced. Suppose for example you are hashing an object that represents the latitude and longitude of a point. A set of such locations is highly likely to be "clustered"; odds are good that your set of locations is, say, mostly houses in the same city, or mostly valves in the same oil field, or whatever. If clustered data produces clustered hash values then that might decrease the number of buckets used and cause a performance problem when the bucket gets really big.

answered Sep 19, 2012 at 20:53

Zdeslav Vojkovic

14.7k37 silver badges45 bronze badges

3 Comments

CodesInChaos Over a year ago

But can you explain why it improves the distribution?

Zdeslav Vojkovic Over a year ago

I am not a mathematician, so I wouldn't be that courageous :). However, I suspect that it is related to the implementation of HashTable and Dictionary classes, default bucket settings, etc.

Zdeslav Vojkovic Over a year ago

I would also expect it to behave better if you have an sbyte field in your class and use its hash as a part of your class hash function (the often use h = h * 23 + field.GetHashCode() idiom).

Collectives™ on Stack Overflow

Why do "int" and "sbyte" GetHashCode functions generate different values?

3 Answers 3

12 Comments

Comments

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

12 Comments

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related