I'm confused with java interview question about which hashcode implementation is better. We have a class Point {int x, y; }. Why does implementation of hashcode 31 * x + y for this class is better than x + y ? The right answer is "The multiplier creates a dependence of the hashcode value on the order of processing the fields, which ultimately gives rise to a better hash function". But I can't understand why the order of processing is the point here because the entire expression 31 * x + y is calculating when I executes point1.equals(point2); And there is no matter in which order it happens. Am I wrong?
3 Answers
If you use x+y then how to distinguish points (3,4) and (4,3)? Both will have the same hashcode...
Now while 31 * x + y will not be perfect, in the same case, it will be much much better.
Note: by definition of hashing there is no perfect hashing. The only thing is to analyse what kind of collisions occurs for a given hash function. In the geometrical case the first one introduces collisions for a very simple and usual symmetry property. Thus in very common cases there could be too much collisions.
----EDIT----
Note that 31x+y ensures that no points in the rectangle (0,0)—(huge value,30) will collide, etc. A point (x,y) will collide only with points (x',y') such that (31x+y=31x'+y') which is a very uncommon geometrical property.
6 Comments
31 * x + 37 * yImagine you have two string properties prop1 and prop2, and two objects:
A: {prop1="foo", prop2="bar"}
B: {prop1="bar", prop2="foo"}
These are clearly different values, and it's useful to set up the hash code to distinguish between them. If you simply add the properties' hash codes together, you'll get the same value for both A and B. Instead, by multiplying and adding, the hash code will be different based on the property sequence.
It seems you may be slightly misinterpreting the advice: The purpose of the multiply-and-add is to create a dependence on the semantic order of properties within an object, not the execution order of the calculation.
1 Comment
Jean-Baptiste Yunès’ answer is the correct one, but I will add the following example to illustrate (keep in mind that it's in javascript, only because I quickly implemented this for the example):
class Point {
constructor(x, y) {
this.x = x;
this.y = y;
}
}
function getHashCollisions(collection, hashFunction) {
const collisionMap = new Map();
let count = 1;
let total = collection.length;
for (const point of collection) {
console.log(`calculating ${count++}/${total}`);
const currentHash = hashFunction(point);
const hashCount = collisionMap.has(currentHash) ? collisionMap.get(currentHash) +1 : 1;
collisionMap.set(currentHash, hashCount);
}
return collisionMap;
}
function generateDataset(rangeX, rangeY) {
const points = [];
let count = 1;
for (let x = 0; x < rangeX; x++) {
for (let y = 0; y < rangeY; y++) {
console.log(`generating ${count++} Point(${x};${y})`);
points.push(new Point(x, y));
}
}
return points;
}
function calculateAndGenerateReport(dataset, hashFunction, hashFunctionName) {
const hashes = getHashCollisions(dataset, hashFunction);
const totalCollisions = Array.from(hashes.values()).filter(currentCollisionCount => currentCollisionCount > 1).length;
const highestCollisionCount = Array.from(hashes.values()).reduce((currentHighest, current) => current > currentHighest ? current : currentHighest) - 1;
return `${hashFunctionName}: ${totalCollisions} collisions, highest collision count: ${highestCollisionCount}`;
}
const dataset = generateDataset(100, 100);
const literalHashesReport = calculateAndGenerateReport(dataset, point => point.x + point.y, "literal hash function:");
const onePrimeHashesReport = calculateAndGenerateReport(dataset, point => 31 * point.x + point.y, "one prime multiplication hash function:");
const twoPrimesHashesReport = calculateAndGenerateReport(dataset, point => 31 * point.x + 37 * point.y, "two primes multiplication hash function:");
const twoLargePrimesHashesReport = calculateAndGenerateReport(dataset, point => 8191 * point.x + 131071 * point.y, "two large primes multiplication hash function:");
console.log(literalHashesReport);
console.log(onePrimeHashesReport);
console.log(twoPrimesHashesReport);
console.log(twoLargePrimesHashesReport)
Result:
literal hash function: 197 collisions, highest collision count: 99
one prime multiplication hash function: 3107 collisions, highest collision count: 3
two primes multiplication hash function: 3359 collisions, highest collision count: 2
two large primes multiplication hash function: 0 collisions, highest collision count: 0
This shows that the (prime) numbers we choose to "calculate" the hash greatly decreases the probability for collisions.
hashCode() {return x + y;}would create redundant hash collision for point (1,2) and point (2,1), etc.hashCode() {return 31 * x + y;}, pairs (0,31) and (1,0) will have a collision as well.x+ywould introduce collisions by symmetry (a very usual property in geometry).