I am calculating a hash of a text string in Java and C#; the requirement is that if the text strings are identical, the hash is the same.
I settled on Java's String.hashCode(), as it is quite simple and straightforward (and I can tolerate the occasional collision), or so I thought.
My C# implementation turns out to be unbearably slow.
Here is the implementation in C# (the Java version is almost identical):
// needs: using System.Linq; (for the Count() extension method)
char[] val = text.ToCharArray();
int hash = 0;
for (int i = 0; i < text.Count(); i++) {
    hash = 31 * hash + val[i];
}
Now I pass in two text strings, both read from text files on disk (C#: System.IO.File.ReadAllText); the first is 10 kB, the second is 100 kB.
Java zips right through both of them and produces the result. C# takes about 600 ms for the 10 kB file and then a whopping 50 seconds for the 100 kB one. In essence, the C# version does not scale linearly: 10x the input takes roughly 83x the time, which looks quadratic rather than exponential, and at a certain size it becomes an infeasible approach. Since I can't fathom that ADD and MUL themselves start taking longer, it leads me to believe some memory management goes haywire when C# indexes the char array. Is this expected behavior ... or what am I missing? :-)
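For reference, here is roughly how I read the file and time the hash; the Stopwatch harness and the file name are just illustrative, not my exact setup:

using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

class HashTiming {
    // same hash as above, kept verbatim including the Count() call
    static int Hash(string text) {
        char[] val = text.ToCharArray();
        int hash = 0;
        for (int i = 0; i < text.Count(); i++) {
            hash = 31 * hash + val[i];
        }
        return hash;
    }

    static void Main() {
        string text = File.ReadAllText("input.txt"); // placeholder file name
        var sw = Stopwatch.StartNew();
        int h = Hash(text);
        sw.Stop();
        Console.WriteLine("hash=" + h + " took " + sw.ElapsedMilliseconds + " ms");
    }
}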
Best regards.
Edit: should it be val.Length instead, since the Count() method might actually be counting the string each time?
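If that's the culprit, I assume the fix is simply to hoist the length out of the loop condition, something like:

char[] val = text.ToCharArray();
int hash = 0;
for (int i = 0; i < val.Length; i++) { // Length is an O(1) property, unlike the Count() extension
    hash = 31 * hash + val[i];
}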