C# Iterating char array very slow

Question

I am calculating a hash of a text string in both Java and C#, the requirement being that identical text strings produce the same hash. I settled on the algorithm behind Java's String.hashCode(), as it is simple and straightforward (and I can tolerate a potential collision), or so I thought.
My C# implementation turns out to be unbearably slow.

Here is the implementation in C# (the Java version is almost identical):

        char[] val = string.ToCharArray();
        int hash = 0;
        for (int i = 0; i < string.Count(); i++) {
            hash = 31 * hash + val[i];
        }

Now I pass in two text strings, both read from text files on disk (in C#, via System.IO.File.ReadAllText); the first is 10 KB, the second is 100 KB.

Java zips right through both of them and generates the result. C# takes about 600 ms for the 10 KB file and then a whopping 50 seconds for the 100 KB file. In essence, the C# version does not scale linearly, and at a certain size it becomes infeasible. Given the faster-than-linear scaling, and that I can't fathom ADD and MUL starting to take more time, I am led to believe that some memory management goes haywire when C# indexes the char array. Is this expected behavior, or what am I missing? :-)

Best regards.

Have you tried using val.Length, since the Count() method might actually be counting the string each time?
– T. Kiley, Commented Mar 19, 2014 at 14:21

Rik · Accepted Answer · 2014-03-19 14:39:43Z

7

for (int i = 0; i < string.Count(); i++) {

In this line, you should either use string.Length (no parentheses) or, preferably, val.Length.

Count() is a LINQ extension method that determines the length of the string by enumerating it every time you call it, and the loop condition calls it once per iteration.
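
To make the cost concrete, here is a minimal sketch of how a Count()-style helper has to work when all it knows about its argument is that it is an IEnumerable<T>. The class CountSketch and the method CountLike are hypothetical names used only for illustration; this is a simplified model of the behaviour described above, not the actual LINQ source, which may contain additional fast paths.

```csharp
using System;
using System.Collections.Generic;

public static class CountSketch
{
    // Hypothetical stand-in for a Count()-style extension method.
    public static int CountLike<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // Fast path: collections that track their own size can answer directly.
        if (source is ICollection<T> collection)
        {
            return collection.Count;
        }

        // Slow path: string implements IEnumerable<char> but not
        // ICollection<char>, so it ends up here and is walked item by item.
        int count = 0;
        foreach (T item in source)
        {
            count++;
        }
        return count;
    }
}
```

Calling such a helper in a loop condition repeats the full walk on every iteration, which is why the loop over a 100 KB string does far more than ten times the work of the loop over a 10 KB string.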

A more conventional C# implementation of the same algorithm would be:

int hash = 0;
foreach(char c in string)
{
    hash = 31 * hash + c;
}

As pointed out in the comments, string is not a valid variable name in C# since it is a keyword (an alias for System.String), but I kept it here for clarity.
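
Putting the fix together, the following is a minimal sketch of the hash as a reusable method. The class name JavaStyleHash and the method name Compute are made up for this example, and the parameter is called text because string cannot be used as an identifier. The unchecked block is an addition that is not in the original code: it makes the wrap-around on integer overflow explicit, which is how Java's int arithmetic behaves, even if the project is compiled with overflow checking enabled.

```csharp
using System;

public static class JavaStyleHash
{
    // Computes hash = 31 * hash + c over every UTF-16 code unit of the text.
    public static int Compute(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        int hash = 0;
        unchecked // wrap on overflow instead of throwing
        {
            // text.Length is a stored property, so the loop stays O(N).
            for (int i = 0; i < text.Length; i++)
            {
                hash = 31 * hash + text[i];
            }
        }
        return hash;
    }
}
```

For example, JavaStyleHash.Compute("abc") evaluates to 96354, which is also the value of "abc".hashCode() in Java.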

edited Mar 19, 2014 at 14:39

answered Mar 19, 2014 at 14:23

Rik


6 Comments

Matthew Watson Over a year ago

Indeed - using Count() it became an O(N^2) operation!

user1257043 Over a year ago

Thanks a lot, this is it :) Consider me wiser. "string" was substituted in my original question; I thought that was transparent.

user1257043 Over a year ago

I wasn't going to, but apparently I am, since I am here typing now, so here goes: if String/string is immutable, why would .Count have to 'count' the length each time? Seems unnecessarily inefficient?

Rik Over a year ago

It is unnecessarily inefficient; that's why you should use String.Length instead, which is just a property of a string. The Count() extension method is defined on any type that implements IEnumerable<T> (similar to Iterable in Java) to get the length of a sequence without knowing anything particular about the sequence. String implements IEnumerable<char> because it's essentially a sequence of chars, so you can use the Count() method, but it has no idea what kind of sequence String actually is, so the only way to determine the sequence's length is to enumerate it until it stops.

user1257043 Over a year ago

Okay, I got that. What I am asking is: would it not be safe to override .Count in string/String and just return the length? Not trying to cover for or justify my original bad-form code, just curious :)
