I have to count occurrences of a string in one very long string (about 30 MB of plain text). For now I use the following code:

    int count = new Regex(Regex.Escape(stringThatIamLookingFor)).Matches(stringToSearchIn).Count;

but it is too slow: it takes about 3 minutes on an i7 with 16 GB of RAM. The example data is:

43.996442,-31.768039
43.996432,-31.768039
43.996432,-31.768049
43.996422,-31.768049
43.996422,-31.768059

I want to count (for example) occurrences of .7. Is there a faster way than regex?

OK, solved.

The fastest function so far is this one (I only need to check two chars):

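public int countOccurences2(string source, char charOne, char charTwo)
    {
        int count = 0;
        // Advance one char at a time (stepping by 2 skips pairs that start at odd
        // indexes) and stop one short of the end so source[i + 1] stays in range.
        for (int i = 0; i < source.Length - 1; i++)
            if (source[i] == charOne && source[i + 1] == charTwo) { count++; }
        return count;
    }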
  • How long does it take to convert the string into a byte array and just look for the correct two numbers with a simple for loop? Sometimes the low-tech method isn't the worst. Commented Mar 13, 2015 at 17:08
  • Knuth–Morris–Pratt algorithm Commented Mar 13, 2015 at 17:11
  • Possible duplicate of How would you count occurrences of a string within a string? Commented Mar 13, 2015 at 17:16
  • Please let us know how well each solution performs. For science! :D Commented Mar 13, 2015 at 17:16
  • HOLY CRAP, Simone Riboldi's answer is about 10 times faster than regex. Commented Mar 13, 2015 at 17:18

1 Answer

From this question: How would you count occurrences of a string within a string?

The following code seems to perform best:

int count = 0, n = 0;

if(substring != "")
{
    while ((n = source.IndexOf(substring, n, StringComparison.InvariantCulture)) != -1)
    {
        n += substring.Length;
        ++count;
    }
}

Solution provided by Richard Watson in the linked question.
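
For data like the coordinate list in the question, which is plain ASCII, the same loop can be sketched with an ordinal comparison in place of the culture-aware one. This is a swapped-in variant, not Richard Watson's code: the class name `SubstringCounter` and the method name `CountOrdinal` are hypothetical, and it assumes char-by-char matching is acceptable for the input. No timing is claimed here, so measure it on the real 30 MB string.

```csharp
using System;

public static class SubstringCounter
{
    // Counts non-overlapping occurrences of needle in haystack.
    // Assumption: ordinal (char-by-char) comparison is good enough for the data;
    // culture-aware rules are deliberately not applied.
    public static int CountOrdinal(string haystack, string needle)
    {
        // An empty needle would match everywhere, so treat it as "no matches".
        if (string.IsNullOrEmpty(haystack) || string.IsNullOrEmpty(needle))
            return 0;

        int count = 0;
        int index = 0;
        while ((index = haystack.IndexOf(needle, index, StringComparison.Ordinal)) != -1)
        {
            index += needle.Length; // jump past the match just found
            count++;
        }
        return count;
    }
}
```

Calling `SubstringCounter.CountOrdinal("43.996442,-31.768039\n43.996432,-31.768039", ".7")` returns 2, one match per line of the sample.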

6 Comments

Note that the accepted answer, and in fact most answers, are plain wrong, as they count characters; also note that any LINQ solution breaks down for large strings! This is the worst example of vote gauging going wrong I have ever seen on SO.
Yes, probably because the needle is so short. I did tests last year, and when both needle and haystack grow, regex finally wins.
@TaW good to know, very good to know! We always need this kind of function, and it is really useful to know the weaknesses of each approach.
I might also add that due to Unicode considerations, any comparison based on simple byte comparisons will give the incorrect answer on some data, so using a method like this that respects Unicode is a definite plus.
I think this answer is (arguably) wrong due to the "n += substring.Length" line - if you search for "abca" in the string "abcabca" then it will only return 1 when (arguably) it should return 2.
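
The overlap point in the last comment can be sketched as follows: advancing by one character after each hit, instead of by the needle's length, counts overlapping matches too. The class `OverlapCounter`, its `Count` method, and the `allowOverlap` flag are hypothetical names made up for illustration, and the ordinal comparison is an assumption rather than what the answer uses.

```csharp
using System;

public static class OverlapCounter
{
    // Counts occurrences of needle in haystack. When allowOverlap is true,
    // the search resumes one char after each hit, so overlapping matches
    // are counted as well. Assumption: ordinal comparison suits the data.
    public static int Count(string haystack, string needle, bool allowOverlap)
    {
        if (string.IsNullOrEmpty(haystack) || string.IsNullOrEmpty(needle))
            return 0;

        int step = allowOverlap ? 1 : needle.Length;
        int count = 0;
        int index = 0;
        while ((index = haystack.IndexOf(needle, index, StringComparison.Ordinal)) != -1)
        {
            index += step; // where the next search starts
            count++;
        }
        return count;
    }
}
```

With the example from the comment, `OverlapCounter.Count("abcabca", "abca", true)` returns 2 and `OverlapCounter.Count("abcabca", "abca", false)` returns 1.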