0

I have created Search algorithm to search any string in the Text.I just want the count of matches returned and want to search by only one key. I have tested the performance by searching single character 'a' in file of size 2 mb and it takes around 7 seconds.Can you please suggest better algorithm or if i am missing anything here.

   public int SearchText(string fromString,string searchText,bool isCaseSensitive)
       {
           int searchTextLength=searchText.Length;
           if (!isCaseSensitive)
           {
               fromString = fromString.ToLower();
               searchText = searchText.ToLower();
           }
            int matchCount=0;
            while (fromString.Length > searchText.Length)
            {
                int firstMatchIndex=fromString.IndexOf(searchText[0]);
                if (firstMatchIndex > -1)
                {
                    if (searchText == fromString.Substring(firstMatchIndex, searchTextLength))
                        matchCount++;
                    fromString = fromString.Remove(0, firstMatchIndex + searchTextLength);
                }
                else
                    break;
            }
            return matchCount;
       }
2
  • Why didnt you used string.contains() ... ????? Commented Dec 11, 2013 at 5:16
  • Because i needed the count also Commented Dec 11, 2013 at 5:37

2 Answers 2

1

You are creating unnecessary temporary strings all over the place. You can change it to something like this.. which should be faster:

public int SearchText(string fromString, string searchText, bool isCaseSensitive) {
    int matchCount = 0;

    var comparison = isCaseSensitive
        ? StringComparison.InvariantCulture
        : StringComparison.InvariantCultureIgnoreCase;

    int foundIndex = fromString.IndexOf(searchText,
        comparison);

    while (foundIndex > -1) {
        foundIndex = fromString.IndexOf(searchText,
            foundIndex + 1,
            comparison);

        matchCount++;
    }

    return matchCount;
}

EDIT: I tested this. It takes 197ms to process 2MB of randomized data.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks you!!You are right. I have checked your code with same input and indeed it takes less than a second.I should have just checked full text using IndexOf and not one character first. I think i was lost somewhere while writing the code.
1

Try using regular expressions

public int SearchText(string fromString, string searchText, bool isCaseSensitive)
{
        RegexOptions options = isCaseSensitive ? 
            RegexOptions.None : RegexOptions.IgnoreCase;
        return Regex.Matches(fromString, Regex.Escape(searchText), options).Count;
}

EDIT I tested this in LinqPad, and it takes 113 ms to get the count from 2.5 MB of Lorem Ipsum.

2 Comments

Remember to escape the input, otherwise it may produce unexpected output.
Thank you for your alternate approach. I always tend to overlook Regex's power. TEsted it too and it take around 60-80ms similar to that of approach suggested by simon which in range of (50-70 ms). Tested by executing both the approaches 10 times.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.