2

I have a small regex that replaces non-printable characters (the ones that are not allowed in an XML document) with an empty string.

The size of the incoming data is quite large, so if this Replace takes more than a few milliseconds, I'd want to cancel it and return the original string back.

Below is my code, but I can't seem to hit the catch block even if I provide a TimeSpan of 1 ms. Logs from the stopwatch show that it took well over 10 ms.

What am I doing wrong here?

Does this work only if it doesn't find a match within the given timespan?

What's the best way to test this?

Update: I tested the regex below using a large file (4 MB) that contains no non-printable characters. The regex took 79 ms, yet no exception was thrown.

    private static string CleanUpNonPrintableCharacters(string incomingString)
    {
        var stopWatch = new Stopwatch();
        try
        {
            stopWatch.Start();
            var timeSpan = TimeSpan.FromMilliseconds(1);

            var cleanedUpString = Regex.Replace(incomingString, @"[\u0000-\u0008\u000B\u000C\u000E-\u001F]", string.Empty, RegexOptions.None, timeSpan);

            stopWatch.Stop();
            Console.WriteLine(stopWatch.ElapsedMilliseconds);
            // Above was 79 ms on a file that doesn't have a match, yet no exception was thrown

            if (cleanedUpString.Length < incomingString.Length)
            {
                //do some logging
            }
            return cleanedUpString;
        }
        catch (RegexMatchTimeoutException ex)
        {
            //do some logging
            return incomingString;
        }
        finally
        {
            //stopWatch.Stop();
            //log elapsed
        }
    }
5
  • I think your `stopWatch.Stop();` call should come immediately after the Replace call, as it seems there are other operations before the stopwatch stops. Commented Jul 31, 2017 at 15:47
  • Can you use a delegate then track elapsed time. If > limit, might be able to throw inside the delegate. Not sure on this though (if the regex engine unwinds and quits).. Commented Jul 31, 2017 at 15:50
  • True, I was able to work around the above by wrapping the above util method with a Task and issuing a CancellationToken with Timeout, but somehow I feel that this should work out of the box, since Regex.Replace offers the timeout option. @AkashKC, I've updated my question with the results from my most-recent test. Commented Jul 31, 2017 at 16:20
  • Another thing you might try is to set a timer handler with 1 ms timeout, keep a counter, if count > limit, throw, but that could require a different thread. Commented Jul 31, 2017 at 16:20
  • Yeah, if a thread does while(1){} how can it be stopped? Does it have to be killed, probably. So, the regex code is similar, only catching, polling what it wants. Commented Jul 31, 2017 at 16:34
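
The delegate idea from the comments above can be sketched roughly as follows. This is only a sketch under assumptions: `BudgetedCleaner` and `CleanWithBudget` are hypothetical names, not part of the original code, and the budget is checked only when the evaluator runs, that is, once per match. On input with no matches the check never fires, so this does not cover the 4 MB no-match case from the question.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class BudgetedCleaner
{
    // Same character class as in the question; the helper itself is hypothetical.
    private static readonly Regex NonPrintable =
        new Regex(@"[\u0000-\u0008\u000B\u000C\u000E-\u001F]");

    public static string CleanWithBudget(string input, TimeSpan budget)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            // The evaluator is invoked once per match, so the elapsed-time
            // check below only happens when the pattern actually matches.
            return NonPrintable.Replace(input, match =>
            {
                if (stopwatch.Elapsed > budget)
                {
                    throw new TimeoutException("Replace exceeded its total time budget.");
                }
                return string.Empty;
            });
        }
        catch (TimeoutException)
        {
            // Give up and hand back the original string unchanged.
            return input;
        }
    }
}
```

Unlike the built-in matchTimeout, the budget here covers the whole Replace call rather than each individual match attempt, at the cost of a delegate call per match.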

2 Answers

4

As far as I understand, the timeout applies only to the pattern-matching step, not to replacing the matched characters.

If you look at the Regex source code, this is where the timeout is used while searching for a match:

  match = runner.Scan(this, input, beginning, beginning + length,
               startat, prevlen, quick, internalMatchTimeout); // runner is a RegexRunner instance

So Regex.Replace consists of a matching step and a replacing step. In your case each match is found very quickly, so no RegexMatchTimeoutException is thrown, but the replacing step for the matches seems to be slow.

I have not tested this on my end, but you can try calling only the Match method and see the result.

3

The manual states:

The matchTimeout parameter specifies how long a pattern matching method should try to find a match before it times out. Setting a time-out interval prevents regular expressions that rely on excessive backtracking from appearing to stop responding when they process input that contains near matches.

I assume that it finds the next match faster than 1ms, but there are many matches, so it adds up.

2 Comments

I thought of the same, but I just tested it with a file that contains just 4MB of lorem ipsum text. There weren't any matches and regex took 79ms, yet no exceptions were thrown. I will try Regex.IsMatch and see if that catches anything.
@Ren I've been playing a bit with the timeout here, and it does indeed reset between matches. Some guesses follow. Scanning a 4 MB string for a simple character class could be really fast (on the order of milliseconds), since it needs only a single linear pass with no backtracking. Maybe the Replace call does some optimization first, to which the timeout does not apply. Could the limit be too low? Can you try a larger string or a more complicated regex (with backtracking)? Or maybe the timeout is disabled when the regex has no backtracking.
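
One way to try the "more complicated regex (with backtracking)" suggestion is a small probe like the one below. It is a sketch, not a definitive test: `TimeoutProbe` and `TimesOut` are hypothetical names, and the pattern `^(a+)+$` is just a commonly cited example of nested quantifiers that backtrack heavily on a near match with the default backtracking engine. Exact behavior can differ between runtime versions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimeoutProbe
{
    // Returns true if the match attempt was aborted by the matchTimeout.
    public static bool TimesOut(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
            return false;
        }
        catch (RegexMatchTimeoutException)
        {
            return true;
        }
    }

    // A near match: many 'a' characters followed by one character that
    // prevents the anchored pattern from ever succeeding.
    public static string NearMatch(int length)
    {
        return new string('a', length) + "!";
    }
}
```

Calling `TimeoutProbe.TimesOut(TimeoutProbe.NearMatch(40), @"^(a+)+$", TimeSpan.FromMilliseconds(50))` should exercise a single long-running match attempt, which is the situation the matchTimeout parameter is documented to guard against. The simple character class from the question never backtracks, which would be consistent with it not timing out.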
