I have been impressed by the speed of regex in JS.

So I decided to run a small comparison. I ran the following code:

var str = "A regular expression is a pattern that the regular expression engine attempts to match in input text.";

var re = new RegExp("t", "g");

console.time();

for(var i = 0; i < 10e6; i++)
   str.replace(re, "1");

console.timeEnd();

The result: 3888.731ms.

Now in C#:

var stopwatch = new Stopwatch();

var str = "A regular expression is a pattern that the regular expression engine attempts to match in input text.";

var re = new Regex("t", RegexOptions.Compiled);

stopwatch.Start();

for (int i = 0; i < 10e6; i++)
    re.Replace(str, "1");

stopwatch.Stop();

Console.WriteLine( stopwatch.Elapsed.TotalMilliseconds);

Result: 32798.8756ms !!

Next, I compared re.exec(str); with Regex.Match(str, "t");: 1205.791ms vs 7352.532ms, again in favor of JS.
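
One caveat when reading the exec numbers: if re still carries the g flag from the first test (an assumption, since the post does not show how re was built for this run), exec is stateful and resumes from re.lastIndex on each call, while the static Regex.Match(str, "t") starts at the beginning of the input every time. So the two loops may not be doing the same work. A minimal sketch of the JS behaviour, using a short made-up string rather than the benchmark sentence:

```javascript
// Hypothetical short input, chosen so the match positions are easy to follow.
const sample = "tat";
const globalRe = new RegExp("t", "g");

// Record [match index, lastIndex after the call] for three consecutive calls.
const steps = [];
for (let i = 0; i < 3; i++) {
  const m = globalRe.exec(sample);
  steps.push([m === null ? null : m.index, globalRe.lastIndex]);
}

console.log(JSON.stringify(steps)); // [[0,1],[2,3],[null,0]]
```

The third call returns null and resets lastIndex to 0, so a tight loop over a global regex cycles through all matches plus one failed attempt.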

Is .NET simply not suitable for massive text processing?

UPDATE 1: the same test with the pattern [ta] (instead of the literal t):

3336.063ms in JS vs 64534.4766ms in C#!

Another example:

console.time();

var str = "A regular expression is a pattern that the regular expression engine attempts 123 to match in input text.";


var re = new RegExp("\\d+", "g");
var result;
for(var i = 0; i < 10e6; i++)
    result = str.replace(re, "$&"); // pass the regex, not str; $& is the whole match in JS
   

console.timeEnd();

3350.230ms in js, vs 32582.405ms in c#.
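
A note on the replacement string when porting this test between the two languages: $0 is the whole-match token in .NET, but JavaScript uses $& for that and leaves $0 in the output as literal text. A small sketch with a shortened, made-up input (the variable names are only for illustration):

```javascript
const text = "attempts 123 to match";
const digits = /\d+/g;

// "$&" expands to the matched substring, so the brackets end up around the number.
const withAmp = text.replace(digits, "[$&]");

// "$0" is not a replacement token in JavaScript and is inserted literally.
const withZero = text.replace(digits, "[$0]");

console.log(withAmp);  // attempts [123] to match
console.log(withZero); // attempts [$0] to match
```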

  • Have you tried a precompiled regex in c#? Commented Dec 16, 2017 at 18:26
  • I was able to reproduce the C# performance (Release / Any CPU (64-bit) / not running in Visual Studio). My time using RegexOptions.None: 46509.2514 ms. My time using RegexOptions.Compiled: 36174.9981 ms. Commented Dec 16, 2017 at 18:33
  • Assign str.replace(re, "1"); to something to ensure JS is not considering it a no-op and optimizing it away Commented Dec 16, 2017 at 18:35
  • @AlexK. result = str.replace(str, "1"); = 3026.953ms Commented Dec 16, 2017 at 18:37
  • OK; however, I don't see why the update 2 test is "more useful". As an aside, when you write \d+ in a double-quoted string for the RegExp constructor, it is interpreted as d+ (the meaningless escape is simply ignored and the next character is treated as a literal). To express the \d character class inside a double-quoted string you have to use two backslashes: var re=RegExp("\\d+", "g");. Note that writing var re=/\d+/g; or var re=RegExp(/\d+/g); is exactly the same (none of these versions is compiled earlier or later). Commented Dec 16, 2017 at 20:40
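
The escaping point from the last comment can be checked directly. A small sketch (the input string and variable names are made up for illustration):

```javascript
// In an ordinary string literal, "\d" is an unknown escape and collapses to "d".
const single = "\d+";
const doubled = "\\d+";

const fromSingle = new RegExp(single, "g");   // matches runs of the letter "d"
const fromDoubled = new RegExp(doubled, "g"); // matches runs of digits
const literal = /\d+/g;                       // same pattern as fromDoubled

const input = "add 123";
console.log(input.match(fromSingle)[0]);  // dd
console.log(input.match(fromDoubled)[0]); // 123
console.log(fromDoubled.source === literal.source); // true
```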

2 Answers

String in C# is a dangerous beast and you really can shoot yourself in the foot if you use it carelessly, but I don't think the given test is representative enough to warrant any generalizations.

First, I did reproduce similar performance for your test case. Adding RegexOptions.Compiled reduced the required time to 30-ish seconds, but this is still a significant difference.

The specific test case is probably not very realistic: who would use a regex for a single-char replace? If you use the dedicated API for this task, you get comparable results: str.Replace('t', '1'); took 1600ms on my machine.

This means that for this specific task C# performance is comparable to JS. Whether the C# Regex.Replace() is internally somehow ill-suited to single-char replaces, or whether the JS engine is optimizing the regex away, is something a JS guru should answer.
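
For anyone who wants to repeat the same "dedicated API" comparison on the JS side, the single-character replacement can also be written without a regex. A minimal sketch using split/join, which is only one possible non-regex variant (no timings are claimed here):

```javascript
const str = "A regular expression is a pattern that the regular expression engine attempts to match in input text.";

// Regex-based replacement, as in the question.
const viaRegex = str.replace(/t/g, "1");

// Plain string replacement: split on the literal character and re-join.
const viaSplitJoin = str.split("t").join("1");

console.log(viaRegex === viaSplitJoin); // true
```

Both forms produce the same output, so timing them against each other isolates the cost of going through the regex engine.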

It would be interesting to know whether a more realistic, complex regex shows a notable difference.

Edit: I verified that the performance gap remains when the replace results are actually used and when the input strings differ in each run (10s vs 35s in my tests). So the gap is smaller, but still there.
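
A sketch of what such a variant of the test might look like on the JS side. This is not the exact code I ran; runReplaceBenchmark is a hypothetical helper, and the checksum is just one simple way of making sure every result is used:

```javascript
// Hypothetical helper: runs `iterations` replacements over inputs that differ
// on every iteration and folds each result into a checksum so the work is used.
function runReplaceBenchmark(re, iterations) {
  const base = "A regular expression is a pattern that the regular expression engine attempts to match in input text.";
  let checksum = 0;
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    const input = base + i;                // a different string on every run
    const output = input.replace(re, "1");
    checksum += output.length;             // consume the result
  }
  return { checksum, elapsedMs: Date.now() - start };
}

const { checksum, elapsedMs } = runReplaceBenchmark(/t/g, 1000);
console.log(checksum, elapsedMs + "ms");
```

Raise the iteration count to 10e6 to match the scale of the original test.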

Possible reasons

According to hints from this SO question, browser implementations delegate some string operations to optimized C++ code. If they do this for string concat, they probably do it for regex as well. AFAIK, the C# Regex and String classes stay in the managed world, and that brings some baggage.

1 Comment

Add a number to the string, and change the expression to \d+. I think this is a useful classic case. The results are similar (4 vs 31 seconds).

One of the reasons for the big difference between JS regex and .NET regex is that JS lacks quite a number of advanced features, whereas .NET is very feature-rich.

Here are two quotes from regular-expressions.info:

JavaScript:

JavaScript implements Perl-style regular expressions. However, it lacks quite a number of advanced features available in Perl and other modern regular expression flavors:

No \A or \Z anchors to match the start or end of the string. Use a caret or dollar instead.

No atomic grouping or possessive quantifiers.

No Unicode support, except for matching single characters with \uFFFF.

No named capturing groups. Use numbered capturing groups instead.

No mode modifiers to set matching options within the regular expression.

No conditionals.

No regular expression comments. Describe your regular expression with JavaScript // comments instead, outside the regular expression string.

.NET Framework:

The Microsoft .NET Framework, which you can use with any .NET programming language such as C# (C sharp) or Visual Basic.NET, has solid support for regular expressions. .NET's regex flavor is very feature-rich. The only noteworthy feature that's lacking are possessive quantifiers.
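
Note that the JavaScript quote describes older engines. ES2015 added the u flag, and ES2018 added named capturing groups and Unicode property escapes, so some items on that list no longer hold in current runtimes. A short sketch, assuming an ES2018-capable engine such as a recent Node.js (the sample strings are made up):

```javascript
// Named capturing groups (ES2018).
const m = /(?<year>\d{4})-(?<month>\d{2})/.exec("posted 2017-12");
console.log(m.groups.year, m.groups.month); // 2017 12

// Unicode property escape, which requires the u flag (ES2018).
const lettersOnly = /^\p{L}+$/u;
console.log(lettersOnly.test("regex")); // true
console.log(lettersOnly.test("1234"));  // false
```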
