2

I have a C# Regex class matching multiple subgroups such as

(?<g1>abc)|(?<g2>def)|(?<g3>ghi) 

but with much more complicated sub-patterns. In addition to the existing groups, I want to match any text that doesn't belong to any of them.

I tried

(?<g1>abc)|(?<g2>def)|(?<g3>ghi)|(.+?) 

but it turned out too slow. I can't do negation because I don't want to copy those complex subpatterns redundantly. Using just (.+) overrides all other groups as expected.

Is there any other way? If that doesn't work I'll have to write an ad-hoc parser.

Additional details: All these groups are evaluated against a MatchEvaluator. So a Regex class behavior that sends "unmatched strings" to the MatchEvaluator will also work.

A sample text would be

.......abc........ghi.....def.....abc....def...ghi......abc.......

I want to catch the parts in between.

4 Answers

2

Your regex generates a separate match for every single character outside g1, g2 and g3: the lazy (.+?) is the last thing in the pattern, so it stops after one character. When you use it with a MatchEvaluator, that means one evaluator call per character. That's why it's slow.
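To make the per-character matching visible, here is a minimal sketch that just counts matches. It uses the simplified pattern from the question; the class and method names (MatchCountDemo, CountMatches) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class MatchCountDemo
{
    // Simplified stand-in for the real pattern, as posted in the question.
    const string Pattern = @"(?<g1>abc)|(?<g2>def)|(?<g3>ghi)|(.+?)";

    // Returns how many separate matches the pattern produces for the input.
    public static int CountMatches(string input) =>
        Regex.Matches(input, Pattern).Count;

    static void Main()
    {
        // Three filler characters, one "abc" token, two filler characters:
        // each filler character becomes its own match.
        Console.WriteLine(CountMatches("...abc..")); // prints 6
    }
}
```

With long stretches of filler text, the match count (and so the number of evaluator calls) grows with the number of filler characters rather than with the number of tokens.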

If you try the following regex:

(?<rest>.*?)((?<g1>abc)|(?<g2>def)|(?<g3>ghi)|$)

you will get a single "rest" group match for each entire fragment of text that doesn't contain a "g" group.

Regex C# code:

Regex regex = new Regex(
    @"(?<rest>.*?)((?<g1>abc)|(?<g2>def)|(?<g3>ghi)|$)",
    RegexOptions.Singleline
    | RegexOptions.Compiled
    );
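As a rough sketch of how this pattern might be wired to a MatchEvaluator: the evaluator below is hypothetical (it only wraps unmatched text in brackets and replaces each token with its group name), and the names RestGroupDemo, Evaluate and Transform are assumptions, not anything from the question. Note that the pattern can also yield one empty match at the very end of the input, so the evaluator should tolerate that.

```csharp
using System;
using System.Text.RegularExpressions;

static class RestGroupDemo
{
    static readonly Regex Rx = new Regex(
        @"(?<rest>.*?)((?<g1>abc)|(?<g2>def)|(?<g3>ghi)|$)",
        RegexOptions.Singleline | RegexOptions.Compiled);

    // Hypothetical evaluator: unmatched text becomes [text],
    // each known token becomes <groupName>.
    static string Evaluate(Match m)
    {
        // The pattern can produce one empty match at the end of the input.
        if (m.Length == 0) return string.Empty;

        string rest = m.Groups["rest"].Length > 0
            ? "[" + m.Groups["rest"].Value + "]"
            : string.Empty;

        if (m.Groups["g1"].Success) return rest + "<g1>";
        if (m.Groups["g2"].Success) return rest + "<g2>";
        if (m.Groups["g3"].Success) return rest + "<g3>";
        return rest; // trailing text with no token after it
    }

    public static string Transform(string input) => Rx.Replace(input, Evaluate);

    static void Main()
    {
        Console.WriteLine(Transform("..abc...ghi.def.."));
        // prints [..]<g1>[...]<g3>[.]<g2>[..]
    }
}
```

Each evaluator call now receives a whole run of unmatched text plus the token that ends it, so the number of calls tracks the number of tokens instead of the number of characters.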

2

but it turned out too slow. I can't do negation because I don't want to copy those complex subpatterns redundantly.

Why not something like:

const string COMPLEX_REGEX_PATTERN = @"\Gobbel[dy]go0\k"; // placeholder for one complex sub-pattern
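One way this could look in practice, as a sketch only: keep each sub-pattern in a constant, then build both the named groups and a negated "rest" group from the same constants. The constants G1, G2 and G3 are short placeholders for the real sub-patterns, and ComposedPatternDemo and Describe are made-up names. Two assumptions to check against the real patterns: repeating a sub-pattern inside the lookahead may interfere with any capture groups or backreferences it contains, and the lookahead runs at every character, so whether it is actually faster needs measuring.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ComposedPatternDemo
{
    // Placeholders standing in for the real, much longer sub-patterns.
    const string G1 = "abc";
    const string G2 = "def";
    const string G3 = "ghi";

    // Each sub-pattern is written once in source and reused for the negation.
    const string AnyToken = "(?:" + G1 + "|" + G2 + "|" + G3 + ")";

    static readonly Regex Rx = new Regex(
        "(?<g1>" + G1 + ")|(?<g2>" + G2 + ")|(?<g3>" + G3 + ")" +
        // One or more characters, none of which starts a token.
        "|(?<rest>(?:(?!" + AnyToken + ").)+)",
        RegexOptions.Singleline | RegexOptions.Compiled);

    // Lists every match as "groupName:text", separated by spaces.
    public static string Describe(string input)
    {
        var parts = new List<string>();
        foreach (Match m in Rx.Matches(input))
        {
            string name = m.Groups["g1"].Success ? "g1"
                        : m.Groups["g2"].Success ? "g2"
                        : m.Groups["g3"].Success ? "g3"
                        : "rest";
            parts.Add(name + ":" + m.Value);
        }
        return string.Join(" ", parts);
    }

    static void Main()
    {
        Console.WriteLine(Describe("..abc...ghidef."));
        // prints rest:.. g1:abc rest:... g3:ghi g2:def rest:.
    }
}
```

The sub-patterns still appear twice in the compiled regex, but only once in the source, which addresses the "copying redundantly" concern.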

1 Comment

Not a bad idea at all. I'll think about this if I don't receive any better answer.
1

Have you tried setting the regex option to be compiled? I find using a static compiled regex can speed things up considerably.

4 Comments

It's still slow even when compiled. Around 6 times slower than the one without the last group.
Compiling the regex helps, but if you're only using it once, you don't gain anything from compiling it. The only gains come from reusing the same regex.
Exactly. I wasn't sure how many times he would be matching. If I know I'm going to use it multiple times and the pattern is static, I usually use static readonly Regex rx = new Regex("somepattern", RegexOptions.Compiled);
I'm using it multiple times and I'm compiling it only once. But as I said, 6x speed difference does not change.
0

If your regex is four pages long, writing a state machine yourself would probably be a better idea...

12 Comments

He said "Such as" ... "but with much more complicated sub-patterns"
If I had pasted the actual Regex, this question would be four pages long :)
If your regular expression is 4 pages long, you shouldn't be using a regular expression
Because of performance? I'm very happy with its performance without the last case.
If your Regex is 4 pages long, something is horribly wrong on more than one level.