Don't use capturing groups in C# Regex

Question

I am writing a regular expression in Visual Studio 2013 using C#.

I have the following scenario:

Match match = Regex.Match("%%Text%%More text%%More more text", "(?<!^)%%[^%]+%%");

But my problem is that I don't want to use capturing groups. The reason is that with capture groups match.Value contains %%More text%%, whereas I want match.Value itself to hold the string: More text

The string to get will always be between the second and the third group of %%. Put another way, the string will always be between the fourth and fifth %.

I tried:

Regex.Match("%%Text%%More text%%More more text", "(?:(?<!^)%%[^%]+%%)");

But with no luck.

I want to use match.Value because all my regexes are stored in a database table.

Is there a way to "transform" that regex into one that does not use capturing groups and returns the desired string in match.Value?

Here is a good place to start. Have you read any of the examples or consulted the documentation on Regex.Match?
– MethodMan, commented Dec 3, 2015 at 16:03

Wiktor Stribiżew · Accepted Answer · 2015-12-03 16:17:57Z

2

If you are sure you have no %s inside double %%s, you can just use lookarounds like this:

(?<=^%%[^%]*%%)[^%]+(?=%%)
^^^^^^^^^^^^^^^^     ^^^^^^
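
As a minimal sketch of how this pattern behaves with the input from the question (the class and method names below are only illustrative, not part of the original post), match.Value holds the inner text directly, with no group access needed:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for this illustration.
public static class LookaroundDemo
{
    // Returns match.Value, or null when the pattern does not match.
    public static string Extract(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        string input = "%%Text%%More text%%More more text";

        // The lookbehind requires "%%...%%" anchored at the start of the string,
        // the lookahead requires a following "%%"; neither is part of match.Value.
        string pattern = @"(?<=^%%[^%]*%%)[^%]+(?=%%)";

        Console.WriteLine(Extract(input, pattern)); // prints "More text"
    }
}
```

Because the delimiters sit inside lookarounds, the pattern can be stored in a database table as a plain string and used with match.Value unchanged.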

If you have single-% delimited strings (like %text1%text2%text3%text4%text5%text6, see demo):

(?<=^%[^%]*%)[^%]+(?=%)

See regex demo

And in case the text is between the 4th and the 5th %% delimiter:

(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)
^^^^^^^^^^^^^^^^^^^^^^     ^^^^^^

For single-% delimited strings (see demo):

(?<=^%(?:[^%]*%){3})[^%]+(?=%)

See another demo
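
The difference between the two {3} variants is easy to miss, so here is a small sketch (the helper names are illustrative only). It assumes the sample strings quoted in this thread: the double-%% version needs at least five %% delimiters, which the string from the question does not have, while the single-% version counts individual % characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for this illustration.
public static class NthDelimiterDemo
{
    // Returns match.Value, or null when the pattern does not match.
    public static string Extract(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        string doubleCount = @"(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)"; // counts "%%" pairs
        string singleCount = @"(?<=^%(?:[^%]*%){3})[^%]+(?=%)";    // counts single "%"

        string question = "%%Text%%More text%%More more text";
        string singles = "%text1%text2%text3%text4%text5%text6";

        // Only three "%%" delimiters exist, so the double-counting pattern fails.
        Console.WriteLine(Extract(question, doubleCount) ?? "(no match)"); // prints "(no match)"

        // Counting single "%" characters, the text sits between the 4th and 5th "%".
        Console.WriteLine(Extract(question, singleCount)); // prints "More text"
        Console.WriteLine(Extract(singles, singleCount));  // prints "text4"
    }
}
```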

Both regexes contain a variable-width lookbehind and the same lookahead, which together restrict the context in which the 1 or more characters other than % may appear.

The (?<=^%%[^%]*%%) makes sure there is %%[something_other_than_%]%% right after the beginning of the string, and (?<=^%%(?:[^%]*%%){3}) matches %%[substring_not_having_%]%%[substring_not_having_%]%%[substring_not_having_%]%% after the string start.

In case there can be single % symbols inside the double %%, you can use an unroll-the-loop regex (see demo):

(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)

This matches the same text as (?<=^%%(?:.*?%%){3}).*?(?=%%). For short strings, the .*? based solution should work faster. For very long input texts, use the unrolled version.
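
To illustrate, here is a sketch that runs both variants on a made-up input containing single % characters inside the %%-delimited fields (the input string and the helper names are assumptions for this example, not taken from the question):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for this illustration.
public static class UnrolledDemo
{
    // Returns match.Value, or null when the pattern does not match.
    public static string Extract(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        // Made-up sample: the 2nd and 4th fields each contain a single '%'.
        string input = "%%a%%b%c%%d%%50% off%%tail";

        // Unrolled version: a '%' is allowed only when not followed by another '%'.
        string unrolled = @"(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)";

        // Lazy-dot version that matches the same text.
        string lazy = @"(?<=^%%(?:.*?%%){3}).*?(?=%%)";

        Console.WriteLine(Extract(input, unrolled)); // prints "50% off"
        Console.WriteLine(Extract(input, lazy));     // prints "50% off"
    }
}
```

Which of the two runs faster depends on the input, so it is worth timing both against representative data before committing one to the database.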

edited Dec 3, 2015 at 16:17

answered Dec 3, 2015 at 16:00

6 Comments

Miguel Over a year ago

Yes, I'm sure that is not going to happen. I tested it and it worked fine. Thanks!

Wiktor Stribiżew Over a year ago

Good, I will still post a solution that accounts for cases when there can be a single % inside the double %%.

Miguel Over a year ago

The second solution, (?<=^%%(?:[^%]*%%){3})[^%]+(?=%%), didn't work, but it worked when I changed it to (?<=^%(?:[^%]*%){3})[^%]+(?=%). Maybe it is a typo.

Wiktor Stribiżew Over a year ago

Well, I see you have double %% in the original question, so I used two % in the pattern. If you have one, there is really no problem then with using [^%]+.

Miguel Over a year ago

Sorry, it seems I didn't explain myself well in my previous comment. The 4th-and-5th approach doesn't match anything in the string %%Text%%More text%%More more text, but (?<=^%(?:[^%]*%){3})[^%]+(?=%) does, because to be between the 4th and 5th the pattern needs to count single % characters. So it will also work with strings like %text1%text2%text3%text4%text5%text6, where it gets text4.
