I am writing a regular expression in C# (Visual Studio 2013).

I have the following scenario:

Match match = Regex.Match("%%Text%%More text%%More more text", "(?<!^)%%[^%]+%%");

But my problem is that I don't want to use capture groups. With the pattern above, match.Value contains %%More text%%, and my idea is to get the string More text directly from match.Value.

The string I need is always between the second and the third %% delimiter. Put another way, it is always between the fourth and the fifth % character.

I tried:

Regex.Match("%%Text%%More text%%More more text", "(?:(?<!^)%%[^%]+%%)");

But with no luck.

I want to use match.Value because all my regexes are stored in a database table.

Is there a way to "transform" that regex into one that does not use capturing groups and puts the desired string in match.Value?

  • here is a good place to start.. have you read any of the examples or consulted the documentation on Regex.Match Commented Dec 3, 2015 at 16:03

1 Answer

If you are sure you have no %s inside double %%s, you can just use lookarounds like this:

(?<=^%%[^%]*%%)[^%]+(?=%%)
^^^^^^^^^^^^^^^     ^^^^^^

If you have single-% delimited strings (like %text1%text2%text3%text4%text5%text6, see demo):

(?<=^%[^%]*%)[^%]+(?=%)

See regex demo
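As a minimal sketch of how the first pattern behaves with match.Value, the snippet below runs it against the string from the question. The class name SecondFieldDemo and the Extract helper are hypothetical names introduced only for this illustration; the pattern itself is the one shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SecondFieldDemo
{
    // Pattern from the answer: the lookbehind requires the first %%...%% field,
    // the lookahead requires the closing %%. Neither is part of match.Value.
    public const string Pattern = @"(?<=^%%[^%]*%%)[^%]+(?=%%)";

    // Hypothetical helper: returns the matched text, or null when nothing matches.
    public static string Extract(string input)
    {
        Match match = Regex.Match(input, Pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        // The input string from the question.
        Console.WriteLine(Extract("%%Text%%More text%%More more text")); // More text
    }
}
```

Because the delimiters are only asserted by the lookarounds, the pattern can be stored in the database table as-is and read back through match.Value.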

And in case it is between the 4th and the 5th:

(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)
^^^^^^^^^^^^^^^^^^^^^^     ^^^^^^

For single-% delimited strings (see demo):

(?<=^%(?:[^%]*%){3})[^%]+(?=%)

See another demo
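The difference between the two "4th and 5th" variants is easy to miss, so here is a small sketch that runs both against the two sample strings used in this thread. FourthFieldDemo and Extract are hypothetical names for this illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FourthFieldDemo
{
    // Patterns copied from the answer above.
    public const string DoublePercent = @"(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)";
    public const string SinglePercent = @"(?<=^%(?:[^%]*%){3})[^%]+(?=%)";

    // Hypothetical helper: returns the matched text, or null when nothing matches.
    public static string Extract(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        const string question = "%%Text%%More text%%More more text";
        const string single = "%text1%text2%text3%text4%text5%text6";

        Console.WriteLine(Extract(single, SinglePercent));   // text4
        Console.WriteLine(Extract(question, SinglePercent)); // More text

        // The question's string has only three %% delimiters, so the
        // double-% variant, which needs five, finds nothing.
        Console.WriteLine(Extract(question, DoublePercent) ?? "(no match)"); // (no match)
    }
}
```

In the question's string the fourth and fifth single % characters enclose the same text as the second and third %% delimiters, which is why the single-% variant returns More text there.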

Both regexes use a variable-width lookbehind and the same lookahead to restrict the context in which the one or more characters other than % may appear.

The (?<=^%%[^%]*%%) lookbehind makes sure there is %%[something_other_than_%]%% at the start of the string, immediately before the match, and (?<=^%%(?:[^%]*%%){3}) requires %%[substring_not_having_%]%%[substring_not_having_%]%%[substring_not_having_%]%% between the start of the string and the match.
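To illustrate that the lookarounds only check context and consume nothing, this sketch inspects where the match starts and how long it is. ZeroWidthDemo and Find are hypothetical names; the Index, Length and Groups members are standard properties of the .NET Match class.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZeroWidthDemo
{
    // Hypothetical helper wrapping the first pattern from the answer.
    public static Match Find(string input)
    {
        return Regex.Match(input, @"(?<=^%%[^%]*%%)[^%]+(?=%%)");
    }

    public static void Main()
    {
        Match m = Find("%%Text%%More text%%More more text");

        // "%%Text%%" occupies indexes 0..7, so the match starts at 8.
        Console.WriteLine(m.Index);        // 8
        // "More text" is 9 characters; the surrounding %% are not included.
        Console.WriteLine(m.Length);       // 9
        // Only group 0 (the whole match) exists: no capturing groups are used.
        Console.WriteLine(m.Groups.Count); // 1
    }
}
```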

In case there can be single % symbols inside the double %%, you can use an unroll-the-loop regex (see demo):

(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)

This matches the same text as (?<=^%%(?:.*?%%){3}).*?(?=%%), as long as the input has no line breaks (by default, . does not match a newline). For short strings, the .*?-based solution should be faster. For very long input texts, use the unrolled version.
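A short sketch comparing the three variants on an input where a field contains a single %. The sample string and the names SinglePercentInsideDemo and Extract are assumptions made up for this illustration; the patterns are the ones given above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglePercentInsideDemo
{
    // Patterns copied from the answer above.
    public const string Simple   = @"(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)";
    public const string Unrolled = @"(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)";
    public const string Lazy     = @"(?<=^%%(?:.*?%%){3}).*?(?=%%)";

    // Hypothetical helper: returns the matched text, or null when nothing matches.
    public static string Extract(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        // Made-up sample: the fourth field contains a single % character.
        const string input = "%%a%%b%%c%%50% off%%e";

        Console.WriteLine(Extract(input, Unrolled));               // 50% off
        Console.WriteLine(Extract(input, Lazy));                   // 50% off
        // [^%]+ cannot cross the single %, so the simple pattern fails here.
        Console.WriteLine(Extract(input, Simple) ?? "(no match)"); // (no match)
    }
}
```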


6 Comments

Yes, I'm sure that is not going to happen. I tested it and it worked fine. Thanks!
Good, I will still post a solution that accounts for cases when there can be a single % inside the double %%.
The second solution, (?<=^%%(?:[^%]*%%){3})[^%]+(?=%%), didn't work, but it worked when I changed it to (?<=^%(?:[^%]*%){3})[^%]+(?=%). Maybe it is a typo.
Well, I see you have double %% in the original question, so I used two % in the pattern. If you have one, there is really no problem then with using [^%]+.
Sorry, it seems I didn't explain myself well in my previous comment. The 4th-and-5th approach doesn't match anything in the string %%Text%%More text%%More more text, but (?<=^%(?:[^%]*%){3})[^%]+(?=%) does, because "between the 4th and 5th" has to count single % characters. It also works with strings like %text1%text2%text3%text4%text5%text6, where it returns text4.