
Let's start with a small example. I have the following text:

[[ some tag [[ with tag nested ]] and again ]]

I'd like to match [[ with tag nested ]] but not [[ some tag [[ with tag nested ]]. A simple

\[\[(?<content>.+?)\]\]

obviously didn't work, so I wrote this regexp:

\[\[(?!.*?\[\[.*?\]\].*?)(?<content>.+?)\]\]

Unfortunately, it doesn't match anything in C# (with RegexOptions.Singleline), while PHP's preg_match works perfectly.

Any clues/ideas? Any help would be much appreciated.

  • I had no problem running your regex in C# with the Singleline option. It returns [[ with tag nested ]] correctly. Can you post your code? Commented Jan 21, 2011 at 0:58
  • I'm not certain I see the problem. I created a System.Text.RegularExpressions.Regex using your second pattern with RegexOptions.Singleline, then called Match on your example string. It came back with one capture of "[[ with tag nested ]]". Commented Jan 21, 2011 at 1:00
  • @Harry: Try it with this input: [[ outer1 [[ nested1 ]] outer2 [[ nested2 ]] outer3 ]]. If I understand the question correctly, it should match nested1 and nested2, but it only matches nested2. Commented Jan 21, 2011 at 3:05
  • Sorry for the confusion; I simplified the example so the expected result would be easier to understand. Surprisingly, the regexp succeeded on the example provided, but not on the real subjects. Alan is right: I wanted to match all nested tags. Thank you all for the time spent helping. Commented Jan 21, 2011 at 17:34

2 Answers


The simplest way that I know of to find just one of the innermost brackets is this:

var match = Regex.Match(input, @"^.*(\[\[(.*?)\]\])", RegexOptions.Singleline);

This works because it finds the last [[ (so there are no more [[ after it, so it can’t contain any nested tags) and then the immediately following ]]. Of course, this assumes well-formedness; if you have a string where the start/end brackets don’t match up properly, this can fail.

Once you’ve found the innermost bracket, you could remove it from the input string:

input = input.Remove(match.Groups[1].Index, match.Groups[1].Length);

and then repeat the process in a while loop until the regular expression no longer matches.
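The loop described above could be sketched roughly as follows. This is only an illustration under the same well-formedness assumption; the class name InnermostTags and the method ExtractAll are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InnermostTags
{
    // Pattern from the answer: the greedy ^.* backtracks to the last "[["
    // that is followed by "]]". Group 1 is the whole tag, group 2 its content.
    private static readonly Regex Innermost =
        new Regex(@"^.*(\[\[(.*?)\]\])", RegexOptions.Singleline);

    // Hypothetical helper: records each tag found, removes it from the
    // working string, and searches again until nothing matches.
    public static List<string> ExtractAll(string input)
    {
        var found = new List<string>();
        Match match = Innermost.Match(input);
        while (match.Success)
        {
            found.Add(match.Groups[1].Value);
            input = input.Remove(match.Groups[1].Index, match.Groups[1].Length);
            match = Innermost.Match(input);
        }
        return found;
    }
}
```

Note that because each tag is removed before the next search, an outer tag becomes "innermost" once its nested tags are gone, so the loop eventually returns outer tags as well (minus their nested parts). If only the originally innermost tags are wanted, the loop would need an extra stopping condition.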


3 Comments

I'm afraid this does not produce what I (and probably you) wanted: it matches from the first [[. Anyway, thanks for the response.
@Avaer: No, it doesn’t. It works just fine. Have you tried it? If you think it fails, please provide an example input for which it fails.
I owe you an apology: I did not look at the contents of Groups[1], but in my rush just checked Value. It does work. Thanks again.
3

Would this be a valid match?

[[ with [ single ] brackets ]]

If not, this regex should do:

\[\[(?<content>[^][]*)\]\]

[^][] matches any character that's not [ or ]. If single brackets are allowed, try this:

\[\[(?<content>(?:(?!\[\[|\]\]).)*)\]\]

(?!\[\[|\]\]). matches any character, but only after making sure it's not the start of a [[ or ]] sequence.
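For illustration, here is a minimal sketch of using the second pattern with Regex.Matches to collect every innermost tag in one pass. The class name InnermostMatches and the method Contents are hypothetical, and trimming the captured text is an added assumption about what the caller wants.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InnermostMatches
{
    // Second pattern from the answer: the content may contain single
    // brackets, but never the start of a "[[" or "]]" sequence.
    private static readonly Regex Tag =
        new Regex(@"\[\[(?<content>(?:(?!\[\[|\]\]).)*)\]\]", RegexOptions.Singleline);

    // Hypothetical helper: returns the trimmed content of every innermost tag.
    public static List<string> Contents(string input)
    {
        var contents = new List<string>();
        foreach (Match m in Tag.Matches(input))
        {
            contents.Add(m.Groups["content"].Value.Trim());
        }
        return contents;
    }
}
```

With the input [[ outer1 [[ nested1 ]] outer2 [[ nested2 ]] outer3 ]], this should yield nested1 and nested2, since an attempt starting at the outer [[ fails as soon as it reaches the next [[.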

8 Comments

@Avaer: So does mine, and mine is simpler.
@Timwi, I prefer Alan's suggestion. Perhaps yours is simpler in the sense that the regex is shorter, but figuring out why it works (because the first .* consumes the entire line, and then back-tracks to the last [[) is not that intuitive. Besides, your proposition does not handle cases like aaa [[ bbb ccc ]] [[ ddd.
@Bert: The example you gave is handled by my regex just fine. Have you tried it? — Also, I’m amused how you think this one here is easier to figure out; it’s hideous in comparison :)
@Timwi: Try it with the example I used in another comment: [[outer1[[nested1]]outer2[[nested2]]outer3]]. Using the Matches() method, your regex captures only nested2, while mine captures nested1 and nested2 as desired. I agree it's hideous, but it's as simple as it can be and still meet the requirements.
@Alan: As I mentioned in my answer, it finds just one of the innermost brackets, and if you want to match all of them, you can always use a while loop. I can’t think of a reasonable use case in which you want only innermost brackets and yet all of them, so in 99% of cases your Matches approach still requires a while loop.