4

I came across a parsing problem, which would take a quite small regular expression to solve, except that the pattern to work, it should be recursive.
Example:

{([^{}]*(?:{(?1)})?)

What I want it to match, is a specific RTF header, but to do that, I need it to be recursive.

{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fnil\fcharset0 Tahoma;}}

Is there some kind of non-true RegEx-like engine implementation for .NET which would allow to find matches to these kinds of patterns (maybe even a different syntax)?

Update:

I do really appreciate everyone for informing me about the Balancing Group option in .NET implementation of Regular Expressions, especially Qtax, who has provided a very comprehensive link as a comment below, which helped me understand what's this all about, and also for posting an answer to my specific example. If you are reading this, and it also helped you, be sure to upvote that answer.
However... That did not answer the general question about recursion possibility in .NET Regex-like engine. This example, fortunately (I like challenges) is by far not the only one I've met. And other situations can't be solved using this solution, but only by being able to reference not a match, but to reuse a sequence of a pattern, to a point where recursion would be possible.

3
  • 2
    While not a direct answer to recursion, .NET's support for Balanced Trees might be useful for this problem. Commented Aug 2, 2011 at 16:54
  • 1
    Balancing groups in the manual at: msdn.microsoft.com/en-us/library/… Commented Aug 2, 2011 at 17:15
  • Yes, I did find that very helpful, thank you. However it's not the only case when I needed recursive regular expressions. Commented Aug 2, 2011 at 17:32

2 Answers 2

3

For your example using a balancing group would work.

You could use an expression like:

{
[^{}]*
(?:({)[^{}]*)*
(?'-1'})*
(?(1)(?!))
}

Example:

string re = @"{[^{}]*(?:({)[^{}]*)*(?'-1'})*(?(1)(?!))}";
string str = "foo {bar} baz {foo{bar{baz}}} {f{o{o}}{bar}baz} {foo{bar}baz}";

Console.WriteLine("Input: \"{0}\"", str);
foreach (Match m in Regex.Matches(str, re))
{
    Console.WriteLine("Match: \"{0}\"", m);
}

Output:

Input: "foo {bar} baz {foo{bar{baz}}} {f{o{o}}{bar}baz} {foo{bar}baz}"
Match: "{bar}"
Match: "{foo{bar{baz}}}"
Match: "{o{o}}"
Match: "{bar}"
Match: "{bar}"
Sign up to request clarification or add additional context in comments.

1 Comment

Not the complete answer to this particular question, but closest, and it DID help. Thank you.
3

Even Qtax exemple is very good and clear, it didn't match completely for me because return {o{o}} instead of {f{o{o}}{bar}baz} .

After looking for a time, my solution is (using almost the same example) :

Input :

string re = @"{(((?<Counter>{)*[^{}]*)*((?<-Counter>})*[^{}]*)*)*(?(Counter)(?!))}";
string str = "foo {bar} baz {foo{bar{{baz}a{a{b}}}}} {f{o{o}}{bar{a{b{c}}{d}}}baz} {foo{bar}baz}";

Console.WriteLine("Input: \"{0}\"", str);
foreach (Match m in Regex.Matches(str, re))
{
    Console.WriteLine("Match: \"{0}\"", m);
}

Output :

Input: "foo {bar} baz {foo{bar{{baz}a{a{b}}}}} {f{o{o}}{bar{a{b{c}}{d}}}baz} {foo{bar}baz}"
Match: "{bar}"
Match: "{foo{bar{{baz}a{a{b}}}}}"
Match: "{f{o{o}}{bar{a{b{c}}{d}}}baz}"
Match: "{foo{bar}baz}"

Some explanation, I increment a counter for each { and decrement the counter at each }. And lastly the regex match only if the counter is empty ((?(Counter)(?!))).

It seems work for deep recursion and also with alternate bracket.

See this site which also help me to create this regex.

I hope this will help.

PS : If you want to match also string with forgotten } at the end, use :

string re = @"{(((?<Counter>{)*[^{}]*)*((?<-Counter>(}|$))*[^{}]*)*)*(?(Counter)(?!))(}|$)";
string str = "foo {bar} baz {foo{bar{{baz}a{a{b}}}}} {f{o{o}}{bar{a{b{c}}{d}}}baz} {foo{bar}b{az";

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.