RegEx to parse nested tags?

Question

I have text like this:

This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text

I want to parse out data like this:

Name: name1
Value: value1

Name: name2
Value: {name3:even dipper {name4:valu4} dipper} some inner text

I would then recursively process each value to parse out the nested fields. Can you recommend a regular expression to do this?

Regex can do that, just as you could do this with a Beowulf cluster of VIC-20s if you really, really wanted to. It would just be difficult, and not useful beyond the exercise. You need a stack-based parser, which you could code up faster than you could figure out the regexes.
– Michael Paulukonis, Commented Mar 11, 2013 at 15:08
@SLaks, it's very possible. See my answer. Note that regexes aren't regular. (This isn't formal language theory.)
– Qtax, Commented Mar 11, 2013 at 15:51
@SLaks, well, "some extensions" would be PHP, Perl, C#, VB (not as few as your comment suggests). Also, your comment "fundamentally impossible" seems to suggest that you're talking about regular expressions in the theoretical sense, which is definitely not what is meant here [on SO]. Nearly all regex implementations can match much more than theoretical regular expressions, even without counting the ones that support recursive patterns. The pattern (.)\1 is supported by nearly every modern programming language, yet isn't "regular".
– Bart Kiers, Commented Mar 11, 2013 at 20:29

Qtax · Accepted Answer · 2013-03-11 19:52:43Z

In C# you can use balancing groups to count and balance the brackets:

{ (?'name' \w+ ) :       # start of tag
(?'value'                # named capture
  (?>                    # don't backtrack
    (?:
      [^{}]+             # not brackets
    | (?'open' { )       # count opening bracket
    | (?'close-open' } ) # subtract closing bracket (matches only if open count > 0)
    )*
  )
  (?(open)(?!))          # make sure open is not > 0
)
}                        # end of tag

Example:

using System;
using System.Text.RegularExpressions;

string re = @"(?x)       # enable eXtended mode (comments/spaces ignored)
{ (?'name' \w+ ) :       # start of tag
(?'value'                # named capture
  (?>                    # don't backtrack
    (?:
      [^{}]+             # not brackets
    | (?'open' { )       # count opening bracket
    | (?'close-open' } ) # subtract closing bracket (matches only if open count > 0)
    )*
  )
  (?(open)(?!))          # make sure open is not > 0
)
}                        # end of tag
";

string str = @"This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text";

foreach (Match m in Regex.Matches(str, re))
{
    Console.WriteLine("name: {0}, value: {1}", m.Groups["name"], m.Groups["value"]);
}

Output:

name: name1, value: value1
name: name2, value: {name3:even dipper {name4:valu4} dipper} some inner text
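
The question mentions processing each value recursively to reach the nested fields. A minimal sketch of that step follows, assuming the same balancing-group pattern as above. The class name `NestedTags` and the helper `Walk` are hypothetical names introduced here for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NestedTags
{
    // Same balancing-group pattern as in the answer, built once and reused.
    private static readonly Regex TagRegex = new Regex(@"
        { (?'name' \w+ ) :
        (?'value'
          (?>
            (?:
              [^{}]+
            | (?'open' { )
            | (?'close-open' } )
            )*
          )
          (?(open)(?!))
        )
        }", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: yields every tag in the text, outermost first,
    // together with its nesting depth (0 for top-level tags).
    public static IEnumerable<(int Depth, string Name, string Value)> Walk(string text, int depth = 0)
    {
        foreach (Match m in TagRegex.Matches(text))
        {
            string value = m.Groups["value"].Value;
            yield return (depth, m.Groups["name"].Value, value);

            // Re-run the same pattern on the captured value to find nested tags.
            foreach (var inner in Walk(value, depth + 1))
                yield return inner;
        }
    }

    public static void Main()
    {
        string str = "This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text";

        foreach (var (depth, name, value) in Walk(str))
        {
            // Indent by two spaces per nesting level.
            Console.WriteLine("{0}name: {1}, value: {2}", new string(' ', 2 * depth), name, value);
        }
    }
}
```

With the sample string, this should list name1 and name2 at depth 0, then name3 and name4 indented beneath name2. Each level rescans only the captured value, so the pattern itself needs no changes.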

Qtax · Accepted Answer · 2013-03-11 15:47:28Z


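With Perl, PHP, or any other PCRE-based engine this is not complicated at all. You can use a recursive expression like the following (it needs the x flag, because it contains whitespace and comments):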

{(\w+):         # start of tag
   ((?:
      [^{}]+    # not a tag
   |  (?R)      # a tag (recurse to match the whole regex)
   )*)
}               # end of tag


2 Comments

user1044169 Over a year ago

I am using C#, and I don't believe it supports (?R). It seems I will just need to solve this programmatically; regular expressions won't work.

Qtax Over a year ago

@user1044169, in C# you have balancing groups which can be used to get the same result.
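
For readers who prefer the programmatic route raised in the comments, here is a rough sketch of a regex-free scanner that tracks brace depth with a counter. It is an illustration under stated assumptions, not code from either answer: `TagScanner` and `TopLevelTags` are hypothetical names, names are approximated as letters, digits, and underscores (slightly narrower than .NET's `\w`), and a `{` that never finds its matching `}` is simply skipped.

```csharp
using System;
using System.Collections.Generic;

public static class TagScanner
{
    // Hypothetical helper: returns the top-level {name:value} tags in text.
    public static List<(string Name, string Value)> TopLevelTags(string text)
    {
        var tags = new List<(string Name, string Value)>();
        int i = 0;
        while (i < text.Length)
        {
            if (text[i] != '{') { i++; continue; }

            // Read the tag name: one or more word characters followed by ':'.
            int nameStart = i + 1;
            int nameEnd = nameStart;
            while (nameEnd < text.Length &&
                   (char.IsLetterOrDigit(text[nameEnd]) || text[nameEnd] == '_'))
                nameEnd++;
            if (nameEnd == nameStart || nameEnd >= text.Length || text[nameEnd] != ':')
            {
                i++; // not a tag start, keep scanning
                continue;
            }

            // Scan the value, counting braces until the opening one is closed.
            int valueStart = nameEnd + 1;
            int depth = 1;
            int pos = valueStart;
            while (pos < text.Length && depth > 0)
            {
                if (text[pos] == '{') depth++;
                else if (text[pos] == '}') depth--;
                pos++;
            }
            if (depth != 0) { i++; continue; } // unbalanced: skip this '{'

            // pos is one past the closing brace of this tag.
            tags.Add((text.Substring(nameStart, nameEnd - nameStart),
                      text.Substring(valueStart, pos - 1 - valueStart)));
            i = pos;
        }
        return tags;
    }

    public static void Main()
    {
        string str = "This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text";
        foreach (var (name, value) in TopLevelTags(str))
            Console.WriteLine("name: {0}, value: {1}", name, value);
    }
}
```

Nested fields can be reached by calling `TopLevelTags` again on each returned value. On malformed input the result may differ from what the regex versions produce.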
