
I have text like this:

This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text

I want to parse out data like this:

Name: name1
Value: value1

Name: name2
Value: {name3:even dipper {name4:valu4} dipper} some inner text

I would then recursively process each value to parse out the nested fields. Can you recommend a regular expression to do this?

  • Have you tried anything? Why didn't your attempt work? Commented Mar 11, 2013 at 14:59
  • Regex cannot do that. Commented Mar 11, 2013 at 14:59
  • Regex can do that, just as you can do this with a Beowulf cluster of VIC-20s if you really, really wanted to. It would just be difficult, and not useful past the exercise. You need a stack-based parser, which you could code up faster than you could figure out the regexes. Commented Mar 11, 2013 at 15:08
  • @SLaks, it's very possible. See my answer. Note that regexes aren't regular. (This isn't formal language theory.) Commented Mar 11, 2013 at 15:51
  • @SLaks, well, "some extensions" would be PHP, Perl, C#, VB (not as few as your comment suggests). Also, your comment "fundamentally impossible" seems to suggest that you're talking about regular expressions in the theoretical sense, which is definitely not what is meant here [on SO]. Nearly all regex implementations can match much more than theoretical regular expressions, even without counting the ones that support recursive patterns. The pattern (.)\1 is supported by nearly every modern programming language, yet isn't "regular". Commented Mar 11, 2013 at 20:29

2 Answers

3

In C# you can use balancing groups to count and balance the brackets:

{ (?'name' \w+ ) :       # start of tag
(?'value'                # named capture
  (?>                    # don't backtrack
    (?:
      [^{}]+             # not brackets
    | (?'open' { )       # count opening bracket
    | (?'close-open' } ) # subtract closing bracket (matches only if open count > 0)
    )*
  )
  (?(open)(?!))          # make sure open is not > 0
)
}                        # end of tag

Example:

using System;
using System.Text.RegularExpressions;

string re = @"(?x)       # enable eXtended mode (comments/spaces ignored)
{ (?'name' \w+ ) :       # start of tag
(?'value'                # named capture
  (?>                    # don't backtrack
    (?:
      [^{}]+             # not brackets
    | (?'open' { )       # count opening bracket
    | (?'close-open' } ) # subtract closing bracket (matches only if open count > 0)
    )*
  )
  (?(open)(?!))          # make sure open is not > 0
)
}                        # end of tag
";

string str = @"This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text";

foreach (Match m in Regex.Matches(str, re))
{
    Console.WriteLine("name: {0}, value: {1}", m.Groups["name"], m.Groups["value"]);
}

Output:

name: name1, value: value1
name: name2, value: {name3:even dipper {name4:valu4} dipper} some inner text
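
The question also asks to process each value recursively. A minimal sketch of that step, assuming the same pattern as above: the `Field` type and the `FieldParser.Parse` helper are hypothetical names introduced here for illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container for one parsed tag and its nested tags.
public sealed class Field
{
    public string Name = "";
    public string Value = "";
    public List<Field> Children = new List<Field>();
}

public static class FieldParser
{
    // Same balancing-group pattern as above, built once and reused.
    private static readonly Regex Tag = new Regex(@"(?x)
        { (?'name' \w+ ) :
        (?'value'
          (?>
            (?:
              [^{}]+
            | (?'open' { )
            | (?'close-open' } )
            )*
          )
          (?(open)(?!))
        )
        }");

    // Finds the tags at the outermost level of text, then applies
    // itself to each value to pick up the tags nested inside it.
    public static List<Field> Parse(string text)
    {
        var fields = new List<Field>();
        foreach (Match m in Tag.Matches(text))
        {
            string value = m.Groups["value"].Value;
            fields.Add(new Field
            {
                Name = m.Groups["name"].Value,
                Value = value,
                Children = Parse(value) // recurse into the value
            });
        }
        return fields;
    }
}
```

Calling `FieldParser.Parse(str)` on the sample string should give two top-level fields, with `name3` as a child of `name2` and `name4` as a child of `name3`. The recursion stops on its own once a value contains no further tags.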

2

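If you are using Perl, PHP, or another PCRE-based engine, it's not complicated at all. You can use a recursive expression like this (with the x flag, so that the whitespace and comments are ignored):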

{(\w+):         # start of tag
   ((?:
      [^{}]+    # not a tag
   |  (?R)      # a tag (recurse to match the whole regex)
   )*)
}               # end of tag

2 Comments

I am using C#, and I don't believe it supports (?R). It seems I will just need to solve this programmatically; regular expressions won't work.
@user1044169, in C# you have balancing groups which can be used to get the same result.
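
For comparison, the programmatic route mentioned in the comments can be sketched without any regex by tracking brace depth. This is only an illustration under stated assumptions: `BraceScanner` and `TopLevelTags` are hypothetical names, the tag name is taken as everything before the first colon (it is not validated against `\w+` the way the patterns above do), and unbalanced braces are not reported as errors.

```csharp
using System;
using System.Collections.Generic;

public static class BraceScanner
{
    // Returns a (name, value) pair for each outermost {name:value} tag in text.
    public static List<KeyValuePair<string, string>> TopLevelTags(string text)
    {
        var result = new List<KeyValuePair<string, string>>();
        int depth = 0;   // current brace nesting depth
        int start = -1;  // index of the '{' that opened the current outermost tag

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '{')
            {
                if (depth == 0) start = i;
                depth++;
            }
            else if (c == '}' && depth > 0)
            {
                depth--;
                if (depth == 0)
                {
                    // Text between the outermost braces, e.g. "name1:value1".
                    string body = text.Substring(start + 1, i - start - 1);
                    int colon = body.IndexOf(':');
                    if (colon > 0) // assumption: skip groups without a "name:" prefix
                    {
                        result.Add(new KeyValuePair<string, string>(
                            body.Substring(0, colon),
                            body.Substring(colon + 1)));
                    }
                }
            }
        }
        return result;
    }
}
```

Nested fields can be obtained the same way as with the regex, by calling `TopLevelTags` again on each returned value.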
