
The example PHP regex below relies on subroutine calls.

If I try to use it with the C# Regex class, I get an error: Unrecognized grouping construct

Is it possible to rewrite this into C# regex syntax?

Would it be a simple translation, or does another (regex) approach need to be used?

If it is not possible, what is the name of the feature the pattern uses, so I can add it to this question and make it more useful to others with the same problem?

PHP, which works with all the JSON RFC test data:

$pcre_regex = '
  /
  (?(DEFINE)
     (?<number>   -? (?: [1-9]\d*| 0 ) (\.\d+)? (e [+-]? \d+)? )    
     (?<boolean>   true | false | null )
     (?<string>    " (?>[^"\\\\]+ | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
     (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
     (?<pair>      \s* (?&string) \s* : (?&json)  )
     (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
     (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
  )
  \A (?&json) \z
  /six   
';

And the C# version, which does not work:

string pattern = @"(?(DEFINE)
 (?<number>   -? (?: [1-9]\d* | 0 ) (\.\d+)? (e [+-]? \d+)? )    
 (?<boolean>   true | false | null )
 (?<string>    "" (?>[^""\\]+ | \\ [""\\bfnrt\/] | \\ u [0-9a-f]{4} )* "" )
 (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
 (?<pair>      \s* (?&string) \s* : (?&json)  )
 (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
 (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* ))
\A (?&json) \z
";
    string input = @"[{""Example"": ""data""}]";
    RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline;

    bool isValid = Regex.IsMatch(input, pattern, options);

Edit: This question is NOT about using regex with JSON; it is about how to do something (subroutine calls) in C# that CAN be done in PHP regex.

Just because there is a way of parsing json in C# DOES NOT answer the question. Please keep your answers and comments on topic.

11
  • You should not be using regex with html. html is not regular and regex is for regular text. Use an html class and a method in that class. Commented Nov 12, 2017 at 9:22
  • When you simplified the regex to find the construct that provokes the error message, what did you find? Please read about creating a minimal reproducible example and the other help center pages. Commented Nov 12, 2017 at 11:22
  • FWIW, JSON is regular enough to use with (some) modern regex engines. See: stackoverflow.com/a/3845829/309634 Commented Nov 12, 2017 at 22:54
  • It's not possible with a single regex since recursion isn't possible. Even using balancing groups doesn't provide all the functionality that recursion does. I was able to create a regex that does 99% of this, but what it cannot do is match nested objects inside an array, since it cannot recurse the parent group (object) in the child group (array). Commented Nov 24, 2017 at 20:25
  • @DarcyThomas: OK, about the "number" subpattern: testing with a lookahead is unnecessary since you can directly match the beginning of the number. Also, since the whole pattern is case-insensitive, there is no need to write [eE]. About the "string" subpattern: a branch that can match the empty string, in an alternation inside a group that isn't atomic (or repeated with a possessive quantifier), is clearly the way to go if you want catastrophic backtracking (for example, with a string that has no closing quote). Finally, \Z matches at the end of the string or just before a final newline, whereas \z matches only at the very end of the string. Commented Nov 28, 2017 at 21:39

2 Answers

4

This does not directly answer the question, but it is a workaround.

Rather than using the BCL Regex class, there is a project called PCRE.NET, which wraps the PCRE regex engine (the same engine which is used in the PHP example) with C# function calls.

This would allow the use of regex with subroutine calls in C# land.
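As a rough sketch of what that could look like: the example below assumes the PCRE.NET NuGet package is installed and uses its PcreRegex.IsMatch(subject, pattern) static method. The class and method names (JsonPcreCheck, IsJson) are made up for illustration. The PHP /six flags are written inline as (?six), and the PHP string escaping is translated for a C# verbatim string ("" for a quote, \\ for one literal backslash in the regex).

```csharp
using System;
using PCRE; // namespace of the PCRE.NET NuGet package (assumed to be installed)

public static class JsonPcreCheck
{
    // The pattern from the question. The PHP /six flags become the inline (?six).
    // Inside a C# verbatim string, "" is a quote and \\ is a regex-escaped backslash.
    private const string Pattern = @"(?six)
(?(DEFINE)
  (?<number>   -? (?: [1-9]\d* | 0 ) (\.\d+)? (e [+-]? \d+)? )
  (?<boolean>  true | false | null )
  (?<string>   "" (?>[^""\\]+ | \\ [""\\bfnrt\/] | \\ u [0-9a-f]{4} )* "" )
  (?<array>    \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
  (?<pair>     \s* (?&string) \s* : (?&json)  )
  (?<object>   \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
  (?<json>     \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
)
\A (?&json) \z";

    // Hypothetical helper: true when the whole input matches the (?&json) subroutine.
    public static bool IsJson(string input) => PcreRegex.IsMatch(input, Pattern);

    public static void Main()
    {
        Console.WriteLine(IsJson(@"[{""Example"": ""data""}]"));
    }
}
```

Because PCRE itself handles the (?(DEFINE)...) block and the (?&name) calls, the pattern needs no structural changes, only the string-literal escaping differs from the PHP version.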


2 Comments

I'm glad you found my lib useful :) To answer the original question: no, there is no general-purpose way to convert a recursive PCRE pattern into a .NET regex. Those two regex engines are fundamentally different in several ways, and each one supports some features the other one doesn't. This is what motivated me to write the library in the first place. You can sometimes work around the lack of recursion in .NET regexes with balancing groups, but as soon as you have different kinds of groups you're most probably out of luck, or you'll have to write a monstrous pattern.
See here and here for some really good info (by Kobi) relevant to your question.
2
+100

The short answer is kinda, but not really.

.NET regex has a concept called balancing groups.

This is really good for checking that all of your opening braces have matching closing braces (i.e., nesting is OK, but overlapping is not).

For example, this regex will ensure that all of the curly braces match:

{(?:[^{}]|(?<Open>{)|(?<Content-Open>}))+(?(Open)(?!))}

Which matches this string:

{1 2 {3} {4 5 {6}} 7}
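A minimal C# sketch of using that pattern with the BCL Regex class follows. The names BraceChecker and IsBalanced are made up for illustration. The braces are escaped, and \A and \z anchors are added, because an unanchored IsMatch would also succeed on a balanced substring (such as {6}) of an otherwise unbalanced input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BraceChecker
{
    // The pattern from above with escaped braces, anchored with \A and \z
    // so that the whole input (not just a substring) must be balanced.
    // (?<Open>\{) pushes a capture, (?<Content-Open>\}) pops one,
    // and (?(Open)(?!)) fails the match if any capture is left on the stack.
    private static readonly Regex Balanced = new Regex(
        @"\A\{(?:[^{}]|(?<Open>\{)|(?<Content-Open>\}))+(?(Open)(?!))\}\z");

    public static bool IsBalanced(string input) => Balanced.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsBalanced("{1 2 {3} {4 5 {6}} 7}")); // True
        Console.WriteLine(IsBalanced("{1 2 {3} {4 5 {6} 7}"));  // False: one brace is never closed
    }
}
```

This handles a single kind of bracket only. It does not address the mutually recursive array/object structure in the question's pattern.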

However, it is beyond me to craft a regex that handles several kinds of nested groupings, like in the example.

Furthermore, it looks like you would need to write a nested regex pattern with as many nesting levels as you expect in your source data.

What you could try is combining balancing groups with some recursive C# to pare down each grouping. There is something similar in this answer (but I would not recommend it in this case).

Alternatively, you could add this NuGet package, which is a wrapper around the PCRE regex engine and supports recursive subroutines. Details here.
