Regex to match expression with multiple parentheses, one within each other

Question

I'm building a task (in PHP) that reads all the files of my project in search of i18n messages. I want to detect messages like these:

// Basic example
__('Show in English')  => Show in English
// Get the message and the name of the i18n file 
__("Show in English", array(), 'page') => Show in English, page
// Be careful of quotes
__("View Mary's Car", array()) => View Mary's Car
// Be careful of strings after the __() expression
__('at').' '.function($param) => at

The regex expression that works for those cases (there are some other cases taken into account) is:

__\(.*?['|\"](.*?)(?:['|\"][\.|,|\)])(?: *?array\(.*?\),.*?['|\"](.*?)['|\"]\)[^\)])?

However, it doesn't work if the expression spans multiple lines. I have to add the dotall modifier /s, but that breaks the previous regex because it no longer knows when to stop looking ahead:

// Detect with multiple lines
echo __('title_in_place', array(
    '%title%' => $place['title']
  ), 'welcome-user'); ?>

One thing would solve the problem and simplify the regex: matching opening and closing parentheses. That way, no matter what's inside __() or how many parentheses there are, it "counts" the number of opening parentheses and expects the same number of closing ones.

Is it possible? How? Thanks a lot!

ridgerunner · Accepted Answer · 2012-06-26 05:18:48Z

1

Yes. First, here is the classic example for simple nested brackets (parentheses):

`\(([^()]|(?R))*\)`

or faster versions which use a possessive quantifier:

`\(([^()]++|(?R))*\)`

or (equivalent) atomic grouping:

`\((?>[^()]+|(?R))*\)`
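As a quick check, here is a minimal sketch that runs the three variants against one sample string. The sample string and the variable names are my own illustrative assumptions, not part of the question.

```php
<?php
// Sketch: the three variants of the classic nested-bracket pattern.
// The subject string below is an assumed example.
$patterns = [
    'plain'      => '/\(([^()]|(?R))*\)/',
    'possessive' => '/\(([^()]++|(?R))*\)/',
    'atomic'     => '/\((?>[^()]+|(?R))*\)/',
];
$subject = 'f(a, g(b, (c))) + h(d)';

$results = [];
foreach ($patterns as $name => $re) {
    preg_match_all($re, $subject, $m);
    $results[$name] = $m[0];   // each entry is one whole balanced group
    printf("%-10s %s\n", $name, implode(' | ', $m[0]));
}
```

All three variants should report the same two outermost groups; the possessive and atomic forms only reduce backtracking.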

But you can't use the (?R) "match whole expression" construct here, because the outermost brackets are special (they have two leading underscores). Here is a tested script which matches what (I think) you want...

Solution: Use group `$1` (recursive) subroutine call: `(?1)`

<?php // test.php Rev:20120625_2200
$re_message = '/
    # match __(...(...)...) message lines (having arbitrary nesting depth).
    __\(                     # Outermost opening bracket (with leading __().
    (                        # Group $1: Bracket contents (subroutine).
      (?:                    # Group of bracket contents alternatives.
        [^()"\']++           # Either one or more non-brackets, non-quotes,
      | "[^"\\\\]*(?:\\\\[\S\s][^"\\\\]*)*"      # or a double quoted string,
      | \'[^\'\\\\]*(?:\\\\[\S\s][^\'\\\\]*)*\'  # or a single quoted string,
      | \( (?1) \)          # or a nested bracket (repeat group 1 here!).
      )*                    # Zero or more bracket contents alternatives.
    )                       # End $1: recursed subroutine.
    \)                      # Outermost closing bracket.
    .*                      # Match remainder of line following __()
    /mx';
$data = file_get_contents('testdata.txt');
$count = preg_match_all($re_message, $data, $matches);
printf("There were %d __(...) messages found.\n", $count);
for ($i = 0; $i < $count; ++$i) {
    printf("  message[%d]: %s\n", $i + 1, $matches[0][$i]);
}
?>

Note that this solution handles balanced parentheses (inside the "__(...)" construct) to any arbitrary depth (limited only by host memory). It also correctly handles quoted strings inside the "__(...)" and ignores any parentheses that may appear inside these quoted strings. Good luck.

edited Jun 26, 2012 at 5:18

answered Jun 26, 2012 at 0:38


3 Comments

fesja Over a year ago

really great answer. Thanks a lot! One last question, how do I capture the two strings I was capturing with the repeat groups. __("Show in English", array(), 'page') => [1]Show in English, [2]page

ridgerunner Over a year ago

@fesja - You're welcome. Your follow up question represents an entirely new question. When dealing with regex, you need to be very specific with your questions and provide example input (both matching and non-matching) as well as the desired output.

fesja Over a year ago

thanks, although in fact I added that example input and output on the question (see second example). Any reference you could give would be great!
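One possible way to get the two strings, as a rough two-step sketch rather than a definitive answer: first grab the balanced contents of each __( ... ) with the recursive group, then pick the quoted literals out of that argument list. The helper names extractMessages() and unquote() and the QUOTED constant are hypothetical. It assumes the message is the first argument and that the i18n file name, when present, is the last argument, both written as quoted literals (so a quoted literal in second position of a two-argument call would also be reported as the file name).

```php
<?php
// Sketch under stated assumptions; extractMessages(), unquote() and QUOTED
// are hypothetical names introduced for illustration only.

// A single- or double-quoted literal with backslash escapes (no capture groups).
const QUOTED = '"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'';

function unquote(string $literal): string
{
    // Drop the surrounding quotes, then undo backslash escapes (simplified:
    // sequences like \n inside double quotes are not interpreted).
    return stripslashes(substr($literal, 1, -1));
}

function extractMessages(string $source): array
{
    // Group 1 holds the balanced contents of __( ... ); (?1) recurses into it.
    $reCall = '/__\(((?:[^()"\']++|' . QUOTED . '|\((?1)\))*)\)/s';
    preg_match_all($reCall, $source, $calls);

    $found = [];
    foreach ($calls[1] as $args) {
        // First argument: the message. Skip calls that do not start with a literal.
        if (!preg_match('/^\s*(' . QUOTED . ')/s', $args, $first)) {
            continue;
        }
        // Last argument: taken as the file name if it is a quoted literal after a comma.
        $file = null;
        if (preg_match('/,\s*(' . QUOTED . ')\s*$/s', $args, $last)) {
            $file = unquote($last[1]);
        }
        $found[] = [unquote($first[1]), $file];
    }
    return $found;
}

$source = <<<'SRC'
__('Show in English')
__("Show in English", array(), 'page')
__("View Mary's Car", array())
__('at').' '.function($param)
echo __('title_in_place', array(
    '%title%' => $place['title']
  ), 'welcome-user');
SRC;

$found = extractMessages($source);
foreach ($found as [$message, $file]) {
    printf("%s => %s\n", $message, $file ?? '(no file)');
}
```

Splitting the work in two keeps the recursive pattern unchanged and avoids capture groups inside a recursed subpattern, which are awkward to read back.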

Mark Byers · Accepted Answer · 2012-06-25 16:28:57Z

1

Matching balanced parentheses is not possible with regular expressions (unless you use an engine with non-standard non-regular extensions, but even then it's still a bad idea and will be hard to maintain).

You could use a regular expression to find lines containing potential matches, then iterate over the string character by character counting the number of open and close parentheses until you find the index of the matching closing parenthesis.
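A minimal sketch of that idea, assuming PHP 7.1 or later: a simple regex locates each `__(`, and a character loop counts parentheses from there. The function names scanCall() and findCalls() are hypothetical, and skipping over quoted strings is my own addition so that parentheses inside messages are not counted.

```php
<?php
// Hypothetical helper: return the text between the "(" just before $start
// and its matching ")", or null if the parentheses never balance.
function scanCall(string $src, int $start): ?string
{
    $depth = 1;      // we begin just past the opening "__("
    $quote = null;   // current string delimiter while inside a literal
    $len = strlen($src);
    for ($i = $start; $i < $len; $i++) {
        $ch = $src[$i];
        if ($quote !== null) {
            if ($ch === '\\') {
                $i++;                 // skip the escaped character
            } elseif ($ch === $quote) {
                $quote = null;        // end of the string literal
            }
        } elseif ($ch === '"' || $ch === "'") {
            $quote = $ch;
        } elseif ($ch === '(') {
            $depth++;
        } elseif ($ch === ')') {
            $depth--;
            if ($depth === 0) {
                return substr($src, $start, $i - $start);
            }
        }
    }
    return null;
}

// Hypothetical helper: collect the argument text of every __() call.
function findCalls(string $src): array
{
    $calls = [];
    preg_match_all('/__\(/', $src, $m, PREG_OFFSET_CAPTURE);
    foreach ($m[0] as [$text, $offset]) {
        $args = scanCall($src, $offset + strlen($text));
        if ($args !== null) {
            $calls[] = $args;
        }
    }
    return $calls;
}

$calls = findCalls("echo __('a (b', array(f(1)), 'x'); __('y')");
print_r($calls);
```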

answered Jun 25, 2012 at 16:28


Steve Wortham · Accepted Answer · 2012-06-25 16:52:15Z

0

The only way I'm aware of pulling this off is with balanced group definitions. That's a feature in the .NET flavor of regular expressions, and is explained very well in this article.

And as Qtax noted, this can be done in PCRE with (?R) as described in their documentation.

Or this could also be accomplished by writing a custom parser. Basically the idea would be to maintain a variable called ParenthesesCount as you're parsing from left to right. You'd increment ParenthesesCount every time you see ( and decrement for every ). I've written a parser recently that handles nested parentheses this way.
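The counter idea could look roughly like the following sketch. This is not the parser mentioned above, just an illustration; topLevelGroups() is a hypothetical name, and the sketch deliberately ignores quoted strings.

```php
<?php
// Sketch: return every top-level parenthesized group found in $expr.
function topLevelGroups(string $expr): array
{
    $groups = [];
    $parenthesesCount = 0;   // current nesting depth
    $start = 0;              // offset of the current top-level "("
    foreach (str_split($expr) as $i => $ch) {
        if ($ch === '(') {
            if ($parenthesesCount === 0) {
                $start = $i;
            }
            $parenthesesCount++;
        } elseif ($ch === ')' && $parenthesesCount > 0) {
            $parenthesesCount--;
            if ($parenthesesCount === 0) {
                // Depth is back to zero: one complete group has closed.
                $groups[] = substr($expr, $start, $i - $start + 1);
            }
        }
    }
    return $groups;
}

$groups = topLevelGroups('(1+2) -((4+ (3+2)-(6-7)))');
print_r($groups);
```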

edited Jun 25, 2012 at 16:52

answered Jun 25, 2012 at 16:34


2 Comments

Qtax Over a year ago

It's even easier in PCRE than .NET regex, see (?R).

Steve Wortham Over a year ago

@Qtax - Interesting. The regular expression can become pretty hard to understand once you dive into some of these more advanced features. But it's neat the way PCRE solves the problem. For the OP, this page describes the (?R) recursive pattern... pcre.org/pcre.txt

Mirocow · Accepted Answer · 2017-01-12 13:43:24Z

0

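I use this expression, which matches the innermost parenthesized groups (those with no parentheses inside them):

(\(([^()]+)\))

I tried it on these inputs: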

 * 1) (1+2)
 * 2) (1+2)+(3+2)
 * 3) (IF 1 THEN 1 ELSE 0) > (IF 2 THEN 1 ELSE 1)
 * 4) (1+2) -(4+ (3+2))
 * 5) (1+2) -((4+ (3+2)-(6-7)))
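A small sketch of what this pattern returns on input 4 from the list. Note that the outer group `(4+ (3+2))` is not matched as a whole, because the character class excludes parentheses.

```php
<?php
// Sketch: run the non-recursive pattern on one of the listed inputs.
$re = '/(\(([^()]+)\))/';
preg_match_all($re, '(1+2) -(4+ (3+2))', $m);

// $m[1] holds each innermost group with its parentheses,
// $m[2] holds the same groups without them.
print_r($m[1]);
print_r($m[2]);
```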

answered Jan 12, 2017 at 13:43
