2

I have strings of this type:

text (more text)

What I would like to do is to have a regular expression that extracts the "more text" segment of the string. So far I have been using this regular expression:

"^.*\\((.*)\\)$"

Although this works in many cases, it fails on input like this:

text (more text (even more text))

What I get is: even more text)

What I would like to get instead is: more text (even more text) (basically the content of the outermost pair of brackets.)

7 Answers

7

Besides lazy quantification, another way is:

"^[^(]*\\((.*)\\)$"

In both regexes, there is an explicitly specified left parenthesis ("\\(", with Java String escaping) immediately before the matching group. In the original, there was a .* before that, allowing anything (including other left parentheses). In mine, left parentheses are not allowed there (it is a negated character class), so the explicitly specified left parenthesis is the outermost one.
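As a minimal sketch of how this pattern could be used from Java: the class name `OutermostParens` and the helper `outermost` are made up for illustration, and only `java.util.regex` is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutermostParens {
    // Negated character class: nothing before the captured group may be a '('.
    private static final Pattern OUTER = Pattern.compile("^[^(]*\\((.*)\\)$");

    // Hypothetical helper: returns the content of the outermost parentheses,
    // or null if the whole input does not match the pattern.
    static String outermost(String input) {
        Matcher m = OUTER.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(outermost("text (more text)"));                  // more text
        System.out.println(outermost("text (more text (even more text))")); // more text (even more text)
    }
}
```

Since the pattern is anchored with `^` and `$`, `matches()` and `find()` behave the same here; `matches()` just makes the intent explicit.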


2 Comments

Thanks, that worked like a charm. Will mark as answer when the 10 minutes are over.
It helps to explain the answer!
4

I recommend this (double escaping of the backslash removed, since this is not part of the regex):

^[^(]*\((.*)\)

Matching with your version (^.*\((.*)\)$) occurs like this:

  1. The star matches greedily, so your first .* goes right to the end of the string.
  2. Then it backtracks just as much as necessary so the \( can match - that would be the last opening paren in the string.
  3. Then the next .* goes right to the end of the string again.
  4. Then it backtracks just as far as necessary so the \) can match, i.e. to the last closing paren.

When you use [^(]* instead of .*, it can't go past the first opening paren, so the first opening paren (the correct one) in the string will delimit your sub-match.
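One way to observe the backtracking described above from Java is to ask the matcher where group 1 starts. This is only a sketch; the class `GreedyTrace` and the helper `groupStart` are hypothetical names, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyTrace {
    // Hypothetical helper: index in the input where capture group 1 begins,
    // or -1 if the pattern does not match at all.
    static int groupStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        String input = "text (more text (even more text))";
        // Greedy .* backtracks only as far as the LAST '(' (index 16), so the group starts at 17.
        System.out.println(groupStart("^.*\\((.*)\\)$", input));  // 17
        // [^(]* cannot pass the FIRST '(' (index 5), so the group starts at 6.
        System.out.println(groupStart("^[^(]*\\((.*)\\)", input)); // 6
    }
}
```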

1 Comment

+1 I like the explanation of how it goes about finding the match
4

Try:

"^.*?\\((.*)\\)$"

That makes the first quantifier lazy (non-greedy). A greedy quantifier swallows everything it possibly can while still allowing an overall pattern match; a lazy one takes as little as possible.

The other suggestion:

"^[^(]*\\((.*)\\)$"

That one might be closer to what you're looking for, though. For this simple example it doesn't really matter, but it could if you wanted to expand on the regex, for example by making the part inside the parentheses optional.
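To illustrate where the two could diverge, here is a sketch with the parenthesized part made optional and the trailing `$` dropped. Both modified patterns are my own assumptions for the sake of the example, not patterns from the question, and `OptionalGroup` / `firstGroup` are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroup {
    // Hypothetical helper: group 1 of the first match, or null if there is
    // no match or the group did not participate in it.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "text (more text)";
        // Lazy prefix: matching zero characters and skipping the optional group
        // already succeeds, so nothing is captured.
        System.out.println(firstGroup("^.*?(?:\\((.*)\\))?", input));   // null
        // Negated class: the prefix greedily runs up to the first '(' before
        // the optional group is tried, so the group is captured.
        System.out.println(firstGroup("^[^(]*(?:\\((.*)\\))?", input)); // more text
    }
}
```

With the `$` anchor kept, the lazy variant would be forced to expand until the group matches, so the difference only shows up in less constrained patterns like these.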

1 Comment

@Tomalak: Right! Also +1 for you!
1

Try this:

"^.*?\\((.*)\\)$"


1

True regular expressions can't count parentheses; this requires a pushdown automaton. Some regex libraries have extensions to support this, but I don't think Java's does (could be wrong; Java isn't my forte).

BTW, the other answers I've seen so far will work with the example given, but will break with, e.g., text (more text (even more text)) (another bit of text). Changing greediness doesn't make up for the inability to count.
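If counting is really needed, one option is to skip the regex and track nesting depth by hand. The following is a minimal sketch under that assumption; `BalancedParens` and `firstBalancedGroup` are hypothetical names, and it deliberately handles only the first top-level group.

```java
public class BalancedParens {
    // Hypothetical helper: content of the first balanced top-level
    // parenthesized group, or null if the input contains none.
    static String firstBalancedGroup(String input) {
        int depth = 0;   // current nesting level
        int start = -1;  // index just after the opening '(' of the top-level group
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                if (depth == 0) {
                    start = i + 1;
                }
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
                if (depth == 0) {
                    return input.substring(start, i);
                }
            }
        }
        return null; // no balanced group found
    }

    public static void main(String[] args) {
        String input = "text (more text (even more text)) (another bit of text)";
        System.out.println(firstBalancedGroup(input)); // more text (even more text)
    }
}
```

On the same input, `^[^(]*\((.*)\)$` would capture everything from the first opening parenthesis to the last closing one, including the `) (` in the middle.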


0
$str =~ /^.*?\((.*)\)/


-1

I think the reason is that your second wildcard is picking up the closing parenthesis. You'll need to exclude it.

1 Comment

This is wrong. He wants to include closing parentheses in the matching group, in order to match things like "more text (even more text)"
