20

I want to match strings like:

The sentence is 'He said "Hello there"'
The sentence is "He said 'Hello there'"

and get back a single capture (match) that is the sentence inside the outer single or double quotes.

^The sentence is (?:(?:'([^']*)')|(?:"([^"]*)"))$

The above regex gives me back 2 captured groups, one of them empty and the other containing the desired sentence.

^The sentence is (['"])(.*)\1$

Returns the quotation mark (single or double quote) as the 1st group and the sentence as the 2nd group.

If I make the first group non-capturing,

^The sentence is (?:['"])(.*)\1$

then I cannot use the later reference to the captured group. (the \1 is, of course, no longer referring to the single or double quote match)
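To make the problem concrete, here is a minimal sketch that assumes Python's `re` module (the question does not name a regex engine). The names `NON_CAPTURING`, `NAMED` and `sentence` are made up for illustration. It shows that in the non-capturing variant `\1` now points at the `(.*)` group, so the sample strings no longer match, and that a named group is one way to keep the backreference while picking out only the sentence by name.

```python
import re

s1 = 'The sentence is \'He said "Hello there"\''
s2 = "The sentence is \"He said 'Hello there'\""

# With the quote group made non-capturing, \1 refers to (.*) itself,
# so the pattern demands the sentence twice and the samples fail to match.
NON_CAPTURING = re.compile(r"""^The sentence is (?:['"])(.*)\1$""")

# Named groups: the quote is still captured (so it can be referenced),
# but the caller asks only for the group named "sentence".
NAMED = re.compile(r"""^The sentence is (?P<quote>['"])(?P<sentence>.*)(?P=quote)$""")

def sentence(text):
    """Return the text inside the outer quotes, or None if there is no match."""
    m = NAMED.match(text)
    return m.group("sentence") if m else None

print(NON_CAPTURING.match(s1))  # None
print(sentence(s1))             # He said "Hello there"
print(sentence(s2))             # He said 'Hello there'
```

The quote is still captured internally; the sketch simply ignores that group when reading the result.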

Is there a way to have groups whose "capture" can be referenced later in the regex, but whose captured value is not returned in the list of matches?

Or is there some other way to solve my (seemingly simple) problem?

1

4 Answers

19

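Sadly, this elegant-looking pattern does not work, because a backreference such as \1 is not treated as a backreference inside a character class, so [^\1] does not mean "any character except the captured quote":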

(["'])(?:\\\1|[^\1]+)*\1

With a small change, replacing the character class with a negative lookahead, it works:

(["'])((?:\\\1|(?:(?!\1)).)*)(\1)

https://regex101.com/r/dKdBMT/2

I have not verified that this regex works in all cases, so please test it further.
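As a rough way to test it, here is a sketch that assumes Python's `re` module (other engines may differ in detail). `PATTERN` and `quoted_body` are names made up for this example. Group 1 is the opening quote, group 2 the body, and group 3 the closing quote, so the caller reads only group 2.

```python
import re

# (["'])            group 1: opening quote
# (?:\\\1|(?!\1).)* body: an escaped quote, or any char that is not the opening quote
# (\1)              group 3: closing quote, same character as group 1
PATTERN = re.compile(r"""(["'])((?:\\\1|(?:(?!\1)).)*)(\1)""")

def quoted_body(text):
    """Return the body of the first balanced quoted string, or None."""
    m = PATTERN.search(text)
    return m.group(2) if m else None

print(quoted_body('The sentence is \'He said "Hello there"\''))  # He said "Hello there"
print(quoted_body("The sentence is \"He said 'Hello there'\""))  # He said 'Hello there'
print(quoted_body("'ss\""))                                      # None (unbalanced)
```

Reading `m.group(2)` avoids having to hide the quote group at all: the extra groups exist, but nothing forces the caller to use them.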

16

This one seems to work:

(?:'|").*(?:'|")

or

((?:'|").*(?:'|"))

if you need a group.

Here's the demo: link

It works because * is a greedy quantifier: .* consumes as much as possible, so the match runs to the last quote in the string and you don't have to know which kind of quote is at the end.
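A small sketch of that behaviour, assuming Python's `re` module (`LOOSE` is a name made up for this example). It also shows the trade-off: because the opening and closing quotes are matched independently, a string that opens with one kind of quote and closes with the other is accepted too.

```python
import re

# Either quote, then as much as possible, then either quote.
LOOSE = re.compile(r"""((?:'|").*(?:'|"))""")

balanced = 'The sentence is \'He said "Hello there"\''
unbalanced = "'ss\""

# Greedy .* runs to the last quote in the string.
print(LOOSE.search(balanced).group(1))    # 'He said "Hello there"'

# The two quotes are not tied together, so mismatched quotes also match.
print(LOOSE.search(unbalanced).group(1))  # 'ss"
```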

6 Comments

The first example does not actually capture anything. The second example captures the sentence including the outer single or double quotes. A combination works: (?:'|")(.*)(?:'|")
I thought you wanted to capture the quotation marks as well; I didn't go back to the question to check. I'm glad it helped.
Not knowing whether the first quote is single or double means that unbalanced sentences get matched, for example The sentence is "abc 123' and 'He said "Goodbye". The last of those returns He said "Goodbye as the sentence. It would be nice not to match such strings with unbalanced quotes.
Try this regex: (?:'|")(.*(?:'|").*(?:'|"))(?:'|") (see the demo). It may still not be perfect, since I don't know which cases you need to handle, but it should help.
No. The unbalanced string 'ss" also matches this regex.
6

You want to make sure the quote symbols are properly matched, so a quote starting with a single quote ends with a single quote. Also, the regex should allow for escaping a quote symbol with a backslash if it's the same symbol (double or single quote symbol) bounding the string. Try this:

"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'

These samples match this regex:

'sing"le q\'uote'

"dou\"ble 'quote"
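Here is a sketch for checking the two samples, assuming Python's `re` module. The names `DOUBLE`, `SINGLE`, `STRING_LITERAL` and `inner` are made up for illustration. The pattern has no capture groups, so the sketch strips the outer quotes from the whole match to get the body.

```python
import re

# Each alternative: opening quote, then non-quote/non-backslash chars
# or backslash-escaped chars, then the same closing quote.
DOUBLE = r'"(?:[^"\\]|\\.)*"'
SINGLE = r"'(?:[^'\\]|\\.)*'"
STRING_LITERAL = re.compile(DOUBLE + "|" + SINGLE)

sample1 = r"""'sing"le q\'uote'"""
sample2 = r'''"dou\"ble 'quote"'''

def inner(text):
    """Return the first quoted string without its outer quotes, or None."""
    m = STRING_LITERAL.search(text)
    return m.group(0)[1:-1] if m else None

print(inner(sample1))   # sing"le q\'uote
print(inner(sample2))   # dou\"ble 'quote
print(inner("'ss\""))   # None (unbalanced)
```

Because each alternative hard-codes its own quote character, no backreference is needed and the quote never shows up as a separate group.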

1

One of the answers above is very accurate, but it needs a small update. Here it is:

(["'])((?:\\1|(?:(?!\1)).)*)(\1)

This will match every quoted string literal.

1 Comment

Now I want to match { or } braces, except where they fall inside text matched by the pattern (["'])((?:\\1|(?:(?!\1)).)*)(\1), in the same string. I've been trying, but no luck.
