6

I have a string in the form of

Foo
"Foo"
"Some Foo"
"Some Foo and more"

I need to extract the value Foo which is in quotes and can be surrounded by any number of alphanumeric and white space characters. So, for the examples above I would like the output to be

<NoMatch>
Foo
Foo
Foo

I have been trying to get this to work, and this is the pattern I have so far using lookahead/lookbehind for quotes. This works for "Foo" but not others.

(?<=")Foo(?=")

Further expanding this to

(?<=")(?<=.*?)Foo(?=.*?)(?=")

does not work.

Any assistance will be appreciated!

6
  • You said "surrounded by alphanumeric characters". Quotes and whitespace aren't alphanumeric. Commented May 24, 2013 at 10:53
  • What language do you use? Commented May 24, 2013 at 10:55
  • I am using this as part of a search and replace, in notepad++ Commented May 24, 2013 at 10:55
  • @Barmar Thanks, I have reworded the question Commented May 24, 2013 at 10:55
  • Can you be sure that a) quotes are always correctly balanced? b) there aren't any escaped quotes? c) quoted strings never span multiple lines? Commented May 24, 2013 at 10:58

4 Answers 4

12

If quotes are correctly balanced and quoted strings don't span multiple lines, then you can simply look ahead in the string to check whether an even number of quotes follows. If that's not true, we know that we're inside a quoted string:

Foo(?![^"\r\n]*(?:"[^"\r\n]*"[^"\r\n]*)*$)

Explanation:

Foo          # Match Foo
(?!          # only if the following can't be matched here:
 [^"\r\n]*   # Any number of characters except quotes or newlines
 (?:         # followed by
  "[^"\r\n]* # (a quote and any number of non-quotes/newlines
  "[^"\r\n]* # twice)
 )*          # any number of times.
 $           # End of the line
)            # End of lookahead assertion

See it live on regex101.com
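For anyone who wants to check the pattern outside a text editor, here is a minimal sketch in Python's `re` module. The names `INSIDE_QUOTES` and `find_quoted_foo` are made up for illustration, and the sketch assumes, as stated above, that quotes are balanced and quoted strings stay on one line.

```python
import re

# The pattern from this answer. re.MULTILINE makes $ match at the end of
# every line, which is what the lookahead relies on.
INSIDE_QUOTES = re.compile(
    r'Foo(?![^"\r\n]*(?:"[^"\r\n]*"[^"\r\n]*)*$)',
    re.MULTILINE,
)

def find_quoted_foo(text):
    """Return every Foo followed by an odd number of quotes on its line."""
    return INSIDE_QUOTES.findall(text)

# The four sample lines from the question.
samples = ['Foo', '"Foo"', '"Some Foo" ', '"Some Foo and more"']
for line in samples:
    print(repr(line), '->', find_quoted_foo(line))

# Because only Foo itself is matched, a plain substitution also handles
# several occurrences inside one quoted string.
print(INSIDE_QUOTES.sub('Bar', '"Foo and Foo" Foo'))
```

The unquoted `Foo` is skipped because zero quotes follow it, which counts as an even number.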


2 Comments

Is there a way to exclude the line if there are no quotes? Currently this pattern matches Foo as well as "Foo".
@Kami: No, it shouldn't do that. See the test link. It may be that you need to prepend (?m) to the regex to make sure that $ matches at the end of a line, not just at the end of the file. But usually that is the default behaviour in text editors.
3

Lookbehind ((?<=something)) doesn't work with variable-length patterns such as .* in most regex engines; lookahead ((?=something)) accepts them, but it doesn't help here. Try this:

(?<=")(.*?)(Foo)(.*?)(?=")

and then use match strings (depending on your language: $1,$2,... or \1,\2,... or members of some array or something like that).
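As a rough illustration of reading the match strings, here is a sketch in Python, where the groups are reached through `match.group(n)`. The helper name `extract_foo` is hypothetical. Note that this pattern only checks for a quote somewhere before and after Foo on the line; it does not verify that the two quotes belong to the same quoted string.

```python
import re

# Same pattern as above: group 1 = text before Foo, group 2 = Foo,
# group 3 = text after Foo. The lookbehind is a single character, so
# Python's fixed-width lookbehind restriction is not a problem.
PATTERN = re.compile(r'(?<=")(.*?)(Foo)(.*?)(?=")')

def extract_foo(line):
    """Return the second capture group, or None when nothing matches."""
    match = PATTERN.search(line)
    return match.group(2) if match else None

for line in ['Foo', '"Foo"', '"Some Foo"', '"Some Foo and more"']:
    print(repr(line), '->', extract_foo(line))
```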

2 Comments

This matches the entire line, I would like to only extract Foo
@Kami You are extracting it as the second match string. Admittedly, look-arounds are of no use here.
0

Try to do something with this kind of pattern:

"[^"]*?Foo[^"]*?"

2 Comments

This matches the entire line, I would like to only extract Foo
@Kami: you must add capturing parentheses around what you want to preserve
0

In Notepad++

search : ("[^"]*)Foo([^"]*")
replace : $1Bar$2
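The same search and replace can be tried outside Notepad++. Below is a sketch in Python, with hypothetical helper names `replace_once` and `replace_all`. Each pass replaces at most one Foo per quoted string, so `replace_all` simply repeats the pass until the text stops changing; it assumes the replacement text does not itself contain Foo.

```python
import re

# Same search pattern as above: group 1 is the opening quote plus the text
# before Foo, group 2 is the text after Foo plus the closing quote.
SEARCH = re.compile(r'("[^"]*)Foo([^"]*")')

def replace_once(text, replacement='Bar'):
    """One pass, equivalent to the replace string $1Bar$2."""
    return SEARCH.sub(lambda m: m.group(1) + replacement + m.group(2), text)

def replace_all(text, replacement='Bar'):
    """Repeat the pass until nothing changes any more."""
    if 'Foo' in replacement:
        # Assumption of this sketch: otherwise the loop would never settle.
        raise ValueError('replacement must not contain Foo')
    previous = None
    while previous != text:
        previous = text
        text = replace_once(text, replacement)
    return text

print(replace_once('"Some Foo and more"'))
print(replace_once('"Foo and Foo"'))
print(replace_all('"Foo and Foo"'))
```

Because `[^"]*` in the first group is greedy, a single pass replaces the last Foo in a quoted string, not the first.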

4 Comments

What if there are two Foos inside a quoted string?
@TimPietzcker: Only one is replaced. But that may be enough, as the OP doesn't say anything about it.
@TimPietzcker: you push the replaceAll button a second time
@CasimiretHippolyte: Yuk. :)
