Python Regex to find a string in double quotes within a string

Question

I'm looking for Python code that uses a regex to do something like this:

Input: Regex should return "String 1" or "String 2" or "String3"

Output: String 1,String 2,String3

I tried r'"*"'

There could be quotes inside quotes, what would you do with that?
– user1227804, Commented Mar 1, 2012 at 16:15
No, there won't be any quotes. Just simple strings with a-z, 0-9, whitespace, underscore; mostly alphanumeric without any single or double quotes inside them.
– nomi, Commented Mar 1, 2012 at 16:17
Does this answer your question? Extract string from between quotations
– wjandrea, Commented Dec 10, 2021 at 2:19

wjandrea · Accepted Answer · 2021-12-10 02:05:23Z

73

Here's all you need to do:

import re

def doit(text):
    # Capture the text between each pair of double quotes (non-greedy).
    matches = re.findall(r'"(.+?)"', text)
    # matches is now ['String 1', 'String 2', 'String3']
    return ",".join(matches)

doit('Regex should return "String 1" or "String 2" or "String3" ')

result:

'String 1,String 2,String3'

As pointed out by Li-aung Yip:

To elaborate, .+? is the "non-greedy" version of .+. It makes the regular expression match the smallest number of characters it can instead of the most characters it can. The greedy version, .+, will give String 1" or "String 2" or "String3; the non-greedy version .+? gives String 1, String 2, String3.

In addition, if you want to accept empty strings, change .+ to .*. Star * means zero or more while plus + means at least one.
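A minimal sketch of the two points above (greedy versus non-greedy, and + versus *), run against the question's input. The variable names are illustrative only.

```python
import re

text = 'Regex should return "String 1" or "String 2" or "String3" '

# Greedy: .+ runs on to the last double quote in the text.
greedy = re.findall(r'"(.+)"', text)

# Non-greedy: .+? stops at the first closing double quote it can.
lazy = re.findall(r'"(.+?)"', text)

# With + the empty pair "" yields no match here; with * it yields ''.
plus_empty = re.findall(r'"(.+?)"', 'a "" b')
star_empty = re.findall(r'"(.*?)"', 'a "" b')

print(greedy)      # one long match spanning all three quoted strings
print(lazy)        # the three quoted strings separately
print(plus_empty, star_empty)
```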

edited Dec 10, 2021 at 2:05

wjandrea


answered Mar 1, 2012 at 16:23

Johan Lundberg


3 Comments

Li-aung Yip Over a year ago

To elaborate, .+? is the "non-greedy" version of .+. It makes the regular expression match the smallest number of characters it can instead of the most characters it can. The greedy version, .+, will give String 1" or "String 2" or "String3; the non-greedy version .+? gives String 1, String 2, String3.

Sam De Meyer Over a year ago

Is it necessary to escape the double quotes, since you are using a raw string anyway? Wouldn't r'"(.+?)"' suffice? That seems to work on my system.

wjandrea Over a year ago

@Sam It's not necessary. I went ahead and removed them :)

Booboo · Accepted Answer · 2025-05-26 10:20:50Z

7

The highly up-voted answer doesn't account for the possibility that the double-quoted string might contain one or more double-quote characters escaped with a backslash. To handle this situation, the regex needs to accumulate between the opening and closing double-quotes zero or more matches where each match is either an escaped character sequence (a backslash followed by any character) or any character that is not a double-quote. We further assume that the quoted string exists wholly on a single line and so we do not allow newline characters within our string.

r'"(?:\\.|[^"\n])*"'

" matches a double-quote.
(?: - start of a non-capture group.
\\. - matches a backslash followed by any non-newline character representing an escaped character sequence.
| - "or".
[^"\n] - matches any character other than a double-quote or newline.
) - end of non-capture group.
* - matches 0 or more occurrences of the previous group.
" - matches a double-quote.

See Regex Demo
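The breakdown above can also be written into the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is only a restatement of the same regex; the name QUOTED and the sample text are illustrative choices.

```python
import re

# Same pattern as above, laid out one piece per line.
QUOTED = re.compile(r'''
    "               # opening double quote
    (?:             # start of non-capture group
        \\.         # a backslash followed by any non-newline character
      |             # or
        [^"\n]      # any character other than a double quote or newline
    )*              # zero or more occurrences of the group
    "               # closing double quote
''', re.VERBOSE)

sample = r'say "a\"b" and "c"'
print(QUOTED.findall(sample))
```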

import re

def doit(text):
    print('input:', text)
    for i, match in enumerate(re.findall(r'"(?:\\.|[^"\n])*"', text), start=1):
        print(f'match {i}: {match}')
    print()

doit(r'Regex should return "String 1" or "String 2" or "String3" and "\"double quoted string\"" ')
doit(r'"abcdef\"ghij"')
doit(r'"abcdef\\"ghij"')
doit(r'"abcdef\\\"ghij"')

Prints:

input: Regex should return "String 1" or "String 2" or "String3" and "\"double quoted string\""
match 1: "String 1"
match 2: "String 2"
match 3: "String3"
match 4: "\"double quoted string\""

input: "abcdef\"ghij"
match 1: "abcdef\"ghij"

input: "abcdef\\"ghij"
match 1: "abcdef\\"

input: "abcdef\\\"ghij"
match 1: "abcdef\\\"ghij"
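One caveat that may be worth checking against your own data: [^"\n] also matches a backslash, so when the input has no real closing quote the engine can backtrack and treat an escaped quote as the terminator. The sketch below compares the pattern above with a variant that excludes the backslash from the character class. The variant and the names ANSWER and STRICT are assumptions for illustration, not part of the answer.

```python
import re

ANSWER = re.compile(r'"(?:\\.|[^"\n])*"')    # pattern from the answer
STRICT = re.compile(r'"(?:\\.|[^"\\\n])*"')  # assumed variant: a backslash may only start an escape

# The only double quote after the opening one is escaped, so the string is unterminated.
unterminated = r'"abc\" tail'

print(ANSWER.findall(unterminated))  # backtracks and accepts the escaped quote as the end
print(STRICT.findall(unterminated))  # finds nothing

# On well-formed input both patterns agree.
well_formed = r'"abcdef\\\"ghij" and "x"'
print(ANSWER.findall(well_formed) == STRICT.findall(well_formed))
```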

edited May 26 at 10:20

answered Sep 2, 2020 at 13:52

Booboo


3 Comments

FMc May 25 at 18:22

This approach seems unable to support a quoted string that ends with a backslash. No amount of backslashing-the-backslashes helped.

Booboo May 26 at 2:16

@FMc I have updated the regex.

Booboo May 26 at 2:17

@Gregory I have updated the regex.

wjandrea · Accepted Answer · 2021-12-10 02:14:44Z

4

Just try to fetch double quoted strings from the multiline string:

import re

s = """ 
"my name is daniel"  "mobile 8531111453733"[[[[[[--"i like pandas"
"location chennai"! -asfas"aadhaar du2mmy8969769##69869" 
@4343453 "pincode 642002""@mango,@apple,@berry" 
"""
print(re.findall(r'"(.*?)"', s))

edited Dec 10, 2021 at 2:14

wjandrea


answered Aug 3, 2019 at 9:19

Daniel Muthupandi


1 Comment

wjandrea Over a year ago

This is the exact same solution as the accepted answer

Giovanni · Accepted Answer · 2021-11-09 00:18:23Z

From https://stackoverflow.com/a/69891301/1531728

My solution is:

import re
my_strings = [
    'SetVariables "a" "b" "c" ',
    'd2efw   f "first" +&%#$%"second",vwrfhir, d2e   u"third" dwedew',
    '"uno"?>P>MNUIHUH~!@#$%^&*()_+=0trewq"due"        "tre"fef    fre f',
    '       "uno""dos"      "tres"',
    '"unu""doua""trei"',
    '      "um"                    "dois"           "tres"                  ',
]
my_substrings = []
for current_test_string in my_strings:
    for values in re.findall(r'\"(.+?)\"', current_test_string):
        my_substrings.append(values)
        #print("values are:",values,"=")
    print(" my_substrings are:",my_substrings,"=")
    my_substrings = []

Alternate regular expressions to use are:

re.findall('"(.+?)"', current_test_string) [Avinash2021] [user17405772021]
re.findall('"(.*?)"', current_test_string) [Shelvington2020]
re.findall(r'"(.*?)"', current_test_string) [Lundberg2012] [Avinash2021]
re.findall(r'"(.+?)"', current_test_string) [Lundberg2012] [Avinash2021]
re.findall(r'"["]', current_test_string) [Muthupandi2019]
re.findall(r'"([^"]*)"', current_test_string) [Pieters2014]
re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', current_test_string) # Causes double quotes to remain in the strings, but can be removed via other means. [Booboo2020]
re.findall(r'"(.*?)(?<!\\)"', current_test_string) [Hassan2014]
re.findall('"[^"]*"', current_test_string) # Causes double quotes to remain in the strings, but can be removed via other means. [Martelli2013]
re.findall('"([^"]*)"', current_test_string) [jspcal2014]
re.findall("'(.*?)'", current_test_string) [akhilmd2016]
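The .*? and [^"]* forms in the list above are not fully interchangeable: by default . does not match a newline, while the negated class [^"] does. A small sketch with a made-up input that has a line break inside the quotes:

```python
import re

# Hypothetical input: the first quoted string spans two lines.
text = '"one\ntwo" "three"'

# . stops at the newline, so the quotes get paired up wrongly.
dot_lazy = re.findall(r'"(.*?)"', text)

# [^"] matches newlines too, so both strings are captured.
negated = re.findall(r'"([^"]*)"', text)

# re.DOTALL lets . match newlines, which brings .*? in line with [^"]*.
dot_all = re.findall(r'"(.*?)"', text, re.DOTALL)

print(dot_lazy)
print(negated)
print(dot_all)
```

If quoted strings never span lines in your data, the two forms give the same results.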

The current_test_string.split("\"") approach works if the substrings of interest are enclosed in double quotation marks. It uses the double quotation mark as a delimiter to tokenize the string, so the result also includes the pieces of text that are not enclosed in double quotation marks; these have to be filtered out afterwards.
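A minimal sketch of the split approach, assuming the double quotes are balanced and none of them are escaped. Under that assumption the quoted substrings are the elements at odd indices of the split result. The variable names are illustrative.

```python
text = 'SetVariables "a" "b" "c" '

# Splitting on the double quote alternates between text outside and inside quotes.
parts = text.split('"')
print(parts)

# With balanced, unescaped quotes, the odd-indexed parts are the quoted substrings.
quoted = parts[1::2]
print(quoted)
```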

References:

[Avinash2021] Arvind Kumar Avinash, answer to "Extract text between quotation using regex python", Stack Overflow, Stack Exchange, Inc., New York, NY, October 12, 2021. https://stackoverflow.com/a/69543129/1531728 (last accessed November 8, 2021).
[user17405772021] user1740577, answer to "Extract text between quotation using regex python", Stack Overflow, Stack Exchange, Inc., New York, NY, October 12, 2021. https://stackoverflow.com/a/69543030/1531728 (last accessed November 8, 2021).
[Shelvington2020] Iain Shelvington, answer to "Extracting only words out of a mixed string in Python [duplicate]", Stack Overflow, Stack Exchange, Inc., New York, NY, January 5, 2020. https://stackoverflow.com/a/59598630/1531728 (last accessed November 6, 2021).
[Lundberg2012] Johan Lundberg, answer to "Python Regex to find a string in double quotes within a string", Stack Overflow, Stack Exchange, Inc., New York, NY, March 1, 2012. https://stackoverflow.com/a/9519934/1531728 (last accessed November 6, 2021).
[Muthupandi2019] Daniel Muthupandi and trotta, answer to "Python Regex to find a string in double quotes within a string", Stack Overflow, Stack Exchange, Inc., New York, NY, August 3, 2019. https://stackoverflow.com/a/57337020/1531728 (last accessed November 6, 2021).
[Booboo2020] Booboo, answer to "Python Regex to find a string in double quotes within a string", Stack Overflow, Stack Exchange, Inc., New York, NY, September 2, 2020. https://stackoverflow.com/a/63707053/1531728 (last accessed November 6, 2021).
[Pieters2014] Martijn Pieters, answer to "Extract a string between double quotes", Stack Overflow, Stack Exchange, Inc., New York, NY, March 29, 2014. https://stackoverflow.com/a/22735466/1531728 (last accessed November 6, 2021).
[Hassan2014] Sabuj Hassan, answer to "Extract a string between double quotes", Stack Overflow, Stack Exchange, Inc., New York, NY, March 29, 2014. https://stackoverflow.com/a/22735480/1531728 (last accessed November 6, 2021).
[Martelli2013] Alex Martelli and Sumit Singh, answer to "Extract string from between quotations", Stack Overflow, Stack Exchange, Inc., New York, NY, March 14, 2014. https://stackoverflow.com/a/2076357/1531728 (last accessed November 6, 2021).
[jspcal2014] jspcal, answer to "Extract string from between quotations", Stack Overflow, Stack Exchange, Inc., New York, NY, March 14, 2014. https://stackoverflow.com/a/2076356/1531728 (last accessed November 6, 2021).
[akhilmd2016] akhilmd, answer to "Stripping string in python between quotes", Stack Overflow, Stack Exchange, Inc., New York, NY, July 2, 2016. https://stackoverflow.com/a/38161072/1531728 (last accessed November 5, 2021).

Frank · Accepted Answer · 2022-07-13 08:47:53Z

1

For me the only regex that ever worked right for all the cases of quoted strings with possibly escaped quotes inside of them was:

regex=r"""(['"])(?:\\\\|\\\1|[^\1])*?\1"""

This will not fail even if the quoted string ends with an escaped backslash.
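A short usage sketch for the pattern above. Because the pattern has a capturing group for the quote character, re.findall would return only that character, so this example uses re.finditer and group(0) to get the whole quoted string. The sample text and the names PATTERN and found are illustrative.

```python
import re

PATTERN = re.compile(r"""(['"])(?:\\\\|\\\1|[^\1])*?\1""")

# One double-quoted string with an escaped quote inside,
# and one single-quoted string that ends with an escaped backslash.
text = r'''x "a\"b" y 'c\\' z'''

found = [m.group(0) for m in PATTERN.finditer(text)]
print(found)
```

Note that inside a character class Python's re treats \1 as the character with code 1, not as a backreference, so [^\1] matches almost any character. It is the lazy quantifier *? followed by \1 that ends the match at the closing quote.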

answered Jul 13, 2022 at 8:47

Frank


1 Comment

FMc May 25 at 18:25

This answer is under-appreciated -- especially the issue noted at the end. As best I can tell, it is easier to understand and works better than the alternatives I've seen so far.

PatriceC · Accepted Answer · 2020-02-06 09:47:34Z

-2

import re

r = r"'(\\'|[^'])*(?!<\\)'|\"(\\\"|[^\"])*(?!<\\)\""

texts = [
    r'"aerrrt"',
    r'"a\"e' + "'" + 'rrt"',
    r'"a""""arrtt"""""',
    r'"aerrrt',
    r'"a\"errt' + "'",
    r"'aerrrt'",
    r"'a\'e" + '"' + "rrt'",
    r"'a''''arrtt'''''",
    r"'aerrrt",
    r"'a\'errt" + '"',
    "''", '""', "",
]

for text in texts:
    print(text, "-->", re.fullmatch(r, text))

results:

"aerrrt" --> <_sre.SRE_Match object; span=(0, 8), match='"aerrrt"'>
"a\"e'rrt" --> <_sre.SRE_Match object; span=(0, 10), match='"a\\"e\'rrt"'>
"a""""arrtt""""" --> None
"aerrrt --> None
"a\"errt' --> None
'aerrrt' --> <_sre.SRE_Match object; span=(0, 8), match="'aerrrt'">
'a\'e"rrt' --> <_sre.SRE_Match object; span=(0, 10), match='\'a\\\'e"rrt\''>
'a''''arrtt''''' --> None
'aerrrt --> None
'a\'errt" --> None
'' --> <_sre.SRE_Match object; span=(0, 2), match="''">
"" --> <_sre.SRE_Match object; span=(0, 2), match='""'>
 --> None

answered Feb 6, 2020 at 9:47

PatriceC


1 Comment

Abhishek Gurjar Over a year ago

Can you explain in some detail how your answer works? I can't see how it relates to the question asked.
