I'm looking for Python code that uses a regex to do something like this:
Input: Regex should return "String 1" or "String 2" or "String3"
Output: String 1,String2,String3
I tried r'"*"'
Here's all you need to do:
import re

def doit(text):
    matches = re.findall(r'"(.+?)"', text)
    # matches is now ['String 1', 'String 2', 'String3']
    return ",".join(matches)

doit('Regex should return "String 1" or "String 2" or "String3" ')
result:
'String 1,String 2,String3'
As pointed out by Li-aung Yip:
To elaborate,
.+? is the "non-greedy" version of .+. It makes the regular expression match the smallest number of characters it can instead of the most characters it can. The greedy version, .+, will give String 1" or "String 2" or "String3; the non-greedy version .+? gives String 1, String 2, String3.
In addition, if you want to accept empty strings, change .+ to .*. Star * means zero or more while plus + means at least one.
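To make the difference concrete, here is a small sketch comparing the three variants. The sample text (with an empty "" in the middle) and the variable names are made up for illustration and are not from the original answer.

```python
import re

# Illustrative input: note the empty "" in the middle.
text = 'Regex should return "String 1" or "" or "String3"'

# Greedy: .+ runs from the first double quote to the last one.
greedy = re.findall(r'"(.+)"', text)

# Non-greedy with +: at least one character is required, so the empty ""
# cannot match on its own and the quotes after it get paired up wrongly.
lazy_plus = re.findall(r'"(.+?)"', text)

# Non-greedy with *: zero characters are allowed, so "" is matched as well.
lazy_star = re.findall(r'"(.*?)"', text)

print(greedy)     # ['String 1" or "" or "String3']
print(lazy_plus)  # ['String 1', '" or ']
print(lazy_star)  # ['String 1', '', 'String3']
```

If empty quoted strings can occur in the input, the .*? form is the safer choice, since .+? silently shifts the pairing of the quotes.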
The highly up-voted answer doesn't account for the possibility that the double-quoted string might contain one or more double-quote characters escaped with a backslash. To handle this situation, the regex needs to accumulate, between the opening and closing double-quotes, zero or more matches where each match is either an escaped character sequence (a backslash followed by any character) or any character that is not a double-quote. We further assume that the quoted string exists wholly on a single line, and so we do not allow newline characters within our string.
r'"(?:\\.|[^"\n])*"'
" - matches a double-quote.
(?: - start of a non-capture group.
\\. - matches a backslash followed by any non-newline character, representing an escaped character sequence.
| - "or".
[^"\n] - matches any character other than a double-quote or newline.
) - end of non-capture group.
* - matches 0 or more occurrences of the previous group.
" - matches a double-quote.

import re

def doit(text):
    print('input:', text)
    for i, match in enumerate(re.findall(r'"(?:\\.|[^"\n])*"', text), start=1):
        print(f'match {i}: {match}')
    print()

doit(r'Regex should return "String 1" or "String 2" or "String3" and "\"double quoted string\"" ')
doit(r'"abcdef\"ghij"')
doit(r'"abcdef\\"ghij"')
doit(r'"abcdef\\\"ghij"')
Prints:
input: Regex should return "String 1" or "String 2" or "String3" and "\"double quoted string\""
match 1: "String 1"
match 2: "String 2"
match 3: "String3"
match 4: "\"double quoted string\""
input: "abcdef\"ghij"
match 1: "abcdef\"ghij"
input: "abcdef\\"ghij"
match 1: "abcdef\\"
input: "abcdef\\\"ghij"
match 1: "abcdef\\\"ghij"
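The pattern above returns each match with its surrounding quotes and with the backslashes still in place. If only the contents are wanted, one possible approach is to add a capture group around the body and strip the escapes afterwards. This is a sketch, not part of the original answer: the names QUOTED and extract_quoted are hypothetical, and it assumes a backslash simply escapes the next character (no special handling of sequences such as \n).

```python
import re

# Same pattern as above, with a capture group around the string body.
QUOTED = re.compile(r'"((?:\\.|[^"\n])*)"')

def extract_quoted(text):
    """Return the body of each double-quoted string with backslash escapes removed."""
    bodies = QUOTED.findall(text)
    # Replace every backslash-escaped character with the character itself.
    return [re.sub(r'\\(.)', r'\1', body) for body in bodies]

result = extract_quoted(r'say "a\"b" and "c\\" then ""')
print(result)  # ['a"b', 'c\\', '']
```

The second element is the letter c followed by a single backslash; Python's repr shows it as 'c\\'.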
Just try to fetch double quoted strings from the multiline string:
import re
s = """
"my name is daniel" "mobile 8531111453733"[[[[[[--"i like pandas"
"location chennai"! -asfas"aadhaar du2mmy8969769##69869"
@4343453 "pincode 642002""@mango,@apple,@berry"
"""
print(re.findall(r'"(.*?)"', s))
From https://stackoverflow.com/a/69891301/1531728
My solution is:
import re
my_strings = ['SetVariables "a" "b" "c" ', 'd2efw f "first" +&%#$%"second",vwrfhir, d2e u"third" dwedew', '"uno"?>P>MNUIHUH~!@#$%^&*()_+=0trewq"due" "tre"fef fre f', ' "uno""dos" "tres"', '"unu""doua""trei"', ' "um" "dois" "tres" ']
my_substrings = []
for current_test_string in my_strings:
    for values in re.findall(r'\"(.+?)\"', current_test_string):
        my_substrings.append(values)
        #print("values are:",values,"=")
    print(" my_substrings are:",my_substrings,"=")
    my_substrings = []
Alternate regular expressions to use are:
The current_test_string.split("\"") approach works only if the substrings of interest are all embedded within quotation marks. This is because it uses the double quotation mark as a delimiter to tokenize the string, so it also returns the pieces that are not embedded within double quotation marks as if they were valid substring extractions.
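As a rough illustration of the split-based approach, the sketch below keeps only the odd-numbered pieces, which are the ones that sat between a pair of quotes. The variable names are illustrative, and the slicing step is an assumption about how the pieces would be filtered; it only holds when the quotes are balanced and none of them is escaped.

```python
s = 'SetVariables "a" "b" "c" '

# Splitting on the double quote alternates between text outside the quotes
# (even indexes) and text inside the quotes (odd indexes).
parts = s.split('"')
quoted = parts[1::2]
print(parts)   # ['SetVariables ', 'a', ' ', 'b', ' ', 'c', ' ']
print(quoted)  # ['a', 'b', 'c']

# Caveat: an unterminated quote is not detected, so its tail is still returned.
unbalanced = 'x "a" "b'.split('"')[1::2]
print(unbalanced)  # ['a', 'b']
```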
For me the only regex that ever worked right for all the cases of quoted strings with possibly escaped quotes inside of them was:
regex=r"""(['"])(?:\\\\|\\\1|[^\1])*?\1"""
This will not fail even if the quoted string ends with an escaped backslash.
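A quick way to try this pattern is to run it over a few inputs with re.search. The sample strings below are made up for illustration and are not from the original answer.

```python
import re

regex = r"""(['"])(?:\\\\|\\\1|[^\1])*?\1"""

# Illustrative inputs: mixed quote types, a trailing escaped backslash,
# and an escaped quote inside the string.
samples = [
    r'''He said "it's fine" today''',
    r"""a 'single "quoted" one' here""",
    r'"ends with backslash\\" tail',
    r'"escaped \" quote"',
]

found = [re.search(regex, s).group(0) for s in samples]
for match in found:
    print(match)
# "it's fine"
# 'single "quoted" one'
# "ends with backslash\\"
# "escaped \" quote"
```

One detail worth knowing: in Python's re module a numeric escape inside a character class is treated as a character, not as a backreference, so [^\1] here matches almost any character. It is the lazy quantifier *? together with the trailing \1 that stops the match at the closing quote.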
import re
r=r"'(\\'|[^'])*(?!<\\)'|\"(\\\"|[^\"])*(?!<\\)\""
texts=[r'"aerrrt"',
r'"a\"e'+"'"+'rrt"',
r'"a""""arrtt"""""',
r'"aerrrt',
r'"a\"errt'+"'",
r"'aerrrt'",
r"'a\'e"+'"'+"rrt'",
r"'a''''arrtt'''''",
r"'aerrrt",
r"'a\'errt"+'"',
"''",'""',""]
for text in texts:
    print(text, "-->", re.fullmatch(r, text))
results:
"aerrrt" --> <_sre.SRE_Match object; span=(0, 8), match='"aerrrt"'>
"a\"e'rrt" --> <_sre.SRE_Match object; span=(0, 10), match='"a\\"e\'rrt"'>
"a""""arrtt""""" --> None
"aerrrt --> None
"a\"errt' --> None
'aerrrt' --> <_sre.SRE_Match object; span=(0, 8), match="'aerrrt'">
'a\'e"rrt' --> <_sre.SRE_Match object; span=(0, 10), match='\'a\\\'e"rrt\''>
'a''''arrtt''''' --> None
'aerrrt --> None
'a\'errt" --> None
'' --> <_sre.SRE_Match object; span=(0, 2), match="''">
"" --> <_sre.SRE_Match object; span=(0, 2), match='""'>
--> None