2

I am trying to extract the text inside double quotation marks ("").

File content:
"abc"
"ABC. XYZ"
"1 - 2 - 3"

Code I've tried, using regex:

title = re.findall(r'\"(.+?)\"', filecontent)
print(title)

Output:

['abc']
[] # Some lines come out empty like this
['1 - 2 - 3']

Some of the lines come out empty, and I'm not sure why. Is there a better alternative way to do this?

2

3 Answers

3

If you want to extract some substring out of a string, you can go for re.search.

Demo:

import re

str_list = ['"abc"', '"ABC. XYZ"', '"1 - 2 - 3"']

for s in str_list:
    # Avoid naming the loop variable `str`, which would shadow the built-in type.
    match = re.search(r'"(.+?)"', s)
    if match:
        print(match.group(1))

Output:

abc
ABC. XYZ
1 - 2 - 3
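
If the strings come from a file rather than a list, the same re.search idea can be applied line by line. The following is only a sketch: first_quoted is a hypothetical helper name, and io.StringIO stands in for a real open file so the example is self-contained.

```python
import io
import re

# Compile the pattern once; the group captures the text between the quotes.
QUOTED = re.compile(r'"(.+?)"')

def first_quoted(line):
    """Return the first double-quoted substring in line, or None if there is none."""
    match = QUOTED.search(line)
    return match.group(1) if match else None

# io.StringIO is a stand-in for an open file (an assumption for this sketch).
sample_file = io.StringIO('"abc"\n\n"ABC. XYZ"\nno quotes here\n"1 - 2 - 3"\n')

# Lines without a quoted substring yield None and are skipped.
titles = [t for t in map(first_quoted, sample_file) if t is not None]
print(titles)
```

Skipping the None results means blank lines and lines without quotes do not show up as empty entries.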

2

If I understand correctly, have you tried this?

import re

filecontent = '''
"abc"
"ABC. XYZ"
"1 - 2 - 3"
'''

re.findall(r'\"(.+?)\"', filecontent)

Output:

['abc', 'ABC. XYZ', '1 - 2 - 3']
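
The question does not show what the lines that return [] actually contain, so the following is only a guess at possible causes. It assumes the pattern is applied one line at a time and that a failing line is either blank or uses typographic quotes (U+201C and U+201D) instead of straight ones. The sample lines and the name either are made up for illustration.

```python
import re

# Hypothetical lines: the second is blank, the third uses curly quotes.
lines = ['"abc"', '', '\u201cABC. XYZ\u201d', '"1 - 2 - 3"']

# The straight-quote pattern finds nothing in the blank or curly-quoted lines.
results = [re.findall(r'"(.+?)"', line) for line in lines]
for line, found in zip(lines, results):
    print(repr(line), '->', found)

# A pattern that accepts either a straight or a curly quote on each side.
either = re.compile('["\u201c](.+?)["\u201d]')
results_either = [either.findall(line) for line in lines]
print(results_either)
```

Printing repr(line) for each failing line is a quick way to check which characters are really there.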

0

My solution is:

import re
my_strings = [
    'SetVariables "a" "b" "c" ',
    'd2efw   f "first" +&%#$%"second",vwrfhir, d2e   u"third" dwedew',
    '"uno"?>P>MNUIHUH~!@#$%^&*()_+=0trewq"due"        "tre"fef    fre f',
    '       "uno""dos"      "tres"',
    '"unu""doua""trei"',
    '      "um"                    "dois"           "tres"                  ',
]
my_substrings = []
for current_test_string in my_strings:
    for values in re.findall(r'\"(.+?)\"', current_test_string):
        my_substrings.append(values)
        #print("values are:",values,"=")
    print(" my_substrings are:",my_substrings,"=")
    my_substrings = []

Alternative regular expressions include:

  • re.findall('"(.+?)"', current_test_string) [Avinash2021] [user17405772021]
  • re.findall('"(.*?)"', current_test_string) [Shelvington2020]
  • re.findall(r'"(.*?)"', current_test_string) [Lundberg2012] [Avinash2021]
  • re.findall(r'"(.+?)"', current_test_string) [Lundberg2012] [Avinash2021]
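  • re.findall(r'"["]', current_test_string) # As written, this matches only two consecutive double quotes, so it does not extract the quoted text. [Muthupandi2019]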
  • re.findall(r'"([^"]*)"', current_test_string) [Pieters2014]
  • re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', current_test_string) # Causes double quotes to remain in the strings, but can be removed via other means. [Booboo2020]
  • re.findall(r'"(.*?)(?<!\\)"', current_test_string) [Hassan2014]
  • re.findall('"[^"]*"', current_test_string) # Causes double quotes to remain in the strings, but can be removed via other means. [Martelli2013]
  • re.findall('"([^"]*)"', current_test_string) [jspcal2014]
  • re.findall("'(.*?)'", current_test_string) # Matches text inside single quotes rather than double quotes. [akhilmd2016]
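
These patterns are not interchangeable. As a rough illustration, the sketch below runs a few of them on two made-up inputs: one containing an empty pair of quotes and one containing backslash-escaped quotes. The variable names and sample strings are assumptions for this example only.

```python
import re

# Hypothetical inputs chosen to expose the differences between the patterns.
empty_pair = 'x "" y "plain"'
escaped = r'"a \"b\" c"'

# .+? needs at least one character, so it runs past the empty pair of quotes.
lazy_plus = re.findall(r'"(.+?)"', empty_pair)
# .*? and [^"]* both accept an empty string between the quotes.
lazy_star = re.findall(r'"(.*?)"', empty_pair)
negated = re.findall(r'"([^"]*)"', empty_pair)

# The lookbehind refuses to close the match on a quote preceded by a backslash.
lookbehind = re.findall(r'"(.*?)(?<!\\)"', escaped)
# The negated class stops at every quote, escaped or not.
negated_escaped = re.findall(r'"([^"]*)"', escaped)

print(lazy_plus)        # ['" y ']
print(lazy_star)        # ['', 'plain']
print(negated)          # ['', 'plain']
print(lookbehind)       # ['a \\"b\\" c']
print(negated_escaped)  # ['a \\', ' c']
```

If empty quotes can occur in the input, the .*? or [^"]* forms are probably the safer choice. If escaped quotes can occur, one of the lookbehind forms may be needed.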

The current_test_string.split("\"") approach works only if every piece of text you care about is embedded within double quotation marks. It uses the double quotation mark as a delimiter to tokenize the string, so the result also contains the substrings that lie outside the quotation marks, and these have to be filtered out.
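
A minimal sketch of the split approach, assuming the quotes in the string are balanced: the pieces at odd indexes of the split result are the ones that were between quotes, so a slice can pick them out. The variable names are chosen for this example.

```python
# Two of the test strings from the code above.
spaced = 'SetVariables "a" "b" "c" '
adjacent = '"unu""doua""trei"'

# Splitting on the quote yields both the quoted and the unquoted pieces.
parts = spaced.split('"')
print(parts)  # ['SetVariables ', 'a', ' ', 'b', ' ', 'c', ' ']

# With balanced quotes, the quoted pieces sit at the odd indexes.
quoted_spaced = parts[1::2]
quoted_adjacent = adjacent.split('"')[1::2]
print(quoted_spaced)    # ['a', 'b', 'c']
print(quoted_adjacent)  # ['unu', 'doua', 'trei']
```

If a string contains an unmatched quote, the odd-index slice will also pick up text that was never closed by a second quote, so the regex forms are likely more robust for untrusted input.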

References:
