3

From

string= this is, not good "type of ,question" to ask, on stackoverflow

I want to extract the "type of ,question" substring and replace the ',' inside it with ' '.

With re.findall() I get a list of the characters between the double quotes, and with re.search() I get a match object.

re.sub() replaces every ',', but I need to leave them all alone except the ones inside the double-quoted substring.

Can anyone help me with this problem?

Thanks in advance!!

9
  • It sounds like you already tried using re.findall, re.search, and re.sub, yes? Please share the code for each of those attempts. Commented Nov 20, 2018 at 16:27
  • output: this is, not good "type of question" to ask, on stackoverflow Commented Nov 20, 2018 at 16:27
  • Is it 'string= this is, not good "type of ,question" to ask, on stackoverflow' or string = 'this is, not good "type of ,question" to ask, on stackoverflow'? Commented Nov 20, 2018 at 16:29
  • What should happen if there are more than two quote marks in the string? What if there are an odd number of quote marks? What if there are two 'real' quote marks, and one escaped quote mark inside that quote? Commented Nov 20, 2018 at 16:29
  • 1
    If the only data the code needs to work on is the one example you gave, then you only have to do result = 'this is, not good "type of question" to ask, on stackoverflow'. If you're thinking "very funny, I actually need it to work on a variety of inputs", then that's exactly why I'm asking these clarifying questions :-) Commented Nov 20, 2018 at 16:33

5 Answers

4

Use regex capture groups:

import re
s= 'this is, not good "type of ,question" to ask, on stackoverflow'
re.sub(r'(".*?),(.*?")', r'\1\2', s)

output:

'this is, not good "type of question" to ask, on stackoverflow'

Explanation: parentheses in a regex define capture groups. In the replacement, \1 and \2 stand for the text before and after the , inside the quoted part of the string, so the comma itself is dropped. Note that this also works when the string contains more than one quoted substring (as long as each one contains a comma).
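To make the two groups concrete, here is a minimal sketch that prints what each group captures on the example string before doing the substitution. The variable names `pattern` and `m` are only for illustration.

```python
import re

s = 'this is, not good "type of ,question" to ask, on stackoverflow'
pattern = re.compile(r'(".*?),(.*?")')

# Look at the first match and what each capture group holds.
m = pattern.search(s)
print(repr(m.group(1)))  # text from the opening quote up to the comma
print(repr(m.group(2)))  # text after the comma up to the closing quote

# Gluing the two groups back together drops the comma between them.
result = pattern.sub(r'\1\2', s)
print(result)
```

Group 1 ends with the space that precedes the comma, so writing the replacement as r'\1 \2' would leave two spaces inside the quotes on this input.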


6 Comments

this one worked perfectly for me. Thanks for helping me out @Rocky Li
Glad to be able to help
Very nice. In order to add the space (from the requirement replace ',' with ' '), add a space between \1 and \2
This would only remove one instance of the comma though, which is fair if that's all OP needs.
Also, unbalanced quotes will introduce problems. For example, 'this is, not good", "type of ,question" to ask, on stackoverflow' yields this is, not good" "type of ,question" to ask, on stackoverflow
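One possible guard against the unbalanced-quote case, as a rough sketch: count the double quotes first and refuse to touch the string if the count is odd. The function name `strip_commas_in_quotes` is made up for this example, and raising an error is just one choice; it also swaps in a `"[^"]*"` pattern with a callback instead of the two capture groups.

```python
import re

def strip_commas_in_quotes(text):
    """Remove commas inside double-quoted substrings (hypothetical helper)."""
    # An odd number of quote marks means at least one quote is unpaired.
    if text.count('"') % 2 != 0:
        raise ValueError("unbalanced double quotes")
    # Each match is one complete quoted substring, including both quotes.
    return re.sub(r'"[^"]*"', lambda m: m.group(0).replace(',', ''), text)

print(strip_commas_in_quotes('this is, not good "type of ,question" to ask'))

try:
    strip_commas_in_quotes('this is, not good", "type of ,question" to ask')
except ValueError as exc:
    print("rejected:", exc)
```

This does not handle escaped quote marks inside a quoted substring; that would need a different pattern or a real parser.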
2

Another way, which gives you some flexibility, is to do it in two steps:

  1. Find all the matches that are contained in quotations,

  2. In each match look for and replace the ','.

Example:

import re

string = 'this is, not good "type of ,question" to ask, on stackoverflow'

# define a pattern that gets you everything inside a double quote
pat = re.compile(r'"[^"]+"')

# re.sub the quote pattern and replace the , in each of those matches.
string = pat.sub(lambda x: x.group(0).replace(',',''), string)

# 'this is, not good "type of question" to ask, on stackoverflow'

The flexibility here is that you can replace as many ',' as you need, and you can make other changes as well once you have located all the double-quoted substrings.
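As a small illustration of that flexibility, the same two-step idea can be wrapped in a function and run on a string with several quoted substrings and several commas in each. The function name `replace_in_quotes`, its parameters, and the sample text are assumptions made up for this sketch.

```python
import re

# Matches one double-quoted substring, quotes included.
QUOTED = re.compile(r'"[^"]+"')

def replace_in_quotes(text, old=',', new=' '):
    """Replace `old` with `new`, but only inside double-quoted substrings."""
    return QUOTED.sub(lambda m: m.group(0).replace(old, new), text)

text = 'a, b "c,d,e" f, g "h,i"'
print(replace_in_quotes(text))           # commas inside quotes become spaces
print(replace_in_quotes(text, new=''))   # commas inside quotes are removed
```

The commas outside the quotes are left untouched because the callback only ever sees the text of each quoted match.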

1 Comment

This should be higher up. My solution does not address multiple , in a single quote, which is a big oversight on my part.
1

How about a combination of split() and replace()?

s = 'this is, not good "type of ,question" to ask, on stackoverflow'

splitted = s.split('"')
print(s.replace(splitted[1], splitted[1].replace(',', '')))

# this is, not good "type of question" to ask, on stackoverflow

Note: This works in this case, but it does not work when the text inside the double quotes also appears verbatim outside them, because replace() changes every occurrence. It also only handles the first quoted substring.
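A possible way around that limitation, sketched under the assumption that the quotes are balanced: after splitting on '"', the odd-indexed pieces are the ones that were inside quotes, so only those need to be changed before joining everything back together. The sample string is made up to show the problem case.

```python
# The quoted text also appears outside the quotes here.
s = 'type of ,question, then "type of ,question" again'

parts = s.split('"')

# With balanced quotes, parts[1], parts[3], ... are the quoted pieces.
parts[1::2] = [p.replace(',', '') for p in parts[1::2]]

result = '"'.join(parts)
print(result)
```

Because the pieces are edited by position rather than by searching for their text, the identical text outside the quotes is left alone.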


1

How about this:

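import re

b = """ "hello, howdy". sample text, text then comes "Another, double, quotes" """

for str_match in re.findall(r'".*?"', b):
    # re.escape keeps characters such as . or ? in the match from being read as regex syntax
    b = re.sub(re.escape(str_match), re.sub(r",", " ", str_match), b)

print(b)

output (each ', ' becomes two spaces): "hello  howdy". sample text, text then comes "Another  double  quotes"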


0

I'm not completely sure this will meet all your requirements, but for the example you gave, the following returns what you are looking for.

# string holds the input text from the question
result = re.sub(r'("(?:[^"])*),((?:[^"])*")', r"\1 \2", string)

1 Comment

Rocky Li's answer is cleaner, though.
