Replace backslash followed by double quotation in a text file in Python

Question

I have a text file, and its content is like this:

"good to know it \" so nice \" "

I use Python to read its contents and want to replace every \" (a backslash followed by a double quotation mark) with an empty string.

The code I am using is:

import re

file_path = "backslash_double_quotation.txt"
with open(file_path, "r") as input_file:
    raw_text = input_file.read()
processed_text = re.sub(r'\"', "", raw_text)
print(raw_text)
print(processed_text)

and I expect processed_text like this:

"good to know it  so nice  "

However, the actual output is:

good to know it \ so nice \

Every double quotation mark is removed, while the backslashes are left in place. How can I fix this?

re.sub treats r'\"' as a regular expression, and the regular expression \" only matches a literal " (as " has no special meaning in a regular expression). r'\"' would be correct if you were using string equality rather than regular-expression matching.
– chepner, Commented Feb 23, 2023 at 18:14
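
To illustrate the comment above, here is a minimal sketch. It assumes the file holds exactly the line shown in the question, and the variable names are only illustrative. It compares the pattern from the question with one that also matches the backslash:

```python
import re

# The text as it appears in the file: real backslashes before the inner quotes.
raw_text = r'"good to know it \" so nice \" "'

# As a regex, \" is just an escaped quote: it matches every " and leaves the backslashes.
only_quotes_removed = re.sub(r'\"', "", raw_text)

# As a regex, \\" is a literal backslash followed by a quote.
pairs_removed = re.sub(r'\\"', "", raw_text)

print(only_quotes_removed)
print(pairs_removed)
```

The first call reproduces the unwanted output from the question; the second keeps the outer quotation marks and drops only the backslash-quote pairs.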

Blue Robin · Accepted Answer · 2023-02-23 18:25:00Z

1

With strings you can use .replace() to replace specific characters or words in a string.

For example:

text = "good to know it \" so nice \""
print(text.replace("\"", " "))

The output for this is:

good to know it   so nice

With your code:

import re
file_path = "backslash_double_quotation.txt"
with open(file_path, "r") as input_file:
    raw_text = input_file.read()
processed_text = raw_text.replace("\"", "")
print(raw_text)
print(processed_text)

If you want to use re then:

processed_text = re.sub(r"\\", "", raw_text)

edited Feb 23, 2023 at 18:25

answered Feb 23, 2023 at 18:12

Blue Robin


1 Comment

user2357112 Over a year ago

The input file contains actual backslashes, and this is still going to leave those backslashes in. We can see the backslashes aren't just a string repr thing - the code in the question isn't doing anything that would call repr, and the post-re.sub output contains backslashes with no quotation marks.
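
If the file really does contain backslashes, as this comment argues, str.replace can still do the job without re, provided the search string includes the backslash. A small sketch, again assuming the sample line from the question (backslash_quote is just an illustrative name):

```python
# Assumed file content from the question, with real backslashes.
raw_text = r'"good to know it \" so nice \" "'

# '\\"' is a two-character string: one backslash followed by one quotation mark.
backslash_quote = '\\"'

# Plain substring replacement, no regular expression involved.
processed_text = raw_text.replace(backslash_quote, "")
print(processed_text)
```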

Peter Mortensen · Accepted Answer · 2023-03-20 12:51:40Z

1

You don't get the expected result because of the raw-string prefix "r" in your example. If you add "r", you should write your regular expression without any escape characters.

Just remove "r" in your example and it will work as expected:

processed_text = re.sub('\"', "", raw_text)

Reference:

Raw String Notation

edited Mar 20, 2023 at 12:51

Peter Mortensen


answered Feb 24, 2023 at 4:35

chacobsa


1 Comment

sln Over a year ago

See @chepner's comment above. Regex does not see \" as a special escape sequence, so it treats it as just a literal quote " to match: regex101.com/r/csMyXv/1 . The OP needs to match the escape character \\ plus the quote ", that is, \\" is what has to reach the regex engine.

Tanmay Shrivastava · Accepted Answer · 2023-02-23 18:32:18Z

0

Eliminate one by one

processed_text = raw_text.replace('"', '')
processed_text = processed_text.replace('\', '')

answered Feb 23, 2023 at 18:32

Tanmay Shrivastava


1 Comment

user2357112 Over a year ago

First, processed_text.replace('\', '') is invalid syntax, and second, the questioner only wants to remove backslash-and-quotation-mark sequences, not solo backslashes or solo quotation marks.

Arthur · Accepted Answer · 2023-02-25 00:30:45Z

0

I found this works:

processed_text = re.sub(r'\\"', "", raw_text)
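
For completeness, a self-contained sketch of this pattern applied to a file. The helper name strip_escaped_quotes and the temporary sample file are assumptions made so the snippet runs on its own; they are not part of the question:

```python
import os
import re
import tempfile


def strip_escaped_quotes(path):
    """Read a text file and remove every backslash-quote pair (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as input_file:
        raw_text = input_file.read()
    # r'\\"' is the regex for one literal backslash followed by one quote.
    return re.sub(r'\\"', "", raw_text)


# Write a sample file first so the sketch does not depend on an existing one.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "backslash_double_quotation.txt")
    with open(file_path, "w", encoding="utf-8") as output_file:
        output_file.write(r'"good to know it \" so nice \" "')
    processed_text = strip_escaped_quotes(file_path)

print(processed_text)
```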

answered Feb 25, 2023 at 0:30

Arthur


Peter Mortensen · Accepted Answer · 2023-03-20 12:54:04Z

0

It's hard to imagine that an escaped double quote \" means anything other than "include this quote in the double-quoted string". It follows that an escaped backslash \\ is needed to tell a literal backslash inside the string apart from one that escapes a following double quote (if any), which would otherwise be the closing string delimiter.

This seems to be an unambiguous way to tell the difference:

https://regex101.com/r/FH2Dfp/1

Find (raw context, wrap in r' '):

(?<!\\)((?:\\\\)*)\\"

Replace with:

\1
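
In Python, the find/replace pair above could be used roughly as follows. This is only a sketch: the names ESCAPED_QUOTE and remove_escaped_quotes and the sample strings are made up for illustration.

```python
import re

# Not preceded by a backslash, then an even run of backslashes (captured),
# then one backslash that escapes the quote.
ESCAPED_QUOTE = re.compile(r'(?<!\\)((?:\\\\)*)\\"')


def remove_escaped_quotes(text):
    """Drop each escaped quote but keep any escaped backslashes before it."""
    return ESCAPED_QUOTE.sub(r'\1', text)


# One, two, and three backslashes in front of the quote.
samples = [r'a \" b', r'a \\" b', r'a \\\" b']
results = [remove_escaped_quotes(sample) for sample in samples]
for before, after in zip(samples, results):
    print(before, "->", after)
```

With two backslashes the quote is not escaped (the backslash is), so that sample is left unchanged; with one or three, the final backslash-quote pair is removed.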

edited Mar 20, 2023 at 12:54

Peter Mortensen


answered Feb 23, 2023 at 20:03

sln

