0

I have a text file, and its content is like this:

"good to know it \" so nice \" "

I use Python to read its contents and want to replace every \" (a backslash followed by a double quotation mark) with an empty string.

The code I am using is:

import re

file_path = "backslash_double_quotation.txt"
with open(file_path, "r") as input_file:
    raw_text = input_file.read()
processed_text = re.sub(r'\"', "", raw_text)
print(raw_text)
print(processed_text)

I expect processed_text to look like this:

"good to know it  so nice  "

However, the actual output is:

good to know it \ so nice \

All the double quotation marks are removed, while the backslashes are left in place. How can I fix this?

1
  • 2
    re.sub treats r'\"' as a regular expression, and the regular expression \" matches only a literal " (since " has no special meaning in a regular expression, escaping it changes nothing). r'\"' would be correct if you were using string equality rather than regular-expression matching. Commented Feb 23, 2023 at 18:14
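
To make the comment above concrete, here is a minimal sketch. It assumes the file's content is held in an inline string rather than read from disk, and the variable names quote_only and two_chars are only illustrative.

```python
import re

# Sample text as it would be read from the file: it contains real backslashes.
raw_text = r'"good to know it \" so nice \" "'

# As a regex, \" is just an escaped quote, so it matches every " on its own
# and leaves the backslashes untouched.
quote_only = re.sub(r'\"', "", raw_text)

# As a plain string, r'\"' is two characters: a backslash followed by a quote.
two_chars = len(r'\"')

print(quote_only)
print(two_chars)
```

This reproduces the output reported in the question: the quotation marks disappear and the backslashes remain.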

5 Answers

1

With strings you can use .replace() to replace specific characters or words in a string.

For example:

text = "good to know it \" so nice \""
print(text.replace("\"", " "))

The output for this is:

good to know it   so nice  

With your code:

import re
file_path = "backslash_double_quotation.txt"
with open(file_path, "r") as input_file:
    raw_text = input_file.read()
processed_text = raw_text.replace("\"", "")
print(raw_text)
print(processed_text)

If you want to use re then:

processed_text = re.sub(r"\\", "", raw_text)

1 Comment

The input file contains actual backslashes, and this is still going to leave those backslashes in. We can see the backslashes aren't just a string repr thing - the code in the question isn't doing anything that would call repr, and the post-re.sub output contains backslashes with no quotation marks.
1

You don't get the expected result because of the raw-string prefix r in your example. If you add r, you should write your regular expression without any escape characters.

Just remove "r" in your example and it will work as expected:

processed_text = re.sub('\"', "", raw_text)

Reference:

Raw String Notation

1 Comment

See @chepner's comment above. The regex engine does not treat \" as a special escape sequence, so it matches just a literal quote ": regex101.com/r/csMyXv/1. The OP needs to match the escape character (written \\ in the pattern) followed by the quote, so the pattern that has to reach the regex engine is \\".
0

Eliminate one by one

processed_text = raw_text.replace('"', '')
processed_text = processed_text.replace('\', '')

1 Comment

First, processed_text.replace('\', '') is invalid syntax, and second, the questioner only wants to remove backslash-and-quotation-mark sequences, not solo backslashes or solo quotation marks.
0

I found this works:

processed_text = re.sub(r'\\"', "", raw_text)
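
A minimal sketch of why this works, assuming the file content is held in an inline string instead of being read from disk. The names with_regex and with_replace are illustrative; the str.replace line is an equivalent alternative that avoids regular expressions, not part of the answer above.

```python
import re

# Sample text as it would be read from the file: it contains real backslashes.
raw_text = r'"good to know it \" so nice \" "'

# Regex version: \\ matches one literal backslash, then " matches the quote.
with_regex = re.sub(r'\\"', "", raw_text)

# Plain-string version: '\\"' is the two-character string backslash + quote.
with_replace = raw_text.replace('\\"', "")

print(with_regex)
print(with_regex == with_replace)
```

Both calls remove only the backslash-and-quote pairs and keep the outer quotation marks, which is the result the question asks for.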


0

It is hard to imagine an escaped double quote \" meaning anything other than a literal quote included inside a double-quoted string. By the same logic, a literal backslash inside the string has to be escaped as \\, so that a backslash belonging to the text can be told apart from one that escapes the following double quote and keeps it from closing the string.

This seems to be an unambiguous way to tell the difference:

https://regex101.com/r/FH2Dfp/1

Find (written as a raw string in Python, i.e. wrapped in r'...'):

(?<!\\)((?:\\\\)*)\\"

Replace with:

\1
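
In Python, the find and replace patterns above could be applied roughly as follows. This is a sketch under the assumption that \\ in the input denotes an escaped backslash; ESCAPED_QUOTE and strip_escaped_quotes are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical names for illustration. The pattern captures any run of
# escaped backslashes (pairs of \\) and then matches one backslash + quote.
ESCAPED_QUOTE = re.compile(r'(?<!\\)((?:\\\\)*)\\"')


def strip_escaped_quotes(text: str) -> str:
    # Keep the captured backslash pairs (\1), drop the escaped quote itself.
    return ESCAPED_QUOTE.sub(r'\1', text)


# One backslash before the quote: an escaped quote, so it is removed.
one = strip_escaped_quotes(r'a \" b')
# Two backslashes: an escaped backslash followed by a plain quote, left alone.
two = strip_escaped_quotes(r'a \\" b')
# Three backslashes: an escaped backslash, then an escaped quote (removed).
three = strip_escaped_quotes(r'a \\\" b')

print(one)
print(two)
print(three)
```

The lookbehind (?<!\\) prevents a match from starting in the middle of a run of backslashes, which is what keeps the even-count case untouched.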
