Some hapless coworker saved some data into a file like this:

s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(str(s))

when they should have used

s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(s.decode())

Now foo.txt looks like

b'The em dash: \xe2\x80\x94'

instead of

The em dash: —

I already read this file as a string:

with open('foo.txt') as f:
    bad_foo = f.read()

Now how can I convert bad_foo from the incorrectly-saved format to the correctly-saved string?

  • .decode doesn't make sense without an encoding name. Why are you using byte strings in the first place, anyway? The idiomatic way to do this is to use a Unicode string and let Python encode it when writing to a file. Commented Dec 11, 2018 at 18:32
  • @tripleee someone else did it, and I've been tasked with undoing it :) Commented Dec 11, 2018 at 18:32
  • I suspect nothing much more useful than eval can be suggested for undoing this. Commented Dec 11, 2018 at 18:32
  • @tripleee this was intended as a self-answer. See stackoverflow.com/a/53730411/2954547 Commented Dec 11, 2018 at 18:39
  • @shadowtalker there's an "answer your own question" checkbox just below the "Post Your Question" button on the Ask Question page that lets you get your answer in before the competition ;-) Commented Dec 11, 2018 at 18:44

3 Answers

You can try ast.literal_eval:

from ast import literal_eval
test = r"b'The em-dash: \xe2\x80\x94'"
print(test)
res = literal_eval(test)
print(res.decode())
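Applied to the bad_foo string from the question, the same idea could be wrapped up roughly as follows. This is a minimal sketch: the helper name undo_bytes_repr is made up for illustration, and it assumes the file holds exactly one bytes repr whose contents are UTF-8.

```python
import ast


def undo_bytes_repr(bad_text, encoding="utf-8"):
    """Turn the text of a bytes repr back into the string it was meant to be.

    Hypothetical helper, not part of any library.
    """
    # literal_eval only parses Python literals; strip() drops a trailing newline.
    value = ast.literal_eval(bad_text.strip())
    # Refuse anything that is not a bytes literal (e.g. a number or a list).
    if not isinstance(value, bytes):
        raise ValueError(f"expected a bytes literal, got {type(value).__name__}")
    return value.decode(encoding)


# What f.read() returns for the broken foo.txt (raw string: backslashes are literal).
bad_foo = r"b'The em dash: \xe2\x80\x94'"
fixed_foo = undo_bytes_repr(bad_foo)  # 'The em dash: \u2014'
```

The isinstance check is there so that a file containing some other literal fails loudly instead of raising a confusing AttributeError on decode.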

If you trust that the input is not malicious, you can use ast.literal_eval on the broken string.

import ast

# Create a sad broken string (raw, so the backslashes stay literal, as in the file)
s = r"b'The em-dash: \xe2\x80\x94'"

# Parse and evaluate the string as raw Python source, creating a `bytes` object
s_bytes = ast.literal_eval(s)

# Now decode the `bytes` as normal
s_fixed = s_bytes.decode()

Otherwise you will have to manually parse and remove or replace the offending repr'ed escapes.
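One way the manual route might look, as a rough sketch rather than a vetted parser: strip the b'...' wrapper, then let the standard unicode_escape and latin-1 codecs turn the escapes back into raw bytes. The function name decode_bytes_repr_manually is invented here, and the sketch assumes the text really is the output of repr() on a bytes object holding UTF-8 data.

```python
def decode_bytes_repr_manually(bad_text, encoding="utf-8"):
    """Undo repr(bytes) without evaluating the text as Python source.

    Hypothetical helper; assumes well-formed repr() output.
    """
    text = bad_text.strip()
    # A bytes repr looks like b'...' or b"..."; check for that wrapper.
    if len(text) < 3 or text[0] != "b" or text[1] not in "'\"" or text[-1] != text[1]:
        raise ValueError("not a bytes repr")
    inner = text[2:-1]
    # repr() of bytes is pure ASCII, so this raises if the input is something else.
    escaped = inner.encode("ascii")
    # unicode_escape turns each \xNN escape into the code point U+00NN
    # (and handles \n, \t, \\ and \' as well).
    as_code_points = escaped.decode("unicode_escape")
    # latin-1 maps code points 0-255 straight back to the byte values 0-255.
    raw = as_code_points.encode("latin-1")
    return raw.decode(encoding)


s_fixed_manually = decode_bytes_repr_manually(r"b'The em-dash: \xe2\x80\x94'")
```

The round trip through latin-1 is the key step: it recovers the original byte values so that the final decode can interpret them as UTF-8.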

This code works correctly on my computer. If you still get an error, this may help:

with open('foo.txt', 'r', encoding="utf-8") as f:
    print(f.read())

1 Comment

No, the problem is that the file contains the repr() of the byte string.
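A small illustration of that point, using the byte string from the question: the text that ends up in the file is plain ASCII, so choosing a different encoding when reading it back changes nothing. The variable names are only for this sketch.

```python
data = b"The em dash: \xe2\x80\x94"

# Without an encoding argument, str() on bytes falls back to repr(),
# so this is the text that was written to foo.txt.
file_text = repr(data)

# The b prefix, the quotes and the backslash escapes are all literal ASCII characters.
is_ascii = file_text.isascii()
starts_with_prefix = file_text.startswith("b'")
backslash_count = file_text.count("\\")

# Passing an encoding to str() is what actually decodes the bytes.
decoded = str(data, "utf-8")
```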
