Bytes object stored in "repr format" as b'foo' instead of decode()-ing to string -- how to fix?

Question

Some hapless coworker saved some data into a file like this:

s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(str(s))

when they should have used

s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(s.decode())

Now foo.txt looks like

b'The em dash: \xe2\x80\x94'

Instead of

The em dash: —

I already read this file as a string:

with open('foo.txt') as f:
    bad_foo = f.read()

Now how can I convert bad_foo from the incorrectly-saved format to the correctly-saved string?

.decode doesn't make sense without an encoding name. Why are you using byte strings in the first place, anyway? The idiomatic way to do this is to use a Unicode string and let Python encode it when writing to a file.
– tripleee, Commented Dec 11, 2018 at 18:32
@tripleee someone else did it, and I've been tasked with undoing it :)
– shadowtalker, Commented Dec 11, 2018 at 18:32
I suspect nothing much more useful than eval can be suggested for undoing this.
– tripleee, Commented Dec 11, 2018 at 18:32
@tripleee this was intended as a self-answer. See stackoverflow.com/a/53730411/2954547
– shadowtalker, Commented Dec 11, 2018 at 18:39
@shadowtalker there's an "answer your own question" checkbox just below the "Post Your Question" button on the Ask Question page that lets you get your answer in before the competition ;-)
– snakecharmerb, Commented Dec 11, 2018 at 18:44

Paritosh Singh · Accepted Answer · 2018-12-11 18:34:28Z

3

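You can try ast.literal_eval, which parses the text as a Python bytes literal. Decoding the resulting bytes object then gives the intended string: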

from ast import literal_eval
test = r"b'The em-dash: \xe2\x80\x94'"
print(test)
res = literal_eval(test)
print(res.decode())
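
To apply the same idea to the file from the question, here is a minimal sketch. The helper name fix_repr_bytes and the temporary file are my own assumptions for illustration, and it assumes the whole file is a single bytes repr whose original data was UTF-8:

```python
from ast import literal_eval
import os
import tempfile


def fix_repr_bytes(text, encoding="utf-8"):
    """Turn the text "b'...'" back into the string it was meant to be.

    Hypothetical helper, not part of any library.
    """
    value = literal_eval(text.strip())
    # literal_eval accepts any Python literal, so make sure we really got bytes
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal, got %s" % type(value).__name__)
    return value.decode(encoding)


# Recreate the broken file in a temporary directory (stand-in for foo.txt)
path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "w") as f:
    f.write(str(b"The em dash: \xe2\x80\x94"))

# Read it back as text, exactly as in the question
with open(path) as f:
    bad_foo = f.read()

# fixed now holds the intended string, ending in U+2014
fixed = fix_repr_bytes(bad_foo)
```

The isinstance check is there because literal_eval will happily return a str, int, list and so on if the file turns out to contain some other literal.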

shadowtalker · Accepted Answer · 2018-12-11 18:38:22Z

1

If you trust that the input is not malicious, you can use ast.literal_eval on the broken string.

import ast

# Create a sad broken string (raw, so the backslashes stay literal as they are in the file)
s = r"b'The em-dash: \xe2\x80\x94'"

# Parse and evaluate the string as a Python literal, creating a `bytes` object
s_bytes = ast.literal_eval(s)

# Now decode the `bytes` as normal
s_fixed = s_bytes.decode()

Otherwise you will have to manually parse and remove or replace the offending repr'ed escapes.
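
For the manual route, one possible sketch is below. It assumes the text is exactly what repr() produces for a bytes object and that the original data was UTF-8. The function name decode_bytes_repr is made up for this example. It relies on the standard unicode_escape and latin-1 codecs rather than evaluating anything:

```python
def decode_bytes_repr(text, encoding="utf-8"):
    """Decode the repr of a bytes object without evaluating it as Python.

    Hypothetical helper for illustration only.
    """
    text = text.strip()
    # repr(bytes) looks like b'...' or, if the data contains a single
    # quote and no double quote, b"..."
    if len(text) < 3 or text[0] != "b" or text[1] not in "'\"" or text[-1] != text[1]:
        raise ValueError("not the repr of a bytes object")
    inner = text[2:-1]
    # The repr is pure ASCII. unicode_escape turns \xe2, \n, \\ and \' into
    # code points in the range 0-255, and latin-1 maps those one-to-one
    # back to the raw bytes.
    raw = inner.encode("ascii").decode("unicode_escape").encode("latin-1")
    return raw.decode(encoding)


s_fixed = decode_bytes_repr(r"b'The em-dash: \xe2\x80\x94'")
```

Input that is not a genuine bytes repr (non-ASCII characters, or escapes such as \u2014 that do not fit in one byte) raises a UnicodeError at one of the encode steps instead of being silently accepted.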

ozcanyarimdunya · Accepted Answer · 2018-12-11 18:32:50Z

-2

This code works correctly on my computer. If you still get an error, this may help:

with open('foo.txt', 'r', encoding="utf-8") as f:
    print(f.read())

1 Comment

tripleee Over a year ago

No, the problem is that the file contains the repr() of the byte string.
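
A short sketch of why the encoding argument cannot help here, assuming the file was written with str(s) as in the question. The variable name written is only for illustration:

```python
s = b'The em dash: \xe2\x80\x94'

# str() on a bytes object gives its repr, which is what ended up in the file
written = str(s)
assert written == repr(s)

# The repr is plain ASCII: the three bytes of the dash became the twelve
# characters \xe2\x80\x94, so a UTF-8 round trip leaves the text unchanged
assert written.encode("utf-8").decode("utf-8") == written

print(written)  # b'The em dash: \xe2\x80\x94'
```

Whatever encoding is passed to open(), the text read back is still the literal b'...' form, so the repr itself has to be parsed.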
