Some hapless coworker saved some data into a file like this:
s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(str(s))
when they should have used
s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(s.decode())
Now foo.txt looks like
b'The em dash: \xe2\x80\x94'
instead of
The em dash: —
I already read this file as a string:
with open('foo.txt') as f:
    bad_foo = f.read()
Now how can I convert bad_foo from the incorrectly-saved format to the correctly-saved string?
.decode doesn't make sense without an encoding name. Why are you using byte strings in the first place, anyway? The idiomatic way to do this is to use a Unicode string and let Python encode it when writing to a file.
eval can be suggested for undoing this.
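One possible way to undo the damage, sketched under a couple of assumptions: the file holds exactly one bytes literal as produced by str(s), and the original bytes were UTF-8. Instead of eval, this uses ast.literal_eval from the standard library, which only accepts Python literals and so will not run arbitrary code from the file. The helper name recover_text is made up for illustration.

```python
import ast


def recover_text(bad: str, encoding: str = "utf-8") -> str:
    """Turn the repr of a bytes object (e.g. "b'...'") back into text.

    Hypothetical helper; assumes `bad` contains a single bytes literal
    and that the underlying bytes are encoded with `encoding`.
    """
    # Strip surrounding whitespace (such as a trailing newline) before parsing.
    raw = ast.literal_eval(bad.strip())
    if not isinstance(raw, bytes):
        raise ValueError("expected the repr of a bytes object")
    return raw.decode(encoding)


# Stand-in for the contents read from foo.txt: the backslashes are literal
# characters in the file, so they are escaped here.
bad_foo = "b'The em dash: \\xe2\\x80\\x94'"
fixed = recover_text(bad_foo)
print(fixed)
```

If the file turns out to contain something other than a bytes literal, literal_eval raises an exception (or the type check above does) rather than executing the content, which is the main reason to prefer it over eval here.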