How to decode a string representation of a bytes object?

Question

I have a string which includes encoded bytes inside it:

str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"

I want to decode it, but I can't since it has become a string. Therefore I want to ask whether there is any way I can convert it into

str2 = b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'

Here str2 is a bytes object which I can decode easily using

str2.decode('utf-8')

to get the final result:

'Output file 문항분석.xlsx Created'

Zero Piraeus · Accepted Answer · 2019-02-26 15:59:37Z

1

You could use ast.literal_eval:

>>> print(str1)
b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
>>> type(str1)
<class 'str'>

>>> from ast import literal_eval
>>> literal_eval(str1).decode('utf-8')
'Output file 문항분석.xlsx Created'
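A self-contained sketch of the same approach, for reference. It assumes the string holds literal backslash escapes (as the print output above shows), which is why the example uses a raw string. The helper name decode_bytes_repr is hypothetical and not part of any library.

```python
from ast import literal_eval


def decode_bytes_repr(text, encoding="utf-8"):
    """Parse the repr of a bytes object and decode the resulting bytes."""
    value = literal_eval(text)  # only evaluates literals, never arbitrary code
    if not isinstance(value, bytes):
        raise ValueError("input is not the repr of a bytes object")
    return value.decode(encoding)


# Raw string: every \xNN below is four literal characters, as in str(b'...').
str1 = r"b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
print(decode_bytes_repr(str1))
```

If the string contains real non-ASCII characters instead of escapes, literal_eval raises a SyntaxError, because a bytes literal may only contain ASCII characters.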

edited Feb 26, 2019 at 15:59

answered Feb 26, 2019 at 15:53

Zero Piraeus



6 Comments

Sujil Devkota Over a year ago

SyntaxError: bytes can only contain ASCII literal characters.

Zero Piraeus Over a year ago

That suggests your input is actually something like "b'Output file ë¬¸í\x95\xadë¶\x84ì\x84\x9d.xlsx Created'", rather than the escaped string I inferred from your question. I think at that point it's time to go and fix whatever's sending you such messed up input, to be honest …

Sujil Devkota Over a year ago

dropbox.com/s/fmkrhy0pt29rdi3/…

Zero Piraeus Over a year ago

Please see my previous comment – once things are as messed up as that, you're better off fixing whatever is generating such an unpleasant string.

Sujil Devkota Over a year ago

I converted it to bytes myself with msg = bytes("Output file " + output_filename + " Created", 'utf-8') followed by print(msg). I did this to send the message through the popen().communicate() function, which doesn't support the original message. After communicate() returns, I receive a list of byte strings like the one above.
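One way to avoid producing the "b'...'" text in the first place, sketched under assumptions: the child process writes the encoded bytes to sys.stdout.buffer instead of calling print() on a bytes object, and the parent decodes what communicate() returns. The child_code script below is a hypothetical stand-in for the real program.

```python
import subprocess
import sys

# Hypothetical child script: it writes UTF-8 bytes straight to stdout
# rather than print()-ing a bytes object (which would emit "b'...'").
child_code = (
    "import sys\n"
    "msg = 'Output file \\ubb38\\ud56d\\ubd84\\uc11d.xlsx Created'\n"
    "sys.stdout.buffer.write(msg.encode('utf-8'))\n"
)

proc = subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE)
out, _ = proc.communicate()      # out is a bytes object
message = out.decode("utf-8")    # decode once, on the receiving side
print(message)
```

With this arrangement the parent never sees a string representation of bytes, so no slicing or literal_eval is needed.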


Serge Ballesta · Accepted Answer · 2019-02-28 16:59:28Z

0

A simple way is to assume that every character of the initial string has a code point in the [0,256) range and stands for the byte with the same value. Encoding the string as Latin1 then gives back exactly those bytes.

The conversion is then trivial:

str1[2:-1].encode('Latin1').decode('utf8')
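Spelled out step by step, as a minimal sketch. It assumes the escapes in the string have already been interpreted, so each \xNN in the non-raw literal below is one character with a code point below 256.

```python
# Non-raw literal: each \xNN is ONE character whose code point is below 256.
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"

inner = str1[2:-1]              # drop the leading b' and the trailing '
raw = inner.encode("latin-1")   # code point N becomes the single byte N
text = raw.decode("utf-8")      # interpret those bytes as UTF-8
print(text)
```

If any character has a code point above 255, encode("latin-1") raises UnicodeEncodeError, which is a useful signal that the assumption does not hold.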

edited Feb 28, 2019 at 16:59

Sujil Devkota


answered Feb 28, 2019 at 13:47

Serge Ballesta


2 Comments

Sujil Devkota Over a year ago

Thank you, this is very short and much easier than the solution I found.

Sujil Devkota Over a year ago

One issue, though: this works fine when I run the code separately, but in my main program the string after Latin1 encoding has a \ added in front of every \, so it contains \\. Decoding it just removes the extra backslash, so the result is the same as str1[2:-1]. I think Python adds the second \ to escape the first one. How can I deal with this?
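A likely explanation, offered as an assumption: in the main program the string holds literal backslash escapes (the four characters \, x, e, b), and the doubled backslash is only how repr displays a single literal backslash. In that case the Latin1 trick has nothing to convert and the bytes literal must be parsed instead. The sketch below handles both shapes; recover_text is a hypothetical helper, and the isascii() check (Python 3.7+) is a heuristic, not a guarantee.

```python
from ast import literal_eval


def recover_text(bytes_repr, encoding="utf-8"):
    """Decode str(bytes_obj) whether or not its escapes were already interpreted."""
    if bytes_repr.isascii():
        # A genuine repr of bytes is pure ASCII: literal \xNN escapes are
        # still present, so let Python parse the bytes literal.
        raw = literal_eval(bytes_repr)
    else:
        # Escapes were already turned into characters below U+0100:
        # map each character back to the byte with the same value.
        raw = bytes_repr[2:-1].encode("latin-1")
    return raw.decode(encoding)


escaped = r"b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
interpreted = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
print(recover_text(escaped) == recover_text(interpreted))
```

A string that mixes both forms (some escapes literal, some interpreted) is not covered by this sketch.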

benvc · Accepted Answer · 2019-02-26 18:47:05Z

0

Based on the SyntaxError mentioned in your comments, you may have a testing issue when printing because stdout is set to ascii in your console (your console may also not support some of the characters you are trying to print). You can try something like the following to set sys.stdout to utf-8 and see what your console prints. It uses a string slice and encode to get the bytes, rather than the ast.literal_eval approach that has already been suggested:

import codecs
import sys

sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer)

s = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
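# Encode as Latin-1 so each character becomes the byte with the same value;
# the default encode() (UTF-8) would just round-trip back to the same string.
text = s[2:-1].encode('latin-1').decode('utf-8')
print(text)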

answered Feb 26, 2019 at 18:47

benvc


2 Comments

Sujil Devkota Over a year ago

AttributeError: 'OutStream' object has no attribute 'buffer'

benvc Over a year ago

@SujilDevkota - unfortunately, I can't replicate that error. There must be some other environmental factors (e.g. additional code that is not included in the question, or some sort of OS / shell combination that we aren't expecting).
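A possible cause, stated as an assumption: the name OutStream suggests that sys.stdout has been replaced by an interactive environment such as IPython/Jupyter, whose stream object has no underlying byte buffer. A more defensive sketch checks for the reconfigure method (available on io.TextIOWrapper since Python 3.7) before touching the stream. The helper name force_utf8 is hypothetical.

```python
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure(); report success."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # Replacement streams may offer neither .buffer nor .reconfigure;
        # leave them untouched instead of raising AttributeError.
        return False
    reconfigure(encoding="utf-8")
    return True


if not force_utf8(sys.stdout):
    print("stdout cannot be reconfigured here; leaving it as it is")
```

When the function returns False, the environment is already managing the output encoding itself, so there is usually nothing to fix on the Python side.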

Sujil Devkota · Accepted Answer · 2019-02-28 13:19:22Z

I finally found an answer that uses a function to cast a string to bytes without encoding. Given the string

str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"

I take only the encoded text inside it

str1[2:-1]

and pass it to a function that converts the string to bytes without encoding its values

import struct

def rawbytes(s):
    """Convert a string to raw bytes without encoding"""
    outlist = []
    for cp in s:
        num = ord(cp)
        if num <= 0xFF:
            # One byte holding the code point itself (255 included)
            outlist.append(struct.pack('B', num))
        elif num <= 0xFFFF:
            # Two bytes, big-endian (65535 included)
            outlist.append(struct.pack('>H', num))
        else:
            # Three bytes, big-endian: the high byte, then the low 16 bits
            b = (num & 0xFF0000) >> 16
            H = num & 0xFFFF
            outlist.append(struct.pack('>BH', b, H))
    return b''.join(outlist)

Calling the function converts the text to bytes, which can then be decoded:

rawbytes(str1[2:-1]).decode('utf-8')

This gives the correct output:

'Output file 문항분석.xlsx Created'
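For strings whose characters all have code points below 256, which is the case in this question, the one-byte branch of such a function does the same thing as Latin-1 encoding. A quick check, using a hypothetical one-line equivalent named rawbytes_simple:

```python
def rawbytes_simple(s):
    """Map each character to the byte with the same value (code points 0-255 only)."""
    return bytes(ord(ch) for ch in s)  # raises ValueError for code points above 255


# Every code point from 0 to 255, one character each.
all_low = "".join(chr(i) for i in range(256))
print(rawbytes_simple(all_low) == all_low.encode("latin-1"))
```

Code points above 255 are outside what a string representation of bytes can contain, so raising an error there is arguably safer than packing them into two or three bytes.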
