3

I have a string which includes encoded bytes inside it:

str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"

I want to decode it, but I can't since it has become a string. Therefore I want to ask whether there is any way I can convert it into

str2 = b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'

Here str2 is a bytes object which I can decode easily using

str2.decode('utf-8')

to get the final result:

'Output file 문항분석.xlsx Created'

4 Answers

1

You could use ast.literal_eval:

>>> print(str1)
b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
>>> type(str1)
<class 'str'>

>>> from ast import literal_eval
>>> literal_eval(str1).decode('utf-8')
'Output file 문항분석.xlsx Created'
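This only works when the string holds literal backslash escapes, as the printed output above suggests. Below is a minimal sketch contrasting that case with a string whose escapes were already interpreted. The names escaped and raw_chars are illustrative and not from the question.

```python
from ast import literal_eval

# Case 1: the string contains literal backslashes (note the r prefix),
# so it is a valid bytes literal written out as text.
escaped = r"b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
decoded = literal_eval(escaped).decode('utf-8')

# Case 2: without the r prefix the escapes are interpreted when the string
# is created, so it holds non-ASCII characters such as U+00EB.
# A bytes literal may not contain those, so parsing fails.
raw_chars = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
try:
    literal_eval(raw_chars)
    case2_error = None
except SyntaxError as exc:
    case2_error = exc

print(type(case2_error).__name__)
```

Comparing len(escaped) with len(raw_chars), or looking at repr() of the real input, is a quick way to tell which of the two cases applies.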

6 Comments

SyntaxError: bytes can only contain ASCII literal characters.
That suggests your input is actually something like "b'Output file 문í\x95\xadë¶\x84ì\x84\x9d.xlsx Created'", rather than the escaped string I inferred from your question. I think at that point it's time to go and fix whatever's sending you such messed up input, to be honest …
Please see my previous comment – once things are as messed up as that, you're better off fixing whatever is generating such an unpleasant string.
I converted it to bytes myself with msg = bytes("Output file " + output_filename + " Created", 'utf-8') followed by print(msg), in order to send the message through popen().communicate(), which doesn't support the original message. After communicate() returns, I get a list of bytes-style strings like the one above.
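The comment above suggests the b'...' wrapper appears because a bytes object is printed or passed through str() somewhere between the two processes. A minimal sketch, assuming that is the cause: communicate() already returns bytes, which can be decoded directly. The child command here is a hypothetical stand-in for the real program.

```python
import subprocess
import sys

# Hypothetical child process: writes the UTF-8 encoded message to stdout
# as raw bytes instead of printing the repr of a bytes object.
child_code = (
    r"import sys; sys.stdout.buffer.write("
    r"b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created')"
)

proc = subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE)
out, _ = proc.communicate()

# out is a bytes object; decode it directly rather than calling str() on it.
message = out.decode('utf-8')
```

If the sending side can be changed this way, the receiving side never sees a string that merely looks like a bytes literal.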
0

A simple way is to assume that every character of the initial string has a code point in the [0, 256) range and stands for the byte with that same value, which is exactly how Latin-1 maps characters to bytes.

The conversion is then trivial:

str1[2:-1].encode('Latin1').decode('utf8')
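The one-liner can be unpacked step by step to check the assumption it relies on. This is a sketch using the string from the question; the intermediate names inner, raw and text are illustrative.

```python
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"

# Drop the leading b' and the trailing ' that wrap the payload.
inner = str1[2:-1]

# The approach is only valid if every character fits in a single byte.
assert all(ord(ch) < 256 for ch in inner)

# Latin-1 maps each character to the byte with the same numeric value.
raw = inner.encode('latin-1')
assert list(raw) == [ord(ch) for ch in inner]

# The recovered bytes are the original UTF-8 data.
text = raw.decode('utf-8')
```

If the first assertion fails, the string contains characters outside Latin-1 and encode('latin-1') would raise UnicodeEncodeError.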

2 Comments

Thank you, this is a much shorter and easier solution than the one I found.
One problem, though: it works fine when I run this code on its own, but in my main program the string has an extra \ in front of every \, so it contains \\ and decoding only strips one of the slashes. The result is then the same as str1[2:-1]. I think Python is adding the extra \ to escape the backslash. How can I deal with this?
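If the string in the main program really holds literal backslashes (its repr shows \\xeb), the escapes have to be interpreted before the Latin-1 step. A sketch of one way to do that, assuming the text between the quotes is otherwise plain ASCII; the variable names are illustrative.

```python
# The r prefix keeps the backslashes literal, mimicking the main-program input.
str1 = r"b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"

inner = str1[2:-1]

# Step 1: interpret the \xNN escapes; each becomes the character U+00NN.
unescaped = inner.encode('ascii').decode('unicode_escape')

# Step 2: same Latin-1 round trip as in the answer above.
text = unescaped.encode('latin-1').decode('utf-8')
```

The encode('ascii') call fails loudly with UnicodeEncodeError if the input is not pure ASCII, which signals that the string is in the other form and step 1 should be skipped.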
0

Based on the SyntaxError mentioned in your comments, you may have a testing issue when printing because stdout is set to ASCII in your console (your console may also not support some of the characters you are trying to print). You can try something like the following to set sys.stdout to UTF-8 and see what your console prints. It uses a string slice and encode to get the bytes rather than the ast.literal_eval approach that has already been suggested:

import codecs
import sys

sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer)

s = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
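# Encode as Latin-1 to recover the original bytes; a bare encode() would use
# UTF-8, and the encode/decode round trip would return the string unchanged.
decoded = s[2:-1].encode('latin-1').decode('utf-8')
print(decoded)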

2 Comments

AttributeError: 'OutStream' object has no attribute 'buffer'
@SujilDevkota - unfortunately, I can't replicate that error. There must be some other environmental factors (i.e. additional code that is not included in the question, some sort of OS / shell combination that we aren't expecting, etc).
0

I finally found an answer that uses a function to cast a string to bytes without encoding it. Given the string

str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"

I take only the encoded text inside it

str1[2:-1]

and pass it to a function that converts the string to bytes without encoding its values:

import struct

def rawbytes(s):
    """Convert a string to raw bytes without encoding"""
    outlist = []
    for cp in s:
        num = ord(cp)
        if num <= 0xFF:
            # fits in a single byte (255 included)
            outlist.append(struct.pack('B', num))
        elif num <= 0xFFFF:
            # two bytes, big-endian
            outlist.append(struct.pack('>H', num))
        else:
            # above 0xFFFF: one unsigned high byte plus two low bytes
            b = (num & 0xFF0000) >> 16
            H = num & 0xFFFF
            outlist.append(struct.pack('>BH', b, H))
    return b''.join(outlist)

Calling the function converts the text to bytes, which can then be decoded:

rawbytes(str1[2:-1]).decode('utf-8')

This gives the correct output:

'Output file 문항분석.xlsx Created'
