I've got a chunk of code that reads binary data from a string buffer (a StringIO object) and tries to convert it to a bytearray object. It throws an error whenever a byte value is greater than 127, which the ascii codec can't handle, even though I'm trying to override the encoding:

file = open(filename, 'r+b')
file.seek(offset)
chunk = file.read(length)
chunk = zlib.decompress(chunk)
chunk = StringIO(chunk)

d = bytearray(chunk.read(10), encoding="iso8859-1", errors="replace")

Running that code gives me:

  d = bytearray(chunk.read(10), encoding="iso8859-1", errors="replace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 3: ordinal not in range(128)

Obviously 240 (decimal of 0xf0) can't fit in the ascii encoding range, but that's why I'm explicitly setting the encoding. But it seems to be ignoring it.

2 Answers

9

When converting a string to another encoding, its original encoding is taken to be ASCII if it is a str or Unicode if it is a unicode object. When creating the bytearray, the encoding parameter is required only if the string is unicode. Just don't specify an encoding and you will get the results you want.
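A minimal sketch of that fix, written so it runs on both Python 2 and Python 3. The sample bytes and the use of io.BytesIO in place of StringIO are assumptions made for illustration; they are not from the question.

```python
import io
import zlib

# Hypothetical stand-in for the compressed chunk read from the file;
# it deliberately contains byte values above 127.
raw = bytes(bytearray([0x01, 0x02, 0x7f, 0xf0, 0xff]))
compressed = zlib.compress(raw)

# io.BytesIO is an in-memory buffer for byte data on both Python 2 and 3.
chunk = io.BytesIO(zlib.decompress(compressed))

# The data is already a byte string, so no encoding argument is passed.
d = bytearray(chunk.read(10))

print(list(d))
```

Because nothing is encoded or decoded here, the values above 127 pass through unchanged.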

2 Comments

If only that were documented properly (instead of just saying "If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode()." for both Python 2.7 and 3).
Er, a nitpick: "unicode" is not an encoding, and a unicode object is not encoded. The way I would phrase it is that when converting to bytes, you must have a unicode object to encode. If you pass a str, it is first decoded from ASCII into a temporary unicode object, and that object is then encoded as requested.
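The two steps described in this comment can be written out explicitly. This is only a sketch with made-up sample bytes; it runs on both Python 2 and Python 3, and the explicit decode stands in for the implicit one that Python 2 performs on a str.

```python
# Hypothetical sample: three ASCII bytes followed by 0xf0.
data = bytes(bytearray([0x41, 0x42, 0x43, 0xf0]))

# Step 1 as Python 2 does it implicitly: decode the byte string as ASCII.
# This is the step that fails, before the requested encoding is ever used.
error = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    error = str(exc)
print(error)

# Doing the decode explicitly with a codec that covers all 256 byte values
# gives a text object that can then be encoded as requested.
text = data.decode("iso8859-1")
d = bytearray(text, encoding="iso8859-1")
print(list(d))
```

The round trip only reproduces the original bytes, which is why dropping the encoding argument for byte data is the simpler route.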
2

I am not quite sure what the problem is.

StringIO is for string IO, not for binary IO. If you want to get a bytearray representing the whole content of the file, use:

with open('filename', 'r') as file:
    bytes = bytearray(file.read())

If you want to get a string with only the ASCII characters contained in that file, use:

with open('filename', 'r') as file:
    asciis = file.read().decode('ascii', 'ignore')

(If you run it on Windows, you will probably need the binary flag, 'rb', when opening the file.)
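For reference, here is a self-contained sketch of both reads using binary mode, which behaves the same on every platform and on both Python 2 and Python 3. The temporary file and its contents are assumptions made so the example can run on its own.

```python
import os
import tempfile

# Hypothetical file contents with bytes on both sides of 127.
payload = bytes(bytearray([0x00, 0x41, 0x80, 0xf0, 0xff]))

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as out:
        out.write(payload)

    # Binary mode returns the bytes exactly as stored on disk.
    with open(path, "rb") as src:
        data = bytearray(src.read())
finally:
    os.remove(path)

# Keep only the bytes that are valid ASCII, dropping the rest.
ascii_only = bytes(data).decode("ascii", "ignore")

print(list(data))
print(repr(ascii_only))
```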

2 Comments

file.read() also returns a string, so this isn't the problem.
It works for me: reading a file with arbitrary binary data and running print bytes yields all the bytes, whether they are below or above 127.
