How to define a binary string in Python in a way that works with both py2 and py3?

Question

I am writing a module that is supposed to work in both Python 2 and 3 and I need to define a binary string.

Usually this would be something like data = b'abc', but this code fails on Python 2.5 with invalid syntax.

How can I write the above code so that it works in all versions of Python from 2.5 onward?

Note: the data has to be truly binary (it can contain any byte value, for example 0xFF). This is very important.

The b"abc" syntax and the bytes() constructor were added in Python 2.6.
– Tim Pietzcker, Commented Oct 13, 2011 at 13:53
Search for Python 2 and Python 3 compatibility in almost any phrasing, and both the six library and my book, which offer essentially the same working solution for this, appear on the first page of the results. Yet nobody seems to know that either of them exists. How can we fix that? Spread the word!
– Lennart Regebro, Commented Oct 13, 2011 at 20:29

Lennart Regebro · Accepted Answer · 2011-10-13 20:30:27Z

6

I would recommend the following:

from six import b

That requires the six module, of course. If you don't want that, here's another version:

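import sys
if sys.version_info[0] < 3:
    def b(x):
        # Python 2: a plain string literal is already a byte string.
        return x
else:
    import codecs
    def b(x):
        # Python 3: Latin-1 maps code points 0-255 one-to-one onto bytes.
        return codecs.latin_1_encode(x)[0]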

These solutions (essentially the same) work, are clean, as fast as you are going to get, and can support all 256 byte values (which none of the other solutions here can).
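For illustration, here is a minimal usage sketch of the b() helper described above. The names data and all_bytes are only examples, and the helper is repeated so the snippet runs on its own:

```python
import sys
import codecs

if sys.version_info[0] < 3:
    def b(x):
        # Python 2: str is already a byte string.
        return x
else:
    def b(x):
        # Python 3: encode code points 0-255 to the matching byte values.
        return codecs.latin_1_encode(x)[0]

# Escapes such as \x80 and \xff survive unchanged on both major versions.
data = b('abc\x00\x80\xff')

# Every one of the 256 possible byte values can be expressed this way.
all_bytes = b(''.join(chr(i) for i in range(256)))
```

The only rule is that the literal passed to b() must not contain characters above \xff, since Latin-1 cannot encode them.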

edited Oct 13, 2011 at 20:30

answered Oct 13, 2011 at 20:16

Petr Viktorin · Answer · 2011-10-13 13:57:40Z

2

If the string only has ASCII characters, call encode. This will give you a str in Python 2 (just like b'abc'), and a bytes in Python 3:

'abc'.encode('ascii')

If not, rather than putting binary data in the source, create a data file, open it with 'rb' and read from it.
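A small sketch of the encode approach, and of why it is limited to ASCII. The variable names are illustrative only:

```python
# ASCII-only text: yields str on Python 2 and bytes on Python 3.
data = 'abc'.encode('ascii')

# A non-ASCII value fails on both major versions, although for
# different reasons: Python 2 first decodes the byte string as ASCII
# (UnicodeDecodeError), Python 3 cannot encode U+00FF as ASCII
# (UnicodeEncodeError). Both are subclasses of UnicodeError.
try:
    '\xff'.encode('ascii')
    non_ascii_ok = True
except UnicodeError:
    non_ascii_ok = False
```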

answered Oct 13, 2011 at 13:57

7 Comments

sorin Over a year ago

As you suspected I do have several very small binary blocks, so using files for storing them is not an option. And yes they have non-ascii values.

Petr Viktorin Over a year ago

So, what do the strings actually look like? If they're human-readable strings, decode them with the proper encoding. If not, then use base64.

Lennart Regebro Over a year ago

Create a file and read from it? Complicated solution for a simple problem. Sorry, -1.

Lennart Regebro Over a year ago

(And using ascii is limiting without reason, use latin1 instead).

Petr Viktorin Over a year ago

@LennartRegebro: That wouldn't work in Python 2; try '\xff'.encode('latin1').

glglgl · Answer · 2011-10-14 09:11:46Z

-3

You could store the data base64-encoded.

First step would be to transform into base64:

>>> import base64
>>> base64.b64encode(b"\x80\xFF")
b'gP8='

This only needs to be done once. Whether you write the b prefix depends on the version of Python you use for this step.

In the second step, you put the resulting base64 text into your program as a plain string literal, without the b prefix. That way the same source works in both py2 and py3.

import base64
x = 'gP8='
base64.b64decode(x.encode("latin1"))

gives you a str '\x80\xff' in 2.6 (should work in 2.5 as well) and a bytes b'\x80\xff' in 3.x.
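Put together, the decoding step might look like the sketch below. It encodes with 'ascii' rather than 'latin1', which is an assumption that holds because the base64 alphabet is pure ASCII; the variable names are illustrative:

```python
import base64

# Base64 text for the two bytes 0x80 0xFF, stored as a plain string literal.
encoded = 'gP8='

# Convert the text to bytes before decoding so the call behaves the
# same way on Python 2 and Python 3.
data = base64.b64decode(encoded.encode('ascii'))
```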

As an alternative to the two steps above, you can do the same with hex data:

import binascii
x = '80FF'
binascii.unhexlify(x) # `bytes()` in 3.x, `str()` in 2.x
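Some Python 3 releases reject a str argument here (from 3.3 on, ASCII-only str is accepted). A hedged variant that encodes the hex text first should behave the same on every version; the variable names are illustrative:

```python
import binascii

# Hex text for the two bytes 0x80 0xFF, stored as a plain string literal.
hex_text = '80FF'

# Hex digits are pure ASCII, so encoding them is always safe.
data = binascii.unhexlify(hex_text.encode('ascii'))
```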

edited Oct 14, 2011 at 9:11

answered Oct 13, 2011 at 14:11

8 Comments

sorin Over a year ago

Oops, the code is going to be quite cryptic. Can't we find a solution that will work with hex?

sorin Over a year ago

Have you tried the code in Python 3? binascii.unhexlify(x) gives TypeError: 'str' does not support the buffer interface

Lennart Regebro Over a year ago

I don't understand what the base64 part is supposed to do. You can remove it and it will still work.

glglgl Over a year ago

@sorin: strange... here it works fine in Python 3.1 (r31:73572, Jul 5 2010, 13:15:03). Maybe x.encode("latin1") works better here as well...

glglgl Over a year ago

@Lennart Regebro It is supposed to be an alternative, as hex was preferred. b'\x80\xff' gets encoded to 'gP8=' in base64 and to '80FF' in hex.
