Using strings and byte-like objects compatibly in code to run in both Python 2 & 3

Question

I'm trying to modify the code shown far below, which works in Python 2.7.x, so that it will also work unchanged in Python 3.x. However, I'm encountering a problem I can't solve in the first function, bin_to_float(), as shown by the output below:

float_to_bin(0.000000): '0'
Traceback (most recent call last):
  File "binary-to-a-float-number.py", line 36, in <module>
    float = bin_to_float(binary)
  File "binary-to-a-float-number.py", line 9, in bin_to_float
    return struct.unpack('>d', bf)[0]
TypeError: a bytes-like object is required, not 'str'

I tried to fix that by adding bf = bytes(bf) right before the call to struct.unpack(), but doing so produced its own TypeError:

TypeError: string argument without an encoding

So my questions are: is it possible to fix this issue and achieve my goal? And if so, how? Preferably in a way that would work in both versions of Python.

Here's the code that works in Python 2:

import struct

def bin_to_float(b):
    """ Convert binary string to a float. """
    bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
    return struct.unpack('>d', bf)[0]

def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    bytes = []
    for _ in range(nbytes):
        bytes.append(chr(n & 0xff))
        n >>= 8
    if minlen > 0 and len(bytes) < minlen:  # zero pad?
        bytes.extend((minlen-len(bytes)) * '0')
    return ''.join(reversed(bytes))  # high bytes at beginning

# tests

def float_to_bin(f):
    """ Convert a float into a binary string. """
    ba = struct.pack('>d', f)
    ba = bytearray(ba)
    s = ''.join('{:08b}'.format(b) for b in ba)
    s = s.lstrip('0')  # strip leading zeros
    return s if s else '0'  # but leave at least one

for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
    binary = float_to_bin(f)
    print('float_to_bin(%f): %r' % (f, binary))
    float = bin_to_float(binary)
    print('bin_to_float(%r): %f' % (binary, float))
    print('')
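For context, here is a minimal Python 3-only sketch of why bytes(bf) fails: under Python 3, bytes() will not convert a str unless it is also given an encoding, while struct.unpack() needs a real bytes-like object. The 8-character string below is a hand-built stand-in (an assumption for illustration) for what the Python 2 int_to_bytes() helper returns for 1.0, one character per byte.

```python
import struct

# Stand-in for the helper's output: a str holding one character per byte,
# here the eight bytes of the IEEE 754 double 1.0 (0x3FF0000000000000).
text = "\x3f\xf0" + "\x00" * 6

# Under Python 3, bytes() rejects a str unless an encoding is also given.
try:
    bytes(text)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# latin-1 maps code points 0-255 to the byte with the same value,
# so it turns this str into the eight bytes the helper intended.
raw = bytes(text, "latin-1")
value = struct.unpack(">d", raw)[0]

print(error_message)
print(value)
```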

metatoaster · Answer · 2016-08-31 15:03:45Z

To write portable code that works with bytes in both Python 2 and 3, using libraries that use different data types between the two, you need to declare every string explicitly with the appropriate literal prefix (b'...' or u'...'), or add from __future__ import unicode_literals to the top of every module doing this. This ensures that your data types are correct internally in your code.

Secondly, decide to support Python 3 going forward, with fallbacks specific to Python 2. This means overriding str with unicode, and replacing any methods/functions that don't return the same type in both Python versions with ones that return the correct type (the Python 3 one). Also note that bytes is the name of a built-in type, so don't use it as a variable name.

Putting this together, your code will look something like this:

import struct
import sys

if sys.version_info < (3, 0):
    str = unicode
    chr = unichr


def bin_to_float(b):
    """ Convert binary string to a float. """
    bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
    return struct.unpack(b'>d', bf)[0]

def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    ba = bytearray(b'')
    for _ in range(nbytes):
        ba.append(n & 0xff)
        n >>= 8
    if minlen > 0 and len(ba) < minlen:  # zero pad?
        ba.extend((minlen-len(ba)) * b'\x00')  # pad with zero bytes, not '0' characters
    return u''.join(str(chr(b)) for b in reversed(ba)).encode('latin1')  # high bytes at beginning

# tests

def float_to_bin(f):
    """ Convert a float into a binary string. """
    ba = struct.pack(b'>d', f)
    ba = bytearray(ba)
    s = u''.join(u'{:08b}'.format(b) for b in ba)
    s = s.lstrip(u'0')  # strip leading zeros
    return (s if s else u'0').encode('latin1')  # but leave at least one

for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
    binary = float_to_bin(f)
    print(u'float_to_bin(%f): %r' % (f, binary))
    float = bin_to_float(binary)
    print(u'bin_to_float(%r): %f' % (binary, float))
    print(u'')

I used the latin1 codec simply because it maps each byte value 0-255 to the code point with the same value, which is how the original byte mappings were defined, and it seems to work:

$ python2 foo.py 
float_to_bin(0.000000): '0'
bin_to_float('0'): 0.000000

float_to_bin(1.000000): '11111111110000000000000000000000000000000000000000000000000000'
bin_to_float('11111111110000000000000000000000000000000000000000000000000000'): 1.000000

float_to_bin(-14.000000): '1100000000101100000000000000000000000000000000000000000000000000'
bin_to_float('1100000000101100000000000000000000000000000000000000000000000000'): -14.000000

float_to_bin(12.546000): '100000000101001000101111000110101001111110111110011101101100100'
bin_to_float('100000000101001000101111000110101001111110111110011101101100100'): 12.546000

float_to_bin(3.141593): '100000000001001001000011111101110000010110000101011110101111111'
bin_to_float('100000000001001001000011111101110000010110000101011110101111111'): 3.141593

Again, but this time under Python 3.5:

$ python3 foo.py 
float_to_bin(0.000000): b'0'
bin_to_float(b'0'): 0.000000

float_to_bin(1.000000): b'11111111110000000000000000000000000000000000000000000000000000'
bin_to_float(b'11111111110000000000000000000000000000000000000000000000000000'): 1.000000

float_to_bin(-14.000000): b'1100000000101100000000000000000000000000000000000000000000000000'
bin_to_float(b'1100000000101100000000000000000000000000000000000000000000000000'): -14.000000

float_to_bin(12.546000): b'100000000101001000101111000110101001111110111110011101101100100'
bin_to_float(b'100000000101001000101111000110101001111110111110011101101100100'): 12.546000

float_to_bin(3.141593): b'100000000001001001000011111101110000010110000101011110101111111'
bin_to_float(b'100000000001001001000011111101110000010110000101011110101111111'): 3.141593

It's a lot more work, but under Python 3 you can see more clearly that the values are proper bytes. I also changed your bytes = [] to a bytearray to express more clearly what you were trying to do.
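The latin1 choice works because latin-1 is a one-to-one mapping between the 256 byte values and the first 256 code points. A small sketch (not part of the original answer) that checks this property; it is written to run on both Python 2 and 3:

```python
# Every byte value 0-255, built in a way that works on Python 2 and 3.
all_bytes = bytes(bytearray(range(256)))

# Decoding with latin-1 gives one character per byte, with ord() == byte value...
text = all_bytes.decode("latin-1")
same_values = [ord(ch) for ch in text] == list(range(256))

# ...and encoding with latin-1 restores the original bytes exactly.
round_trips = text.encode("latin-1") == all_bytes

print(same_values, round_trips)
```

A codec such as UTF-8 would not work here, because byte values above 127 would be encoded as multi-byte sequences.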

martineau Over a year ago

Your approach is probably the best for backporting Python 3 code to Python 2, but my case is the opposite. To be fair, I didn't mention this in my question. While initially trying to fix things myself, I, too, noticed the use of bytes as a variable name. It became the name of a built-in in Python 2.6, and the original code is nearly 5 years old, so that's probably why. Anyway, mostly because it requires the least amount of modification, I'm going to accept @smarx's answer. Nonetheless, I learned a number of new things from yours, which I'll try to apply in future work. Thanks.

metatoaster Over a year ago

@martineau And thank you for letting me know what you thought; I definitely agree with everything you have said and assessed. Glad to have given you some new insights into how to support Python 2 and 3 better.

user94559 · Accepted Answer · 2016-08-31 15:06:02Z


I took a different approach from @metatoaster's answer. I just modified int_to_bytes to build and return a bytearray:

def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    b = bytearray()
    for _ in range(nbytes):
        b.append(n & 0xff)
        n >>= 8
    if minlen > 0 and len(b) < minlen:  # zero pad?
        b.extend([0] * (minlen-len(b)))
    return bytearray(reversed(b))  # high bytes at beginning

This seems to work without any other modifications under both Python 2.7.11 and Python 3.5.1.

Note that I zero-padded with the integer 0 (a zero byte) instead of the character '0'. I didn't do much testing, but surely that's what you meant?
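A self-contained round-trip check of this answer's int_to_bytes() combined with the question's unchanged bin_to_float() and float_to_bin(), sketched here for illustration (the results list is a hypothetical name). The zero padding matters for the 0.0 case: padding with the character '0' makes struct.unpack() decode eight 0x30 bytes, a tiny nonzero float that %f still prints as 0.000000.

```python
import struct

def int_to_bytes(n, minlen=0):
    """Int/long to a big-endian bytearray (this answer's version)."""
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits + 7) // 8  # number of whole bytes
    b = bytearray()
    for _ in range(nbytes):
        b.append(n & 0xff)
        n >>= 8
    if minlen > 0 and len(b) < minlen:
        b.extend([0] * (minlen - len(b)))  # pad with zero bytes, not '0' characters
    return bytearray(reversed(b))  # high bytes at beginning

def bin_to_float(b):
    """Convert a binary string to a float (unchanged from the question)."""
    bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
    return struct.unpack('>d', bf)[0]

def float_to_bin(f):
    """Convert a float into a binary string (unchanged from the question)."""
    ba = bytearray(struct.pack('>d', f))
    s = ''.join('{:08b}'.format(b) for b in ba)
    s = s.lstrip('0')  # strip leading zeros
    return s if s else '0'  # but leave at least one

results = [(f, bin_to_float(float_to_bin(f))) for f in (0.0, 1.0, -14.0, 12.546, 3.141593)]
for original, restored in results:
    print(original, restored)
```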


2 Comments

metatoaster Over a year ago

Nice, I somehow missed that. I'm really up a bit too late and this one somehow nerd-sniped me, but yeah, I'm just very pedantic when it comes to ensuring compatibility between two versions in a generic enough way.

martineau Over a year ago

Your answer was what I was hoping to find. You were also correct about the zero padding: it should have been '\x00', not '0'. Nice catch; obviously my own testing was weak and missed a code path.

Sven Marnach · Answer · 2016-08-31 15:21:59Z


In Python 3, integers have a to_bytes() method that can perform the conversion in a single call. However, since you asked for a solution that works on Python 2 and 3 unmodified, here's an alternative approach.
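For readers who only need Python 3.2 or later, a minimal sketch of bin_to_float() using the to_bytes() method mentioned above, so that no int_to_bytes() helper is needed at all (this does not run on Python 2):

```python
import struct

def bin_to_float(b):
    """Convert a binary string to a float (Python 3.2+ only)."""
    n = int(b, 2)
    # to_bytes(8, "big") returns exactly 8 bytes, most significant first,
    # zero-padded on the left, which is the layout struct's ">d" expects.
    return struct.unpack(">d", n.to_bytes(8, "big"))[0]

print(bin_to_float("0"))
print(bin_to_float("1111111111" + "0" * 52))
```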

If you take a detour via hexadecimal representation, the function int_to_bytes() becomes very simple:

import codecs

def int_to_bytes(n, minlen=0):
    hex_str = format(n, "0{}x".format(2 * minlen))
    return codecs.decode(hex_str, "hex")

You might need some special-case handling for when the hex string has an odd number of characters.
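One way to handle the odd-length case, sketched as an assumption rather than taken from the answer: each byte needs two hex digits, so left-pad with a single "0" whenever the digit count is odd. Like the original, this only handles non-negative integers.

```python
import codecs

def int_to_bytes(n, minlen=0):
    """Non-negative int to big-endian bytes via a hex string."""
    hex_str = format(n, "0{}x".format(2 * minlen))
    # Two hex digits per byte: left-pad with one "0" if the count is odd.
    if len(hex_str) % 2:
        hex_str = "0" + hex_str
    return codecs.decode(hex_str, "hex")

print(int_to_bytes(0x123, 1))
```

Without the padding step, 0x123 would produce the three-digit string "123", which the hex codec rejects.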

Note that I'm not sure this works with all versions of Python 3. I remember that pseudo-encodings weren't supported in some 3.x version, but I don't remember the details. I tested the code with Python 3.5.
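If support for the "hex" pseudo-encoding is a concern, a possible alternative (a substitution, not the answer's method) is binascii.unhexlify(), which is a plain function rather than a codec. Encoding the hex string to ASCII first hands it bytes, which it accepts on Python 2 and, as far as I know, on every Python 3 release:

```python
import binascii

def int_to_bytes(n, minlen=0):
    """Non-negative int to big-endian bytes using binascii instead of codecs."""
    hex_str = format(n, "0{}x".format(2 * minlen))
    if len(hex_str) % 2:  # two hex digits per byte
        hex_str = "0" + hex_str
    # unhexlify() turns pairs of hex digits into bytes; passing bytes
    # (not str) avoids relying on str support added in later 3.x versions.
    return binascii.unhexlify(hex_str.encode("ascii"))

print(int_to_bytes(0x3ff0000000000000, 8))
```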


3 Comments

martineau Over a year ago

Glad you reminded me about the addition of to_bytes(), and I likely would never have thought of the inventive workaround for portability (which also handles minlen). However, even though it's significantly less code, it's a little too clever IMO, so I'm going to accept one of the other answers.

martineau Over a year ago

FWIW: After reading some of the answers (and the discussions linked in them) to the question Has Python 3.2 to_bytes been back-ported to python 2.7?, it seems the intermediate hex conversion trick may be more commonly known than I thought.

Sven Marnach Over a year ago

@martineau See also stackoverflow.com/questions/4358285/…
