3

I'm trying to modify the code shown far below, which works in Python 2.7.x, so it will also work unchanged in Python 3.x. However I'm encountering the following problem I can't solve in the first function, bin_to_float() as shown by the output below:

float_to_bin(0.000000): '0'
Traceback (most recent call last):
  File "binary-to-a-float-number.py", line 36, in <module>
    float = bin_to_float(binary)
  File "binary-to-a-float-number.py", line 9, in bin_to_float
    return struct.unpack('>d', bf)[0]
TypeError: a bytes-like object is required, not 'str'

I tried to fix that by adding a bf = bytes(bf) right before the call to struct.unpack(), but doing so produced its own TypeError:

TypeError: string argument without an encoding

So my questions are is it possible to fix this issue and achieve my goal? And if so, how? Preferably in a way that would work in both versions of Python.

Here's the code that works in Python 2:

import struct

def bin_to_float(b):
    """ Convert binary string to a float. """
    bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
    return struct.unpack('>d', bf)[0]

def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    bytes = []
    for _ in range(nbytes):
        bytes.append(chr(n & 0xff))
        n >>= 8
    if minlen > 0 and len(bytes) < minlen:  # zero pad?
        bytes.extend((minlen-len(bytes)) * '0')
    return ''.join(reversed(bytes))  # high bytes at beginning

# tests

def float_to_bin(f):
    """ Convert a float into a binary string. """
    ba = struct.pack('>d', f)
    ba = bytearray(ba)
    s = ''.join('{:08b}'.format(b) for b in ba)
    s = s.lstrip('0')  # strip leading zeros
    return s if s else '0'  # but leave at least one

for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
    binary = float_to_bin(f)
    print('float_to_bin(%f): %r' % (f, binary))
    float = bin_to_float(binary)
    print('bin_to_float(%r): %f' % (binary, float))
    print('')
0

3 Answers 3

3

To make portable code that works with bytes in both Python 2 and 3 using libraries that literally use the different data types between the two, you need to explicitly declare them using the appropriate literal mark for every string (or add from __future__ import unicode_literals to top of every module doing this). This step is to ensure your data types are correct internally in your code.

Secondly, make the decision to support Python 3 going forward, with fallbacks specific for Python 2. This means overriding str with unicode, and figure out methods/functions that do not return the same types in both Python versions should be modified and replaced to return the correct type (being the Python 3 version). Do note that bytes is a reserved word, too, so don't use that.

Putting this together, your code will look something like this:

import struct
import sys

if sys.version_info < (3, 0):
    str = unicode
    chr = unichr


def bin_to_float(b):
    """ Convert binary string to a float. """
    bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
    return struct.unpack(b'>d', bf)[0]

def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    ba = bytearray(b'')
    for _ in range(nbytes):
        ba.append(n & 0xff)
        n >>= 8
    if minlen > 0 and len(ba) < minlen:  # zero pad?
        ba.extend((minlen-len(ba)) * b'0')
    return u''.join(str(chr(b)) for b in reversed(ba)).encode('latin1')  # high bytes at beginning

# tests

def float_to_bin(f):
    """ Convert a float into a binary string. """
    ba = struct.pack(b'>d', f)
    ba = bytearray(ba)
    s = u''.join(u'{:08b}'.format(b) for b in ba)
    s = s.lstrip(u'0')  # strip leading zeros
    return (s if s else u'0').encode('latin1')  # but leave at least one

for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
    binary = float_to_bin(f)
    print(u'float_to_bin(%f): %r' % (f, binary))
    float = bin_to_float(binary)
    print(u'bin_to_float(%r): %f' % (binary, float))
    print(u'')

I used the latin1 codec simply because that's what the byte mappings are originally defined, and it seems to work

$ python2 foo.py 
float_to_bin(0.000000): '0'
bin_to_float('0'): 0.000000

float_to_bin(1.000000): '11111111110000000000000000000000000000000000000000000000000000'
bin_to_float('11111111110000000000000000000000000000000000000000000000000000'): 1.000000

float_to_bin(-14.000000): '1100000000101100000000000000000000000000000000000000000000000000'
bin_to_float('1100000000101100000000000000000000000000000000000000000000000000'): -14.000000

float_to_bin(12.546000): '100000000101001000101111000110101001111110111110011101101100100'
bin_to_float('100000000101001000101111000110101001111110111110011101101100100'): 12.546000

float_to_bin(3.141593): '100000000001001001000011111101110000010110000101011110101111111'
bin_to_float('100000000001001001000011111101110000010110000101011110101111111'): 3.141593

Again, but this time under Python 3.5)

$ python3 foo.py 
float_to_bin(0.000000): b'0'
bin_to_float(b'0'): 0.000000

float_to_bin(1.000000): b'11111111110000000000000000000000000000000000000000000000000000'
bin_to_float(b'11111111110000000000000000000000000000000000000000000000000000'): 1.000000

float_to_bin(-14.000000): b'1100000000101100000000000000000000000000000000000000000000000000'
bin_to_float(b'1100000000101100000000000000000000000000000000000000000000000000'): -14.000000

float_to_bin(12.546000): b'100000000101001000101111000110101001111110111110011101101100100'
bin_to_float(b'100000000101001000101111000110101001111110111110011101101100100'): 12.546000

float_to_bin(3.141593): b'100000000001001001000011111101110000010110000101011110101111111'
bin_to_float(b'100000000001001001000011111101110000010110000101011110101111111'): 3.141593

It's a lot more work, but in Python3 you can more clearly see that the types are done as proper bytes. I also changed your bytes = [] to a bytearray to more clearly express what you were trying to do.

Sign up to request clarification or add additional context in comments.

2 Comments

Your approach is probably the best for backporting Python 3 code to Python 2, but it's the opposite in my case. To be fair, I didn't mention this in my question. While initially trying to fix things myself, I, too, noticed the use of bytes as a variable name. It became the name of a built-in in Python 2.6—the original code's nearly 5 years old so that's probably why. Anyway, mostly for that reason that it requires the least amount of modification, I'm going to accept @smarx's answer. Nonetheless, I learned a number of new things from yours which I'll try to apply in future work. Thanks.
@martineau And thank you for letting me know what you thought, and I definitely agree with everything you have said and assessed. Glad to have given you some new insights to how to support Python 2 and 3 better.
1

I had a different approach from @metatoaster's answer. I just modified int_to_bytes to use and return a bytearray:

def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    b = bytearray()
    for _ in range(nbytes):
        b.append(n & 0xff)
        n >>= 8
    if minlen > 0 and len(b) < minlen:  # zero pad?
        b.extend([0] * (minlen-len(b)))
    return bytearray(reversed(b))  # high bytes at beginning

This seems to work without any other modifications under both Python 2.7.11 and Python 3.5.1.

Note that I zero padded with 0 instead of '0'. I didn't do much testing, but surely that's what you meant?

2 Comments

Nice, I somehow missed that. Really am up a bit too late and this one somehow nerdsniped me, but yeah I am just very pedantic when it comes to ensuring compatibility between two versions in a generic enough way.
Your answer was what I was hoping to find. You were also correct about the zero padding, It should have been '\x00', not '0'. Nice catch—obviously my own testing was weak and missed a code path.
1

In Python 3, integers have a to_bytes() method that can perform the conversion in a single call. However, since you asked for a solution that works on Python 2 and 3 unmodified, here's an alternative approach.

If you take a detour via hexadecimal representation, the function int_to_bytes() becomes very simple:

import codecs

def int_to_bytes(n, minlen=0):
    hex_str = format(n, "0{}x".format(2 * minlen))
    return codecs.decode(hex_str, "hex")

You might need some special case handling to deal with the case when the hex string gets an odd number of characters.

Note that I'm not sure this works with all versions of Python 3. I remember that pseudo-encodings weren't supported in some 3.x version, but I don't remember the details. I tested the code with Python 3.5.

3 Comments

Glad you reminded me about the addition of to_bytes() and I likely would have never thought of the inventive workaround for portability (which also handles minlen). However, even though it significantly less code, it's a little too clever IMO, so I'm going to accept one of the other answers.
FWIW: The intermediate hex conversion trick may be more commonly known than I thought, after reading some of the answers (and linked discussions in them) to the question Has Python 3.2 to_bytes been back-ported to python 2.7?.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.