Porting to Python 3: string/bytes formatting

Question

I am porting a program from Python 2 to Python 3 and am having difficulty with the % (interpolation) operator when the values are bytes.

Suppose we need to port this expression from Python 2: '%s: %s\r\n' % (name, value).

name and value in the ported version of the program are of type bytes, and the result should be of type bytes too. In Python 3, binary interpolation is only planned for Python 3.5 (PEP 461, which superseded the earlier PEP 460). So, unless I am mistaken, there are only two ways to deal with this problem: concatenation, or string encoding/decoding where appropriate:

>>> name = b'Host'
>>> value = b'example.com'
>>> # Decode bytes and encode resulting string.
>>> ('%s: %s\r\n' % (name.decode('ascii'), value.decode('ascii'))).encode('ascii')
b'Host: example.com\r\n'
>>> # ... or just use concatenation.
>>> name + b': ' + value + b'\r\n'
b'Host: example.com\r\n'

To me, both of these solutions are a bit ugly. Is there a convention or recommendation for porting string formatting when the values are bytes?

Note: the 2to3 tool shouldn't be used, and the program should work under both Python 2 and 3.
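As an illustration of the concatenation variant under that constraint, here is a minimal sketch that behaves the same on Python 2 and Python 3. The helper name header_line is hypothetical, and the sketch assumes both arguments are already byte strings.

```python
def header_line(name, value):
    """Build b'<name>: <value>\\r\\n' from two byte strings.

    header_line is a hypothetical helper name, not part of any library.
    """
    # bytes.join exists on Python 2 (where bytes is an alias for str)
    # and on Python 3, so no version check is needed here.
    return b''.join([name, b': ', value, b'\r\n'])


line = header_line(b'Host', b'example.com')
```

Using join rather than repeated + keeps the separator and terminator in one place, which may be easier to maintain if many such lines are built.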

Have you tried using the format string method instead? E.g. '{0}: {1}\r\n'.format(name, value)
– J0HN, Commented Jun 2, 2014 at 8:29
Objects of type bytes don't have a format method, so you still always need to encode/decode.
– Maxim, Commented Jun 2, 2014 at 8:31
@J0HN But that won't produce the desired output either, e.g. str(name) will be "b'Host'".
– Lev Levitsky, Commented Jun 2, 2014 at 8:40

Antti Haapala · Accepted Answer · 2015-05-20 09:24:02Z

2

For CPython, I made the bttf library, to which I will add some porting features; currently it supports monkeypatching the 3.5 bytes formatting code into Python 3.3 and 3.4.

Without it, you'd have:

>>> b'I am bytes format: %s, %08d' % (b'asdf', 42)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'tuple'

and with bttf:

>>> from bttf import install
>>> install('bytes_mod')
>>> b'I am bytes format: %s, %08d' % (b'asdf', 42)
b'I am bytes format: asdf, 00000042'

Unlike __future__ imports, which apply per module, the patching is interpreter-wide.
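For code that cannot rely on such patching, a hedged alternative is to use the % operator only where the interpreter supports it natively and fall back to joining elsewhere. The sketch below assumes that bytes %-formatting is available on Python 2 (where bytes is str) and on Python 3.5 and later (PEP 461); the variable names are illustrative only.

```python
import sys

name = b'Host'
value = b'example.com'

if sys.version_info[0] == 2 or sys.version_info >= (3, 5):
    # Python 2: bytes is str, so % has always worked.
    # Python 3.5+: PEP 461 added %-formatting for bytes and bytearray.
    line = b'%s: %s\r\n' % (name, value)
else:
    # Python 3.0 to 3.4: bytes has no % operator, so join the pieces.
    line = b''.join([name, b': ', value, b'\r\n'])
```

The version check runs once at import time, so it adds no per-call cost.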

answered May 20, 2015 at 9:24


Lev Levitsky · Accepted Answer · 2014-06-02 08:44:09Z

1

The decoding-formatting-encoding solution may seem ugly in this particular case, but it is apparently idiomatic.

The idea is that you operate only on Unicode strings internally, and decode or encode when receiving or sending data. This approach is referred to as the "Unicode sandwich" in Ned Batchelder's "Pragmatic Unicode".

Also, depending on the context, you might want to just change the fact that name and value are bytes objects.
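The "Unicode sandwich" described above can be sketched roughly as follows. The function names are hypothetical, and the ASCII encoding is an assumption carried over from the question's example; a real protocol may call for a different codec.

```python
def parse_header(raw_line):
    """Input boundary: decode bytes from the wire into text."""
    text = raw_line.decode('ascii').rstrip('\r\n')
    name, _, value = text.partition(': ')
    return name, value


def format_header(name, value):
    """Interior: work only with text (unicode on Python 2, str on Python 3)."""
    return u'%s: %s\r\n' % (name, value)


def serialize(text):
    """Output boundary: encode text back to bytes just before sending."""
    return text.encode('ascii')


name, value = parse_header(b'Host: example.com\r\n')
wire_bytes = serialize(format_header(name, value))
```

With this layout, bytes appear only in parse_header and serialize, so ordinary string formatting works unchanged everywhere in between.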

edited Jun 2, 2014 at 8:44

answered Jun 2, 2014 at 8:36
