3

I am porting a program from Python 2 to Python 3. I am having difficulties dealing with % (interpolation) operator, when values are bytes.

Suppose we need to port this expression from Python 2: '%s: %s\r\n' % (name, value).

name and value in the ported version of the program are of type bytes. The result should be of type bytes too. In Python 3 binary interpolation is only planned for Python 3.5 (PEP 460). So, not sure if I am correct, but there are only two ways to deal with this problem -- concatenation or string encoding/decoding where appropriate:

>>> name = b'Host'
>>> value = b'example.com'
>>> # Decode bytes and encode resulting string.
>>> ('%s: %s\r\n' % (name.decode('ascii'), value.decode('ascii'))).encode('ascii')
b'Host: example.com\r\n'
>>> # ... or just use concatenation.
>>> name + b': ' + value + b'\r\n'
b'Host: example.com\r\n'

As for me, both of these solutions are a bit ugly. Is there some convention/recommendation about how to port string formatting, when values are bytes?

Note 2to3 tool shouldn't be used and the program should work under both Python 2 and 3.

5
  • Have you tried using format string method instead? E.g. '{0}: {1}\r\n'.format(name, value)` Commented Jun 2, 2014 at 8:29
  • objects of type bytes don't have format method. So you still need always to encode/decode. Commented Jun 2, 2014 at 8:31
  • '{0}: {1}\r\n' is not a bytes, isn't it? Commented Jun 2, 2014 at 8:38
  • @J0HN But that won't produce the desired output either, e.g. str(name) will be "b'Host'". Commented Jun 2, 2014 at 8:40
  • I wasn't sure, that's why it was a comment, not answer. Commented Jun 2, 2014 at 8:57

2 Answers 2

2

For CPython I made the bttf library, that I will add some porting features; currently it supports monkeypatching the 3.5 bytes formatting code into Python 3.3 ad 3.4:

Thus, before you'd have:

>>> b'I am bytes format: %s, %08d' % (b'asdf', 42)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'tuple'

and with bttf:

>>> from bttf import install
>>> install('bytes_mod')
>>> b'I am bytes format: %s, %08d' % (b'asdf', 42)
b'I am bytes format: asdf, 00000042'

Unlike __future__s, the patching is interpreter-wide.

Sign up to request clarification or add additional context in comments.

Comments

1

The decoding-formatting-encoding solution may seem ugly in this particular case, but it is apparently idiomatic.

The idea is that you only operate on Unicode strings internally, and do decoding/encoding when receiving/sending data. The approach is referred to as "Unicode sandwich" in Ned Batchelder's "Pragmatic Unicode".

Also, depending on the context, you might want to just change the fact that name and value are bytes objects.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.