
This is turning out to be trickier than I expected. I have a byte string:

data = b'abcdefghijklmnopqrstuvwxyz'

I want to read this data in chunks of n bytes. Under Python 2, this is trivial using a minor modification to the grouper recipe from the itertools documentation:

from itertools import izip_longest  # Python 2 name; Python 3 calls it zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return (''.join(x) for x in izip_longest(fillvalue=fillvalue, *args))

With this in place, I can call:

>>> list(grouper(data, 2))

And get:

['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz']
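
The recipe relies on args holding n references to the same iterator, so each tuple that gets built pulls n consecutive items from it. A minimal sketch of that mechanism in isolation, using plain zip (the names it and pairs are only for illustration):

```python
# One iterator, referenced twice: zip alternates between the two
# references, so each tuple consumes two consecutive items.
it = iter('abcdef')
pairs = list(zip(it, it))
print(pairs)  # [('a', 'b'), ('c', 'd'), ('e', 'f')]
```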

Under Python 3, this gets trickier. The grouper function as written simply falls over:

>>> list(grouper(data, 2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in <genexpr>
TypeError: sequence item 0: expected str instance, int found

This is because in Python 3, iterating over a byte string (like b'foo') yields integers rather than length-1 byte strings:

>>> list(b'foo')
[102, 111, 111]
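
Indexing behaves like iteration here, but slicing does not: a slice of a bytes object is itself a bytes object. A short illustration, assuming Python 3 (the variable names are only for the example):

```python
data = b'foo'

first_item = data[0]      # indexing yields an int on Python 3
first_slice = data[0:1]   # slicing yields a bytes object of length 1

print(first_item)   # 102
print(first_slice)  # b'f'
```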

The Python 3 bytes constructor helps here, because it accepts an iterable of integers:

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return (bytes(x) for x in izip_longest(fillvalue=fillvalue, *args))

Using that, I get what I want:

>>> list(grouper(data, 2))
[b'ab', b'cd', b'ef', b'gh', b'ij', b'kl', b'mn', b'op', b'qr', b'st', b'uv', b'wx', b'yz']

But (of course!) the bytes function under Python 2 does not behave the same way. It's just an alias for str, so each tuple is converted to its string representation:

>>> list(grouper(data, 2))
["('a', 'b')", "('c', 'd')", "('e', 'f')", "('g', 'h')", "('i', 'j')", "('k', 'l')", "('m', 'n')", "('o', 'p')", "('q', 'r')", "('s', 't')", "('u', 'v')", "('w', 'x')", "('y', 'z')"]

...which is not at all helpful. I ended up writing the following:

import six

def to_bytes(s):
    if six.PY3:
        return bytes(s)     # iterable of ints -> bytes
    else:
        return b''.join(s)  # tuple of 1-character strs -> str

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return (to_bytes(x) for x in izip_longest(fillvalue=fillvalue, *args))

This seems to work, but is this really the way to do it?

  • @AnttiHaapala, thanks for that pointer. Commented Feb 23, 2016 at 15:54

2 Answers


Funcy (a library offering various useful utilities, supporting both Python 2 and 3) offers a chunks function that does exactly this:

>>> import funcy
>>> data = b'abcdefghijklmnopqrstuvwxyz'
>>> list(funcy.chunks(6, data))
[b'abcdef', b'ghijkl', b'mnopqr', b'stuvwx', b'yz']   # Python 3
['abcdef', 'ghijkl', 'mnopqr', 'stuvwx', 'yz']        # Python 2.7

Alternatively, you could include a simple implementation of this in your program (compatible with both Python 2.7 and 3):

def chunked(size, source):
    for i in range(0, len(source), size):
        yield source[i:i+size]

It behaves the same, at least for your data. Funcy's chunks also works with iterators; this version needs a sequence that supports len() and slicing:

>>> list(chunked(6, data))
[b'abcdef', b'ghijkl', b'mnopqr', b'stuvwx', b'yz']   # Python 3
['abcdef', 'ghijkl', 'mnopqr', 'stuvwx', 'yz']        # Python 2.7
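
If you do need to accept arbitrary iterators without adding a dependency, one possible approach is to pull size items at a time with itertools.islice and rebuild a byte string from them. This is only a sketch: chunked_iter is a hypothetical name, and it assumes the iterator yields what iterating a byte string yields (integers on Python 3, one-character strings on Python 2).

```python
from itertools import islice

def chunked_iter(size, iterable):
    """Yield byte strings of up to `size` items from any iterable of bytes."""
    it = iter(iterable)
    while True:
        # bytearray accepts the items on both versions; bytes() finalizes it.
        piece = bytes(bytearray(islice(it, size)))
        if not piece:
            return  # iterator exhausted
        yield piece

chunks = list(chunked_iter(6, iter(b'abcdefghijklmnopqrstuvwxyz')))
print(chunks)
```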

2 Comments

OK, but source can't be an arbitrary iterable in your chunked function.
@vlk I know, I explicitly mention this in my answer. But the question wasn't about iterators, but about bytestrings. If you need to use iterators, consider the funcy solution.

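Combining bytes with bytearray works on both versions, provided the length of your data is divisible by n or you pass a fillvalue that bytearray accepts (a one-character string works on Python 2, but Python 3 needs an integer from 0 to 255):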

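try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return (bytes(bytearray(x)) for x in zip_longest(fillvalue=fillvalue, *args))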

Python 3:

In [20]: import sys

In [21]: sys.version
Out[21]: '3.4.3 (default, Oct 14 2015, 20:28:29) \n[GCC 4.8.4]'

In [22]: print(list(grouper(data,2)))
[b'ab', b'cd', b'ef', b'gh', b'ij', b'kl', b'mn', b'op', b'qr', b'st', b'uv', b'wx', b'yz']

Python 2:

In [6]: import sys

In [7]: sys.version
Out[7]: '2.7.6 (default, Jun 22 2015, 17:58:13) \n[GCC 4.8.2]'

In [8]: print(list(grouper(data,2)))
['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz']
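
To illustrate the fillvalue case: bytearray accepts integers in range(256) on both versions, so an integer pad byte is a portable choice. The sketch below repeats the grouper definition so that it runs on its own; padding with ord('x') is an arbitrary choice for the example.

```python
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return (bytes(bytearray(x)) for x in zip_longest(fillvalue=fillvalue, *args))

# 5 bytes do not divide evenly by 2, so the last chunk is padded with b'x'.
padded = list(grouper(b'abcde', 2, fillvalue=ord('x')))
print(padded)  # [b'ab', b'cd', b'ex'] on Python 3
```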

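If you leave fillvalue as None or pass an empty string, you can filter the padding out. Test for the padding explicitly; a plain filter(None, x) would also drop zero bytes on Python 3, where the elements are integers and 0 is falsy:

def grouper(iterable, n, fillvalue=None):
    "Collect data into chunks of at most n bytes, dropping the padding"
    # grouper(b'ABCDEFG', 3) --> b'ABC' b'DEF' b'G'
    args = [iter(iterable)] * n
    # Compare against the padding values so that zero bytes survive on Python 3.
    return (bytes(bytearray(b for b in x if b not in (None, '', b'')))
            for x in zip_longest(fillvalue=fillvalue, *args))

This works for a byte string of any length.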

In [29]: print(list(grouper(data,4)))
[b'abcd', b'efgh', b'ijkl', b'mnop', b'qrst', b'uvwx', b'yz']

In [30]: print(list(grouper(data,3)))
[b'abc', b'def', b'ghi', b'jkl', b'mno', b'pqr', b'stu', b'vwx', b'yz']
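
One edge case to be aware of when filtering padding on Python 3: the tuple elements are integers, so a truthiness test such as filter(None, ...) also removes zero bytes, while an explicit comparison with None keeps them. A small check, using a hand-written tuple in place of real zip_longest output:

```python
# Stand-in for what zip_longest yields on Python 3 for b'\x00a' plus one pad.
chunk = (0, 97, None)

# Truthiness test: the integer 0 is falsy, so the zero byte is lost.
dropped = bytes(bytearray(filter(None, chunk)))

# Explicit test: only the None padding is removed.
kept = bytes(bytearray(b for b in chunk if b is not None))

print(dropped)  # b'a'
print(kept)     # b'\x00a'
```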
