
I have a list like so:

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

I want it to look like so:

[['a', 'b', 'c'],['d', 'e', 'f'],['g', 'h', 'i']]

what's the most efficient way to do this?

edit: what about going the other way?

[['a', 'b', 'c'],['d', 'e', 'f'],['g', 'h', 'i']]

-->

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
  • Just curious, how large is your data set? Commented Apr 8, 2014 at 16:55
  • I have a few data sets; some are small (5,000 observations), some are 100k observations. Commented Apr 9, 2014 at 1:38
  • I asked because this seemed like an interesting problem for efficiency; I'm thinking of benchmarking all the solutions below to find the "fastest and the furious" one ... Commented Apr 9, 2014 at 11:43
  • @kmonsoor be my guest. Commented Apr 9, 2014 at 12:58

6 Answers


You can do what you want with a simple list comprehension.

>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> [a[i:i+3] for i in range(0, len(a), 3)]
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

If you want the last sub-list to be padded you can do this before the list comprehension:

>>> padding = 0
>>> a += [padding]*(-len(a) % 3)  # pads nothing when len(a) is already a multiple of 3

Combining these together into a single function:

def group(sequence, group_length, padding=None):
    if padding is not None:
        # (-len) % n pads only when needed, and building a new list
        # avoids mutating the caller's sequence
        sequence = sequence + [padding]*(-len(sequence) % group_length)
    return [sequence[i:i+group_length] for i in range(0, len(sequence), group_length)]
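A quick usage check (the helper restated so the snippet runs standalone; note it pads only when needed):

```python
def group(sequence, group_length, padding=None):
    if padding is not None:
        # pad the tail so every chunk has exactly group_length items
        sequence = sequence + [padding] * (-len(sequence) % group_length)
    return [sequence[i:i+group_length] for i in range(0, len(sequence), group_length)]

print(group(['a', 'b', 'c', 'd', 'e'], 3, padding='-'))  # [['a', 'b', 'c'], ['d', 'e', '-']]
print(group(['a', 'b', 'c', 'd', 'e'], 3))               # [['a', 'b', 'c'], ['d', 'e']]
```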

Going the other way:

def flatten(sequence):
    return [item for sublist in sequence for item in sublist]

>>> a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> flatten(a)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
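An equivalent one-liner uses itertools.chain.from_iterable, which is often faster when there are many sublists (a sketch):

```python
from itertools import chain

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# chain.from_iterable lazily yields each item of each sublist in order
flat = list(chain.from_iterable(a))
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```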

8 Comments

Wow, so simple. It doesn't seem like it, but are there any drawbacks or warnings to this code?
@jason_cant_code it doesn't pad the last sub-list; whether that is a drawback or a feature depends on your perspective, and what you do with it next!
@jonrsharpe I added a solution for that problem.
@Scorpion_God. What about going the other way? [['a', 'b', 'c'],['d', 'e', 'f'],['g', 'h', 'i']] --> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
@jason_cant_code try [item for sublist in mylist for item in sublist]

If you can use numpy, try x.reshape(-1, 3)

In [1]: import numpy as np
In [2]: x = ['a','b','c','d','e','f','g','h','i']
In [3]: x = np.array(x)
In [4]: x.reshape(-1, 3)
Out[4]: 
array([['a', 'b', 'c'],
       ['d', 'e', 'f'],
       ['g', 'h', 'i']], 
      dtype='|S1')

If the data is big enough, this approach is more efficient.
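Going back the other way is just as cheap with numpy (a sketch; note that reshape(-1, 3) requires the length to be divisible by 3):

```python
import numpy as np

x = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'])
grouped = x.reshape(-1, 3)        # 3 rows of 3; fails if len(x) % 3 != 0
flat = grouped.ravel().tolist()   # flatten and convert back to a plain Python list
print(flat)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
```

tolist() on the 2-D array itself gives the nested list-of-lists form directly.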

Update

Appending cProfile results to show why it is more efficient:

import cProfile
import numpy as np

a = range(10000000*3)

def impl_a():
    x = [a[i:i+3] for i in range(0, len(a), 3)]

def impl_b():
    x = np.array(a)
    x = x.reshape(-1, 3)

print("cProfile result of impl_a()")
cProfile.run("impl_a()")
print("cProfile result of impl_b()")
cProfile.run("impl_b()")

Output is

cProfile result of impl_a()
      5 function calls in 15.614 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.499    0.499   15.614   15.614 <string>:1(<module>)
     1   14.968   14.968   15.114   15.114 impla.py:6(impl_a)
     1    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.146    0.146    0.146    0.146 {range}


cProfile result of impl_b()
     5 function calls in 3.142 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    3.142    3.142 <string>:1(<module>)
     1    0.000    0.000    3.142    3.142 impla.py:9(impl_b)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     1    3.142    3.142    3.142    3.142 {numpy.core.multiarray.array}

4 Comments

Interesting method. How big does the data need to be for this to be more efficient? And more efficient than which method?
Thanks for elaborating. Very good to know. I will use it later on for bigger data sets.
Is it more efficient in general, or just with ints? What about strings, objects, or even mixed types?
@mskimm in my test, scorpion_god's list comprehension seems about 10% more efficient than yours

You can use the grouper recipe from itertools with a list comprehension:

from itertools import izip_longest # or zip_longest for Python 3.x

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args) # see note above

in_ = [1, 2, 3, 4, 5, 6, 7, 8, 9]

out = [list(t) for t in grouper(in_, 3)]
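On Python 3 the same recipe works with zip_longest, and since grouper returns an iterator you can also consume the groups lazily instead of building the whole list:

```python
from itertools import zip_longest  # izip_longest on Python 2

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# the last group is padded with the fillvalue
out = [list(t) for t in grouper([1, 2, 3, 4, 5, 6, 7, 8], 3, fillvalue=0)]
print(out)  # [[1, 2, 3], [4, 5, 6], [7, 8, 0]]
```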

2 Comments

@kmonsoor potentially, if literally all you want is to split a 9-item list into threes. However, itertools makes it efficient for much longer lists (you don't have to build the list, you can just iterate through the groups) and provides for easily changing the length or padding the last triple (with None by default). grouper is only two lines of code, it's hardly complex.
these are all really good solutions presented. I guess it just depends on how I use them. I can see good use for all of these answers in different situations.

My solution:

>>> lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> map(lambda i: lst[i:i+3], range(0, len(lst), 3))
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
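On Python 3, map returns an iterator rather than a list, so the same idea needs a list() wrapper:

```python
lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# map is lazy on Python 3; materialize it with list()
chunks = list(map(lambda i: lst[i:i+3], range(0, len(lst), 3)))
print(chunks)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```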

Comments


Use itertools, more specifically the function grouper mentioned under Recipes:

from itertools import izip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print [list(x) for x in grouper(a, 3)]

This prints

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
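When the length is not a multiple of n, grouper pads the last group with the fillvalue. If you would rather drop the padding, you can filter each tuple (a sketch; this assumes None never occurs in the data itself):

```python
from itertools import zip_longest  # izip_longest on Python 2

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

a = [1, 2, 3, 4, 5, 6, 7, 8]
# strip the None padding from the final group
out = [[x for x in t if x is not None] for t in grouper(a, 3)]
print(out)  # [[1, 2, 3], [4, 5, 6], [7, 8]]
```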

Comments


I ran all the answered methods to benchmark and find the fastest one.

sample size: 999999 (1 <= x <= 258962)

Python: Python 2.7.5 |Anaconda 1.8.0 (32-bit) (IPython)

OS: Windows 7 32-bit @ Core i5 / 4GB RAM

Sample-generation code

import random as rd
lst = [rd.randrange(1,258963) for n in range(999999)]

Solution from @Scorpion_God:

>>> %timeit x = [lst[i:i+3] for i in range(0, len(lst), 3)]
10 loops, best of 3: 114 ms per loop

Solution from @mskimm:

>>> %timeit array = np.array(lst)
10 loops, best of 3: 127 ms per loop
>>> %timeit array.reshape(-1,3)
1000000 loops, best of 3: 679 ns per loop

Solution from @jonrsharpe / @Carsten:

>>> %timeit out = [list(t) for t in grouper(lst, 3)]
10 loops, best of 3: 158 ms per loop

So it seems that, on IPython (Anaconda), the list comprehension is about 30% faster than the itertools/izip_longest/grouper method.

P.S. I think these results are going to differ on the CPython runtime, and I wish to add those as well.
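For plain CPython without IPython's %timeit magic, the stdlib timeit module gives comparable numbers (a sketch; absolute times will of course differ by machine):

```python
import timeit

setup = "lst = list(range(999999))"
stmt = "[lst[i:i+3] for i in range(0, len(lst), 3)]"
# best of 3 repeats, 5 runs each; report per-run time in milliseconds
per_run = min(timeit.repeat(stmt, setup=setup, repeat=3, number=5)) / 5
print("list comprehension: %.1f ms per loop" % (per_run * 1e3))
```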

Comments
