
I have a list like so:

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

I want it to look like so:

[['a', 'b', 'c'],['d', 'e', 'f'],['g', 'h', 'i']]

what's the most efficient way to do this?

edit: what about going the other way?

[['a', 'b', 'c'],['d', 'e', 'f'],['g', 'h', 'i']]

-->

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
  • Just curious, how large is your data set? Commented Apr 8, 2014 at 16:55
  • I have a few data sets; some are small (5,000 observations), some are 100k observations. Commented Apr 9, 2014 at 1:38
  • I asked because this seemed like an interesting problem for efficiency; I'm thinking of benchmarking all the solutions below to find the "fastest and the furious" one ... Commented Apr 9, 2014 at 11:43
  • @kmonsoor be my guest. Commented Apr 9, 2014 at 12:58

6 Answers


You can do what you want with a simple list comprehension.

>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> [a[i:i+3] for i in range(0, len(a), 3)]
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

If you want the last sub-list to be padded you can do this before the list comprehension:

>>> padding = 0
>>> a += [padding]*(-len(a) % 3)  # pads nothing when len(a) is already a multiple of 3

Combining these together into a single function:

def group(sequence, group_length, padding=None):
    if padding is not None:
        # (-len) % n pads only when needed, and building a new list
        # avoids mutating the caller's sequence
        sequence = sequence + [padding]*(-len(sequence) % group_length)
    return [sequence[i:i+group_length] for i in range(0, len(sequence), group_length)]
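A quick usage check (the helper restated so the snippet runs standalone; note it pads only when needed):

```python
def group(sequence, group_length, padding=None):
    if padding is not None:
        # pad the tail so every chunk has exactly group_length items
        sequence = sequence + [padding] * (-len(sequence) % group_length)
    return [sequence[i:i+group_length] for i in range(0, len(sequence), group_length)]

print(group(['a', 'b', 'c', 'd', 'e'], 3, padding='-'))  # [['a', 'b', 'c'], ['d', 'e', '-']]
print(group(['a', 'b', 'c', 'd', 'e'], 3))               # [['a', 'b', 'c'], ['d', 'e']]
```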

Going the other way:

def flatten(sequence):
    return [item for sublist in sequence for item in sublist]

>>> a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> flatten(a)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
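An equivalent one-liner uses itertools.chain.from_iterable, which is often faster when there are many sublists (a sketch):

```python
from itertools import chain

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# chain.from_iterable lazily yields each item of each sublist in order
flat = list(chain.from_iterable(a))
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```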

8 Comments

Wow, so simple. It doesn't seem like it, but are there any drawbacks or warnings to this code?
@jason_cant_code it doesn't pad the last sub-list; whether that is a drawback or a feature depends on your perspective, and what you do with it next!
@jonrsharpe I added a solution for that problem.
@Scorpion_God. What about going the other way? [['a', 'b', 'c'],['d', 'e', 'f'],['g', 'h', 'i']] --> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
@jason_cant_code try [item for sublist in mylist for item in sublist]

If you can use numpy, try x.reshape(-1, 3)

In [1]: import numpy as np
In [2]: x = ['a','b','c','d','e','f','g','h','i']
In [3]: x = np.array(x)
In [4]: x.reshape(-1, 3)
Out[4]: 
array([['a', 'b', 'c'],
       ['d', 'e', 'f'],
       ['g', 'h', 'i']], 
      dtype='|S1')

If the data is big enough, this approach is more efficient.
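Going back the other way is just as cheap with numpy (a sketch; note that reshape(-1, 3) requires the length to be divisible by 3):

```python
import numpy as np

x = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'])
grouped = x.reshape(-1, 3)        # 3 rows of 3; fails if len(x) % 3 != 0
flat = grouped.ravel().tolist()   # flatten and convert back to a plain Python list
print(flat)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
```

tolist() on the 2-D array itself gives the nested list-of-lists form directly.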

Update

Appending cProfile results to show why it is more efficient:

import cProfile
import numpy as np

a = range(10000000*3)

def impl_a():
    x = [a[i:i+3] for i in range(0, len(a), 3)]

def impl_b():
    x = np.array(a)
    x = x.reshape(-1, 3)

print("cProfile result of impl_a()")
cProfile.run("impl_a()")
print("cProfile result of impl_b()")
cProfile.run("impl_b()")

Output is

cProfile result of impl_a()
      5 function calls in 15.614 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.499    0.499   15.614   15.614 <string>:1(<module>)
     1   14.968   14.968   15.114   15.114 impla.py:6(impl_a)
     1    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.146    0.146    0.146    0.146 {range}


cProfile result of impl_b()
     5 function calls in 3.142 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    3.142    3.142 <string>:1(<module>)
     1    0.000    0.000    3.142    3.142 impla.py:9(impl_b)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     1    3.142    3.142    3.142    3.142 {numpy.core.multiarray.array}

4 Comments

Interesting method. How big does the data need to be for this to be more efficient? And more efficient than which method?
Thanks for elaborating. Very good to know. I will use it later on for bigger data sets.
Is it more efficient in general, or just with ints? What about strings, objects, or even mixed types?
@mskimm in my test, scorpion_god's list comprehension seems about 10% more efficient than yours

You can use the grouper recipe from itertools with a list comprehension:

from itertools import izip_longest # or zip_longest for Python 3.x

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args) # see note above

in_ = [1, 2, 3, 4, 5, 6, 7, 8, 9]

out = [list(t) for t in grouper(in_, 3)]
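On Python 3 the same recipe works with zip_longest, and since grouper returns an iterator you can also consume the groups lazily instead of building the whole list:

```python
from itertools import zip_longest  # izip_longest on Python 2

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# the last group is padded with the fillvalue
out = [list(t) for t in grouper([1, 2, 3, 4, 5, 6, 7, 8], 3, fillvalue=0)]
print(out)  # [[1, 2, 3], [4, 5, 6], [7, 8, 0]]
```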

2 Comments

@kmonsoor potentially, if literally all you want is to split a 9-item list into threes. However, itertools makes it efficient for much longer lists (you don't have to build the list, you can just iterate through the groups) and provides for easily changing the length or padding the last triple (with None by default). grouper is only two lines of code, it's hardly complex.
these are all really good solutions presented. I guess it just depends on how I use them. I can see good use for all of these answers in different situations.

My solution:

>>> lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> map(lambda i: lst[i:i+3], range(0, len(lst), 3))
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
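On Python 3, map returns an iterator rather than a list, so the same idea needs a list() wrapper:

```python
lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# map is lazy on Python 3; materialize it with list()
chunks = list(map(lambda i: lst[i:i+3], range(0, len(lst), 3)))
print(chunks)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```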

Comments


Use itertools, more specifically the function grouper mentioned under Recipes:

from itertools import izip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print [list(x) for x in grouper(a, 3)]

This prints

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
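When the length is not a multiple of n, grouper pads the last group with the fillvalue. If you would rather drop the padding, you can filter each tuple (a sketch; this assumes None never occurs in the data itself):

```python
from itertools import zip_longest  # izip_longest on Python 2

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

a = [1, 2, 3, 4, 5, 6, 7, 8]
# strip the None padding from the final group
out = [[x for x in t if x is not None] for t in grouper(a, 3)]
print(out)  # [[1, 2, 3], [4, 5, 6], [7, 8]]
```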

Comments


I ran all the answered methods to benchmark and find the fastest one.

sample size: 999999 (1 <= x <= 258962)

Python: Python 2.7.5 |Anaconda 1.8.0 (32-bit) (IPython)

OS: Windows 7 32-bit @ Core i5 / 4GB RAM

Sample-generation code

import random as rd
lst = [rd.randrange(1,258963) for n in range(999999)]

Solution from @Scorpion_God:

>>> %timeit x = [lst[i:i+3] for i in range(0, len(lst), 3)]
10 loops, best of 3: 114 ms per loop

Solution from @mskimm:

>>> %timeit array = np.array(lst)
10 loops, best of 3: 127 ms per loop
>>> %timeit array.reshape(-1,3)
1000000 loops, best of 3: 679 ns per loop

Solution from @jonrsharpe / @Carsten:

>>> %timeit out = [list(t) for t in grouper(lst, 3)]
10 loops, best of 3: 158 ms per loop

So it seems that, on IPython (Anaconda), the list comprehension is about 30% faster than the itertools/izip_longest/grouper method.

P.S. I think these results are going to differ on the CPython runtime, and I wish to add those as well.
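For plain CPython without IPython's %timeit magic, the stdlib timeit module gives comparable numbers (a sketch; absolute times will of course differ by machine):

```python
import timeit

setup = "lst = list(range(999999))"
stmt = "[lst[i:i+3] for i in range(0, len(lst), 3)]"
# best of 3 repeats, 5 runs each; report per-run time in milliseconds
per_run = min(timeit.repeat(stmt, setup=setup, repeat=3, number=5)) / 5
print("list comprehension: %.1f ms per loop" % (per_run * 1e3))
```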

Comments
