Split list in row into multiple chunked rows

Question

I have a pandas DataFrame in which some rows contain a list of results returned by a system. I'm trying to split those lists into smaller chunks (chunks of 2 in the reproducible example below), with each chunk becoming a new row. I worked out that I can use numpy's repeat function to duplicate each row once per chunk, but I'm not sure how to then keep only the matching chunk of the list in Result for each duplicated row (i.e. one row should be ['SUCCESS', 'Misc'] and the next ['Doom'], rather than a row of [['SUCCESS', 'Misc'], ['Doom']]).

I know the best solution would be to make each item in the list its own row using explode, but client requirements rule that out.

Code

import pandas as pd
import numpy as np

data = {'Result': [['SUCCESS'], ['SUCCESS'], ['FAILURE'], ['Pending', 'Pending', 'SUCCESS', 'Misc', 'Doom'], ['FAILURE'], ['Pending', 'SUCCESS']], 'Date': ['10/10/2019', '10/09/2019', '10/08/2019', '10/07/2019', '10/06/2019', '10/05/2019']}
# Expected result, matching the "Desired Output" table below
goal = {'Result': [['SUCCESS'], ['SUCCESS'], ['FAILURE'], ['Pending', 'Pending'], ['SUCCESS', 'Misc'], ['Doom'], ['FAILURE'], ['Pending', 'SUCCESS']], 'Date': ['10/10/2019', '10/09/2019', '10/08/2019', '10/07/2019', '10/07/2019', '10/07/2019', '10/06/2019', '10/05/2019']}

df = pd.DataFrame(data)

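# Number of items in each row's list
df['len_res'] = df['Result'].str.len()

def chunking(l, n):
    """Yield successive slices of l, each at most n items long."""
    for i in range(0, len(l), n):
        yield l[i:i + n]


df['chunks'] = 1
for i in df.index:
    if df.at[i, 'len_res'] > 2:
        # .at writes to the frame directly; chained assignment such as
        # df['Result'][i] = ... may modify a temporary copy instead
        df.at[i, 'Result'] = list(chunking(df.at[i, 'Result'], 2))
        df.at[i, 'chunks'] = len(df.at[i, 'Result'])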

Actual Output

                                          Result        Date  len_res  chunks
0                                      [SUCCESS]  10/10/2019        1       1
1                                      [SUCCESS]  10/09/2019        1       1
2                                      [FAILURE]  10/08/2019        1       1
3  [[Pending, Pending], [SUCCESS, Misc], [Doom]]  10/07/2019        5       3
4                                      [FAILURE]  10/06/2019        1       1
5                             [Pending, SUCCESS]  10/05/2019        2       1

Desired Output

                                          Result        Date  len_res  chunks
0                                      [SUCCESS]  10/10/2019        1       1
1                                      [SUCCESS]  10/09/2019        1       1
2                                      [FAILURE]  10/08/2019        1       1
3                             [Pending, Pending]  10/07/2019        5       3
4                                [SUCCESS, Misc]  10/07/2019        5       3
5                                         [Doom]  10/07/2019        5       3
6                                      [FAILURE]  10/06/2019        1       1
7                             [Pending, SUCCESS]  10/05/2019        2       1

With np.repeat

df = df.loc[np.repeat(df.index.values, df.chunks)]
df = df.reset_index(drop=True)

                                          Result        Date  len_res  chunks
0                                      [SUCCESS]  10/10/2019        1       1
1                                      [SUCCESS]  10/09/2019        1       1
2                                      [FAILURE]  10/08/2019        1       1
3  [[Pending, Pending], [SUCCESS, Misc], [Doom]]  10/07/2019        5       3
4  [[Pending, Pending], [SUCCESS, Misc], [Doom]]  10/07/2019        5       3
5  [[Pending, Pending], [SUCCESS, Misc], [Doom]]  10/07/2019        5       3
6                                      [FAILURE]  10/06/2019        1       1
7                             [Pending, SUCCESS]  10/05/2019        2       1

Code Different · Accepted Answer · 2019-10-16 00:39:25Z

2

If you are on pandas v0.25 or later, use explode:

size = 2
df['Result'] = df['Result'].apply(lambda r: np.array_split(r, np.ceil(len(r) / size)))
df['chunks'] = df['Result'].str.len()

df = df.explode('Result')

np.array_split splits an array into n parts of nearly equal length. Here n = ceil(len(r) / size), so no part is longer than size:

[1]     --> [[1]]
[1,2]   --> [[1,2]]
[1,2,3] --> [[1,2], [3]]
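
One detail worth checking against your own data: because np.array_split balances the parts, the result is not always "full chunks plus a remainder". The sketch below is only an illustration (split_sizes is a hypothetical helper, not part of the answer's code) that reports the part lengths for a given list length and chunk size:

```python
import numpy as np

def split_sizes(length, size):
    """Lengths of the parts np.array_split yields for a list of `length` items."""
    n_parts = int(np.ceil(length / size))
    return [len(part) for part in np.array_split(list(range(length)), n_parts)]

sizes_5_by_2 = split_sizes(5, 2)  # [2, 2, 1], same as slicing in steps of 2
sizes_7_by_3 = split_sizes(7, 3)  # [3, 2, 2], whereas slicing in steps of 3 gives 3, 3, 1
```

For the chunk size of 2 used in the question the two approaches agree, so this only matters if you change size.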

explode then emits one row per element of the outermost list in Result, repeating the values of the other columns (and the original index label) for each new row.
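
If you need plain Python lists in Result (np.array_split returns NumPy arrays) and a fresh 0..n-1 index as in the desired output, a variant along the following lines may work. It is a minimal sketch on a shortened copy of the question's data; chunk_list is a hypothetical helper that slices the list instead of calling np.array_split, and explode still requires pandas 0.25 or later:

```python
import pandas as pd

def chunk_list(items, size):
    """Return consecutive slices of `items`, each at most `size` long."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Shortened version of the question's data, for illustration only
df = pd.DataFrame({
    'Result': [['SUCCESS'],
               ['Pending', 'Pending', 'SUCCESS', 'Misc', 'Doom'],
               ['Pending', 'SUCCESS']],
    'Date': ['10/10/2019', '10/07/2019', '10/05/2019'],
})

size = 2
df['len_res'] = df['Result'].str.len()                        # original list length
df['Result'] = df['Result'].apply(lambda r: chunk_list(r, size))
df['chunks'] = df['Result'].str.len()                         # number of chunks per row

# explode keeps the original index label on every repeated row,
# so reset it to get consecutive row numbers.
df = df.explode('Result').reset_index(drop=True)
```

Each cell of Result then holds one chunk, while Date, len_res and chunks are repeated for every chunk from the same original row. Note that a row whose list is empty would come out of explode as NaN, so handle that case separately if it can occur.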


1 Comment

carousallie Over a year ago

This is so elegant and efficient, and does exactly what I need. Thank you so much!
