
I have a pandas DataFrame where some rows contain a list of results returned by a system. I'm trying to split those lists into smaller chunks (chunks of 2 in the reproducible example below), with each chunk in its own row. I worked out that I can use numpy's repeat function to duplicate each row once per chunk, but I'm not sure how to then put only the matching chunk of the list into Result for each copy (i.e. one row should hold ['SUCCESS', 'Misc'] and the next ['Doom'], rather than a row holding [['SUCCESS', 'Misc'], ['Doom']]).

I know the best solution would be to make each item in the list a new row using explode, but because of client requirements this is not an option.

Code

import pandas as pd
import numpy as np

data = {'Result': [['SUCCESS'], ['SUCCESS'], ['FAILURE'], ['Pending', 'Pending', 'SUCCESS', 'Misc', 'Doom'], ['FAILURE'], ['Pending', 'SUCCESS']], 'Date': ['10/10/2019', '10/09/2019', '10/08/2019', '10/07/2019', '10/06/2019', '10/05/2019']}
goal = {'Result': [['SUCCESS'], ['SUCCESS'], ['FAILURE'], ['Pending', 'Pending'], ['SUCCESS', 'Misc'], ['Doom'], ['FAILURE'], ['Pending', 'SUCCESS']], 'Date': ['10/10/2019', '10/09/2019', '10/08/2019', '10/07/2019', '10/07/2019', '10/07/2019', '10/06/2019', '10/05/2019']}

df = pd.DataFrame(data)

df['len_res'] = df['Result'].str.len()

def chunking(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]


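df['chunks'] = 1
for i in range(len(df)):
    if df.at[i, 'len_res'] > 2:
        # Use .at instead of chained df['col'][i] assignment, which warns and may not write back
        # Replace the list with a list of 2-item chunks and record how many chunks there are
        df.at[i, 'Result'] = list(chunking(df.at[i, 'Result'], 2))
        df.at[i, 'chunks'] = len(df.at[i, 'Result'])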

Actual Output

                                          Result        Date  len_res  chunks
0                                      [SUCCESS]  10/10/2019        1       1
1                                      [SUCCESS]  10/09/2019        1       1
2                                      [FAILURE]  10/08/2019        1       1
3  [[Pending, Pending], [SUCCESS, Misc], [Doom]]  10/07/2019        5       3
4                                      [FAILURE]  10/06/2019        1       1
5                             [Pending, SUCCESS]  10/05/2019        2       1

Desired Output

                                          Result        Date  len_res  chunks
0                                      [SUCCESS]  10/10/2019        1       1
1                                      [SUCCESS]  10/09/2019        1       1
2                                      [FAILURE]  10/08/2019        1       1
3                             [Pending, Pending]  10/07/2019        5       3
4                                [SUCCESS, Misc]  10/07/2019        5       3
5                                         [Doom]  10/07/2019        5       3
6                                      [FAILURE]  10/06/2019        1       1
7                             [Pending, SUCCESS]  10/05/2019        2       1

Attempt with np.repeat (each row is duplicated once per chunk, but every copy still holds the whole nested list):

df = df.loc[np.repeat(df.index.values, df.chunks)]
df = df.reset_index(drop=True)

                                          Result        Date  len_res  chunks
0                                      [SUCCESS]  10/10/2019        1       1
1                                      [SUCCESS]  10/09/2019        1       1
2                                      [FAILURE]  10/08/2019        1       1
3  [[Pending, Pending], [SUCCESS, Misc], [Doom]]  10/07/2019        5       3
4  [[Pending, Pending], [SUCCESS, Misc], [Doom]]  10/07/2019        5       3
5  [[Pending, Pending], [SUCCESS, Misc], [Doom]]  10/07/2019        5       3
6                                      [FAILURE]  10/06/2019        1       1
7                             [Pending, SUCCESS]  10/05/2019        2       1
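One way to finish the np.repeat approach without explode is sketched below, assuming each copy should receive the chunk at its position within the original list. After repeating the rows, groupby(level=0).cumcount() numbers the copies of each original row 0, 1, 2, ..., and that number picks the slice of the full list. It starts from the original data rather than the already-chunked frame; the names size, repeated, pos and out are illustrative choices, not part of the question.

```python
import numpy as np
import pandas as pd

data = {'Result': [['SUCCESS'], ['SUCCESS'], ['FAILURE'],
                   ['Pending', 'Pending', 'SUCCESS', 'Misc', 'Doom'],
                   ['FAILURE'], ['Pending', 'SUCCESS']],
        'Date': ['10/10/2019', '10/09/2019', '10/08/2019',
                 '10/07/2019', '10/06/2019', '10/05/2019']}
size = 2  # chunk length used in the question

df = pd.DataFrame(data)
df['len_res'] = df['Result'].str.len()
# Number of chunks per row: ceil(len_res / size), using integer arithmetic only
df['chunks'] = -(-df['len_res'] // size)

# Repeat each row once per chunk, exactly as in the question
repeated = df.loc[np.repeat(df.index.values, df['chunks'])]

# Position of each copy within its original row (index labels are still duplicated here)
pos = repeated.groupby(level=0).cumcount().to_numpy()

# Renumber the rows, then keep only the pos-th chunk of the full list in each copy
out = repeated.reset_index(drop=True)
out['Result'] = [r[p * size:(p + 1) * size] for r, p in zip(out['Result'], pos)]
print(out)
```

Slicing the original list directly means no nested list is ever stored, so there is nothing to unpack afterwards.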

Answer


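explode can still be used here, because it does not have to split the list into single items. Chunk each list first, then explode the outer level so that each chunk (not each item) becomes a row. This needs pandas v0.25 or later, where explode was added: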

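size = 2
df = pd.DataFrame(data)
df['len_res'] = df['Result'].str.len()
# Split each list into ceil(len / size) parts and turn the numpy sub-arrays back into lists
df['Result'] = df['Result'].apply(lambda r: [a.tolist() for a in np.array_split(r, int(np.ceil(len(r) / size)))])
df['chunks'] = df['Result'].str.len()

df = df.explode('Result').reset_index(drop=True)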

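np.array_split(r, n) splits r into n nearly equal parts, longer parts first. With n = ceil(len(r) / size) and size = 2, this gives chunks of 2 followed by at most one chunk of 1: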

[1]     --> [[1]]
[1,2]   --> [[1,2]]
[1,2,3] --> [[1,2], [3]]
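Because np.array_split balances the parts, it matches fixed-size chunking for size = 2 but not for every larger size. The sketch below compares the two; fixed_chunks and balanced_chunks are hypothetical helper names (fixed_chunks is the same slicing as the question's chunking generator, returned as a list).

```python
import numpy as np

def fixed_chunks(items, size):
    """Consecutive slices of length `size`; only the last one may be shorter."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def balanced_chunks(items, size):
    """The answer's approach: ceil(len / size) nearly equal parts via np.array_split."""
    n = int(np.ceil(len(items) / size))
    return [part.tolist() for part in np.array_split(items, n)]

items = [1, 2, 3, 4]
print(fixed_chunks(items, 3))     # [[1, 2, 3], [4]]
print(balanced_chunks(items, 3))  # [[1, 2], [3, 4]]
print(fixed_chunks(items, 2) == balanced_chunks(items, 2))  # True
```

Note that an empty list would give n = 0, which np.array_split rejects with a ValueError, while the slicing version simply returns an empty list.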

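explode then creates one row per element of the outermost list in Result (that is, one row per chunk) and copies the values of the other columns into each new row. The original index label is repeated on every new row; call reset_index(drop=True) to renumber the rows 0 to 7 as in the desired output.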
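A minimal sketch of that behaviour on a single row that already holds the three chunks from the question: explode unpacks only the outer list, so each inner list survives intact as a cell value, and the index label 0 appears three times.

```python
import pandas as pd

# One row whose Result holds three chunks (the nested list from the question)
df = pd.DataFrame({'Result': [[['Pending', 'Pending'], ['SUCCESS', 'Misc'], ['Doom']]],
                   'Date': ['10/07/2019']})

exploded = df.explode('Result')
print(exploded)

# Renumber the duplicated index labels 0, 0, 0 as 0, 1, 2
print(exploded.reset_index(drop=True))
```

On pandas 1.1 or later, explode also accepts ignore_index=True, which does the renumbering in the same call.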


Comment

This is so elegant and efficient, and does exactly what I need. Thank you so much!
