Split rows into multiple rows with pandas

Question

I have a dataset in the following format. It has 48 columns and about 200,000 rows.

slot1,slot2,slot3,slot4,slot5,slot6...,slot45,slot46,slot47,slot48
1,2,3,4,5,6,7,......,45,46,47,48
3.5,5.2,2,5.6,...............

I want to reshape this dataset into something like the following, where N is less than 48 (maybe 24 or 12, etc.). The column headers don't matter. For example, when N = 4:

slotNew1,slotNew2,slotNew3,slotNew4
1,2,3,4
5,6,7,8
......
45,46,47,48
3.5,5.2,2,5.6
............

I can read the data row by row, split each row, and append the pieces to a new DataFrame, but that is very inefficient. Is there a faster, more efficient way to do this?

Hmm, it is not a must, but I can assume N is a factor of 48.
– Thusitha Thilina Dayaratne, Commented Aug 12, 2019 at 2:07

pe-perry · Accepted Answer · 2019-08-12 06:49:22Z

You can try this:

N = 4
df_new = pd.DataFrame(df_original.values.reshape(-1, N))
df_new.columns = ['slotNew{:}'.format(i + 1) for i in range(N)]

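The code extracts the data as a numpy.ndarray, reshapes it, and creates a new DataFrame with the desired dimensions. Because reshape(-1, N) fills the new rows in row-major order, this works cleanly when N is a factor of the number of columns (48 here).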

Example:

import numpy as np
import pandas as pd

df0 = pd.DataFrame(np.arange(48 * 3).reshape(-1, 48))
df0.columns = ['slot{:}'.format(i + 1) for i in range(48)]
print(df0)
#    slot1  slot2  slot3  slot4   ...    slot45  slot46  slot47  slot48
# 0      0      1      2      3   ...        44      45      46      47
# 1     48     49     50     51   ...        92      93      94      95
# 2     96     97     98     99   ...       140     141     142     143
# 
# [3 rows x 48 columns]

N = 4
df = pd.DataFrame(df0.values.reshape(-1, N))
df.columns = ['slotNew{:}'.format(i + 1) for i in range(N)]
print(df.head())
#    slotNew1  slotNew2  slotNew3  slotNew4
# 0         0         1         2         3
# 1         4         5         6         7
# 2         8         9        10        11
# 3        12        13        14        15
# 4        16        17        18        19

Another approach, using stack and pivot:

N = 4
df1 = df0.stack().reset_index()
df1['i'] = df1['level_1'].str.replace('slot', '').astype(int) // N
df1['j'] = df1['level_1'].str.replace('slot', '').astype(int) % N
df1['i'] -= (df1['j'] == 0) - df1['level_0'] * 48 / N
df1['j'] += (df1['j'] == 0) * N
df1['j'] = 'slotNew' + df1['j'].astype(str)
df1 = df1[['i', 'j', 0]]
df = df1.pivot(index='i', columns='j', values=0)
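The index arithmetic in the stack/pivot approach may be easier to follow with 0-based positions. The following is a minimal sketch of the same idea, not the answerer's code: the names `long`, `pos`, `row`, `slot`, and `value` are introduced here purely for illustration, and it assumes the column names have the form slot1 ... slot48 and that N is a factor of 48. It also checks the result against the reshape approach.

```python
import numpy as np
import pandas as pd

N = 4
df0 = pd.DataFrame(np.arange(48 * 3).reshape(-1, 48),
                   columns=['slot{}'.format(i + 1) for i in range(48)])
n_slots = df0.shape[1]

# Reference result from the reshape approach.
expected = df0.to_numpy().reshape(-1, N)

# Long format: one record per (original row, slot name) pair.
long = df0.stack().reset_index()
long.columns = ['row', 'slot', 'value']  # illustrative names

# 0-based position of each slot within its original row.
pos = long['slot'].str.replace('slot', '', regex=False).astype(int) - 1

long['i'] = long['row'] * (n_slots // N) + pos // N  # target row
long['j'] = pos % N                                  # target column (0-based)

wide = long.pivot(index='i', columns='j', values='value')

# Both approaches should give the same values.
assert np.array_equal(wide.to_numpy(), expected)
```

Keeping the target column as an integer avoids a side effect of string labels: pivot sorts the column labels, so for N greater than 9 a label like slotNew10 would sort before slotNew2.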

edited Aug 12, 2019 at 6:49

answered Aug 12, 2019 at 2:06

pe-perry

Thusitha Thilina Dayaratne Over a year ago

It was my mistake. I didn't remove unwanted columns before reshaping. When I remove the unwanted columns your solution works. Thanks (y)
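As the comment above suggests, the reshape only gives the intended result when the frame contains nothing but the slot columns. A minimal sketch of that preparation step follows; the extra `id` column and the names `df_raw`, `slot_cols`, and `slots` are hypothetical and only illustrate the kind of unwanted column that has to be dropped first.

```python
import numpy as np
import pandas as pd

N = 4
slot_cols = ['slot{}'.format(i + 1) for i in range(48)]

# Hypothetical input: 48 slot columns plus an extra 'id' column
# that must not take part in the reshape.
df_raw = pd.DataFrame(np.arange(48 * 2).reshape(-1, 48), columns=slot_cols)
df_raw.insert(0, 'id', ['a', 'b'])

# Keep only the slot columns, in their original order.
slots = df_raw[slot_cols]

# Require N to divide the number of slot columns, so that no new row
# mixes values from two different original rows.
if slots.shape[1] % N != 0:
    raise ValueError('N must be a factor of the number of slot columns')

df_new = pd.DataFrame(slots.to_numpy().reshape(-1, N),
                      columns=['slotNew{}'.format(i + 1) for i in range(N)])
print(df_new.head())
```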

Chris · Answer · 2019-08-12 02:14:50Z

Use pandas.Series.explode (available from pandas 0.25) after splitting each row into chunks. Given df:

import numpy as np
import pandas as pd

df = pd.DataFrame([np.arange(1, 49)], columns=['slot%s' % i for i in range(1, 49)])
print(df)

   slot1  slot2  slot3  slot4  slot5  slot6  slot7  slot8  slot9  slot10  ...  \
0      1      2      3      4      5      6      7      8      9      10  ...   

   slot39  slot40  slot41  slot42  slot43  slot44  slot45  slot46  slot47  \
0      39      40      41      42      43      44      45      46      47   

   slot48  
0      48

Using chunks to divide:

def chunks(l, n):
    """Yield successive n-sized chunks from l.
    Source: https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
    """
    n_items = len(l)
    if n_items % n:
        n_pads = n - n_items % n
    else:
        n_pads = 0
    l = l + [np.nan for _ in range(n_pads)] 
    for i in range(0, len(l), n):
        yield l[i:i + n]

N = 4
new_df = pd.DataFrame(list(df.apply(lambda x: list(chunks(list(x), N)), 1).explode()))
print(new_df)

Output:

     0   1   2   3
0    1   2   3   4
1    5   6   7   8
2    9  10  11  12
3   13  14  15  16
4   17  18  19  20
...

The advantage of this approach over numpy.reshape is that it also handles the case where N is not a factor of 48, padding the last chunk of each row with NaN:

N = 7
new_df = pd.DataFrame(list(df.apply(lambda x: list(chunks(list(x), N)), 1).explode()))
print(new_df)

Output:

    0   1   2   3   4   5     6
0   1   2   3   4   5   6   7.0
1   8   9  10  11  12  13  14.0
2  15  16  17  18  19  20  21.0
3  22  23  24  25  26  27  28.0
4  29  30  31  32  33  34  35.0
5  36  37  38  39  40  41  42.0
6  43  44  45  46  47  48   NaN
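For a large frame, the same padding idea can also be expressed with NumPy alone, which avoids the per-row Python loop. This is a sketch under assumptions rather than part of the original answer: the names `values`, `n_pads`, and `padded` are illustrative, and the data is converted to float up front so that NaN can be stored.

```python
import numpy as np
import pandas as pd

N = 7
df = pd.DataFrame([np.arange(1, 49)],
                  columns=['slot%s' % i for i in range(1, 49)])

# Convert to float so that the NaN padding is representable.
values = df.to_numpy(dtype=float)

# Number of columns to add so that the row width is a multiple of N.
n_pads = -values.shape[1] % N

# Pad every row on the right with NaN, then reshape as before.
padded = np.pad(values, ((0, 0), (0, n_pads)),
                mode='constant', constant_values=np.nan)
new_df = pd.DataFrame(padded.reshape(-1, N))
print(new_df)
```

One difference from the explode version: here every column is float, whereas above only the column containing the NaN was upcast.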

I accepted kitman's answer since it is more direct when N is a factor of 48, but your answer is valid even when N is not a factor. Thanks :)
