Splitting a pandas DataFrame by columns

Question

I have a dataframe with a large number of columns, and I want to split it into several smaller dataframes. For example:

Generating toy data:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10),columns = ['x'])
df['y'] = np.arange(30,40,1)
df['1'] = np.random.rand(10)
df['2'] = np.random.rand(10)
df['3'] = np.random.rand(10)
df['4'] = np.random.rand(10)
df['5'] = np.random.rand(10)

df =

    x   y   1              2           3          4            5
0   0   30  0.047787    0.435396    0.926836    0.314469    0.477411
1   1   31  0.083536    0.258120    0.682284    0.025050    0.713777
2   2   32  0.201041    0.872864    0.050977    0.580314    0.185589
3   3   33  0.105833    0.974538    0.559265    0.128242    0.217965
4   4   34  0.146551    0.662001    0.936995    0.050702    0.249724
5   5   35  0.098615    0.854952    0.649501    0.509777    0.726458
6   6   36  0.387889    0.040331    0.902277    0.051822    0.354042
7   7   37  0.321591    0.823724    0.052266    0.081491    0.187576
8   8   38  0.983665    0.152271    0.530755    0.384810    0.844386
9   9   39  0.649185    0.776682    0.239589    0.654547    0.581337

I want to split the dataframe so that each piece keeps x and y plus one of the remaining columns, as shown below:

df1 =

    x   y   1
0   0   30  0.047787
1   1   31  0.083536
2   2   32  0.201041
3   3   33  0.105833
4   4   34  0.146551
5   5   35  0.098615
6   6   36  0.387889
7   7   37  0.321591
8   8   38  0.983665
9   9   39  0.649185

df2 =

    x    y    2
0   0   30  0.435396
1   1   31  0.25812
2   2   32  0.872864
3   3   33  0.974538
4   4   34  0.662001
5   5   35  0.854952
6   6   36  0.040331
7   7   37  0.823724
8   8   38  0.152271
9   9   39  0.776682

And so on. Since I have a large number of columns, doing this one by one is impractical. Is there a simpler way to do it?

Thank you in advance.

Nickil Maveli · Accepted Answer · 2017-01-25 10:50:04Z


You could set the x and y columns, which stay fixed throughout, as the index and then perform a groupby across the columns.

Using a dictionary comprehension, loop through each group. A reset_index at the end turns x and y back into ordinary columns, so each piece is a flat DataFrame.

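# Move x and y into the index (this modifies df in place).
df.set_index(['x','y'], inplace=True)
# Group the remaining columns by position, one column per group.
dfs = {i:grp.reset_index() for i, grp in df.groupby(np.arange(len(df.columns)), axis=1)}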

The keys of the resulting dictionary are the integer positions of the value columns (0 for column '1', 1 for column '2', and so on), so the pieces can be retrieved like:

dfs[0]

dfs[1]

and so on.
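
Note that pandas 2.1 and later emit a deprecation warning for groupby(..., axis=1). Below is a minimal sketch of the same idea without groupby, assuming the toy frame from the question. The names indexed and dfs are only illustrative, and set_index is used without inplace so that df itself is left unchanged.

```python
import numpy as np
import pandas as pd

# Toy data shaped like the question's frame (values are random).
df = pd.DataFrame({"x": np.arange(10), "y": np.arange(30, 40)})
for name in ["1", "2", "3", "4", "5"]:
    df[name] = np.random.rand(10)

# Non-inplace variant: df keeps x and y as ordinary columns.
indexed = df.set_index(["x", "y"])

# One frame per remaining column, keyed by position like the groupby version.
dfs = {i: indexed[[col]].reset_index() for i, col in enumerate(indexed.columns)}

print(dfs[0].head())
```

As before, dfs[0] holds x, y and column '1', dfs[1] holds x, y and column '2', and so on.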

edited Jan 25, 2017 at 10:50

answered Jan 25, 2017 at 10:44


1 Comment

bikuser Over a year ago

Thank you very much Nickil Maveli. This is what I was looking for :)

keepitwiel · Answer · 2017-01-25 11:01:02Z

You can use a list comprehension to automatically generate dataframes:

df_cuts = [df[['x', 'y', col]] for col in df.columns if col not in ['x', 'y']]

I verified the output in the command line:

for i, df_cut in enumerate(df_cuts):
    print('df %r:' % i)
    print(df_cut)
    print('\n')

The result:

df 0:
   x   y         1
0  0  30  0.695465
1  1  31  0.425572
2  2  32  0.018986
3  3  33  0.165947
4  4  34  0.103120
5  5  35  0.069060
6  6  36  0.676640
7  7  37  0.492231
8  8  38  0.950436
9  9  39  0.156195


df 1:
   x   y         2
0  0  30  0.928538
1  1  31  0.019624
2  2  32  0.862811
3  3  33  0.289581
4  4  34  0.150975
5  5  35  0.835313
6  6  36  0.768760
7  7  37  0.396042
8  8  38  0.423745
9  9  39  0.268596


df 2:
   x   y         3
0  0  30  0.999175
1  1  31  0.004125
2  2  32  0.137457
3  3  33  0.042903
4  4  34  0.580698
5  5  35  0.663723
6  6  36  0.996608
7  7  37  0.960361
8  8  38  0.932486
9  9  39  0.758873


df 3:
   x   y         4
0  0  30  0.708976
1  1  31  0.547635
2  2  32  0.722322
3  3  33  0.912707
4  4  34  0.380471
5  5  35  0.607742
6  6  36  0.803980
7  7  37  0.569364
8  8  38  0.882297
9  9  39  0.954743


df 4:
   x   y         5
0  0  30  0.900532
1  1  31  0.247818
2  2  32  0.629371
3  3  33  0.502218
4  4  34  0.183292
5  5  35  0.875611
6  6  36  0.940828
7  7  37  0.200641
8  8  38  0.874052
9  9  39  0.525997
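
If looking up the pieces by column name is more convenient than by list position, the same comprehension can build a dictionary instead. This is a rough sketch under that assumption; split_on_keys and pieces are hypothetical names, not part of pandas.

```python
import numpy as np
import pandas as pd


def split_on_keys(frame, key_cols):
    """Return {column name: frame[key_cols + [column]]} for every non-key column.

    Hypothetical helper for illustration only.
    """
    key_cols = list(key_cols)
    return {
        col: frame[key_cols + [col]]
        for col in frame.columns
        if col not in key_cols
    }


# Toy data shaped like the question's frame (values are random).
df = pd.DataFrame({"x": np.arange(10), "y": np.arange(30, 40)})
for name in ["1", "2", "3"]:
    df[name] = np.random.rand(10)

pieces = split_on_keys(df, ["x", "y"])
print(list(pieces))  # ['1', '2', '3']
print(pieces["2"].head())
```

Here pieces['2'] is the frame with columns x, y and 2, so no bookkeeping between list positions and column names is needed.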

roman · Answer · 2017-01-25 10:44:56Z

It looks to me like you could set the index to ['x','y'] and then select each column by name:

>>> df2 = df.set_index(['x','y'])
>>> df2
             1         2         3         4
x y                                         
0 30  0.161017  0.280965  0.058429  0.750003
1 31  0.643460  0.258441  0.951750  0.774355
2 32  0.948439  0.573363  0.126369  0.577629
3 33  0.896542  0.722825  0.927644  0.470369
4 34  0.298559  0.009676  0.841103  0.899220
5 35  0.855292  0.849880  0.529132  0.929805
6 36  0.428680  0.486381  0.271048  0.219826
7 37  0.752370  0.698653  0.980554  0.894405
8 38  0.027857  0.085865  0.086936  0.403528
9 39  0.522483  0.646266  0.825819  0.574025

>>> df2['1']
x  y 
0  30    0.161017
1  31    0.643460
2  32    0.948439
3  33    0.896542
4  34    0.298559
5  35    0.855292
6  36    0.428680
7  37    0.752370
8  38    0.027857
9  39    0.522483
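
Selecting a single column this way returns a Series indexed by x and y, not a three-column dataframe. Should a dataframe like the one in the question be needed, a sketch along these lines may help, assuming the same toy data; df1 is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Toy data shaped like the frame used in this answer (values are random).
df = pd.DataFrame({"x": np.arange(10), "y": np.arange(30, 40)})
for name in ["1", "2", "3", "4"]:
    df[name] = np.random.rand(10)

df2 = df.set_index(["x", "y"])

# df2['1'] is a Series; reset_index() moves x and y back into columns
# and names the value column after the Series ('1').
df1 = df2["1"].reset_index()
print(df1.head())
```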

If you just need to loop through the columns, you can do this:

>>> for i in range(1,5):
...     print(df[['x','y',str(i)]])
... 
   x   y         1
0  0  30  0.161017
1  1  31  0.643460
2  2  32  0.948439
3  3  33  0.896542
4  4  34  0.298559
5  5  35  0.855292
6  6  36  0.428680
7  7  37  0.752370
8  8  38  0.027857
9  9  39  0.522483
   x   y         2
0  0  30  0.280965
1  1  31  0.258441
2  2  32  0.573363
3  3  33  0.722825
4  4  34  0.009676
5  5  35  0.849880
6  6  36  0.486381
7  7  37  0.698653
8  8  38  0.085865
9  9  39  0.646266
   x   y         3
0  0  30  0.058429
1  1  31  0.951750
2  2  32  0.126369
3  3  33  0.927644
4  4  34  0.841103
5  5  35  0.529132
6  6  36  0.271048
7  7  37  0.980554
8  8  38  0.086936
9  9  39  0.825819
   x   y         4
0  0  30  0.750003
1  1  31  0.774355
2  2  32  0.577629
3  3  33  0.470369
4  4  34  0.899220
5  5  35  0.929805
6  6  36  0.219826
7  7  37  0.894405
8  8  38  0.403528
9  9  39  0.574025
