
I have a DataFrame with a large number of columns, and I want to split it into several smaller DataFrames. For example:

Generating toy data:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10), columns=['x'])
df['y'] = np.arange(30, 40, 1)
df['1'] = np.random.rand(10)
df['2'] = np.random.rand(10)
df['3'] = np.random.rand(10)
df['4'] = np.random.rand(10)
df['5'] = np.random.rand(10)

df =

    x   y   1              2           3          4            5
0   0   30  0.047787    0.435396    0.926836    0.314469    0.477411
1   1   31  0.083536    0.258120    0.682284    0.025050    0.713777
2   2   32  0.201041    0.872864    0.050977    0.580314    0.185589
3   3   33  0.105833    0.974538    0.559265    0.128242    0.217965
4   4   34  0.146551    0.662001    0.936995    0.050702    0.249724
5   5   35  0.098615    0.854952    0.649501    0.509777    0.726458
6   6   36  0.387889    0.040331    0.902277    0.051822    0.354042
7   7   37  0.321591    0.823724    0.052266    0.081491    0.187576
8   8   38  0.983665    0.152271    0.530755    0.384810    0.844386
9   9   39  0.649185    0.776682    0.239589    0.654547    0.581337

What I want is to split the DataFrame like this:

df1 =

    x   y   1
0   0   30  0.047787
1   1   31  0.083536
2   2   32  0.201041
3   3   33  0.105833
4   4   34  0.146551
5   5   35  0.098615
6   6   36  0.387889
7   7   37  0.321591
8   8   38  0.983665
9   9   39  0.649185

df2 =

    x    y    2
0   0   30  0.435396
1   1   31  0.25812
2   2   32  0.872864
3   3   33  0.974538
4   4   34  0.662001
5   5   35  0.854952
6   6   36  0.040331
7   7   37  0.823724
8   8   38  0.152271
9   9   39  0.776682

And so on. Since I have a large number of columns, doing this one by one is very tedious. Is there a simpler way?

Thank you in advance.

3 Answers


You could move the x and y columns, which stay the same in every piece, into the index and then group across the remaining columns.

A dictionary comprehension then loops through every group, and a reset_index at the end turns x and y back into ordinary columns so that each piece is a flat DataFrame.

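indexed = df.set_index(['x', 'y'])  # x and y become the shared index
# groupby(..., axis=1) is deprecated since pandas 2.1, so group the rows of the transpose instead
dfs = {i: grp.T.reset_index()
       for i, grp in indexed.T.groupby(np.arange(len(indexed.columns)))}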

The keys of the resulting dictionary are the positions 0, 1, 2 and so on of the value columns (not the column labels themselves), so each piece can be retrieved like this:

dfs[0]


dfs[1]


and so on.
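The positional keys 0, 1, 2 are easy to confuse with the column labels '1', '2', '3'. A minimal variant, assuming you would rather key each piece by its original column name, selects each column from the indexed frame directly instead of grouping. It rebuilds toy data shaped like the question's so it runs on its own:

```python
import numpy as np
import pandas as pd

# Toy data shaped like the question's: two shared columns plus value columns '1'..'5'
df = pd.DataFrame({'x': np.arange(10), 'y': np.arange(30, 40)})
for name in ['1', '2', '3', '4', '5']:
    df[name] = np.random.rand(10)

# Keep x and y in the index so every piece carries them along
indexed = df.set_index(['x', 'y'])

# Double brackets keep a one-column DataFrame; reset_index() restores x and y as columns
dfs = {col: indexed[[col]].reset_index() for col in indexed.columns}

print(dfs['2'])
```

Here `dfs['2']` is the piece holding x, y and column '2', which matches the question's df2 directly.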


1 Comment

Thank you very much, Nickil Maveli. This is what I was looking for :)

You can use a list comprehension to automatically generate dataframes:

df_cuts = [df[['x', 'y', col]] for col in df.columns if col not in ['x', 'y']]

I verified the output in the command line:

for i, cut in enumerate(df_cuts):
    print('df %r:' % i)
    print(cut)
    print('\n')

The result (the values differ from the question's because np.random.rand draws new numbers on each run):

df 0:
   x   y         1
0  0  30  0.695465
1  1  31  0.425572
2  2  32  0.018986
3  3  33  0.165947
4  4  34  0.103120
5  5  35  0.069060
6  6  36  0.676640
7  7  37  0.492231
8  8  38  0.950436
9  9  39  0.156195


df 1:
   x   y         2
0  0  30  0.928538
1  1  31  0.019624
2  2  32  0.862811
3  3  33  0.289581
4  4  34  0.150975
5  5  35  0.835313
6  6  36  0.768760
7  7  37  0.396042
8  8  38  0.423745
9  9  39  0.268596


df 2:
   x   y         3
0  0  30  0.999175
1  1  31  0.004125
2  2  32  0.137457
3  3  33  0.042903
4  4  34  0.580698
5  5  35  0.663723
6  6  36  0.996608
7  7  37  0.960361
8  8  38  0.932486
9  9  39  0.758873


df 3:
   x   y         4
0  0  30  0.708976
1  1  31  0.547635
2  2  32  0.722322
3  3  33  0.912707
4  4  34  0.380471
5  5  35  0.607742
6  6  36  0.803980
7  7  37  0.569364
8  8  38  0.882297
9  9  39  0.954743


df 4:
   x   y         5
0  0  30  0.900532
1  1  31  0.247818
2  2  32  0.629371
3  3  33  0.502218
4  4  34  0.183292
5  5  35  0.875611
6  6  36  0.940828
7  7  37  0.200641
8  8  38  0.874052
9  9  39  0.525997
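If the shared columns change from one dataset to the next, the list comprehension can be wrapped in a small function. `split_on_id_columns` and its `id_cols` parameter are hypothetical names chosen for this sketch, not part of pandas:

```python
import numpy as np
import pandas as pd

def split_on_id_columns(df, id_cols):
    """Return one DataFrame per non-id column, each holding id_cols plus that column."""
    value_cols = [col for col in df.columns if col not in id_cols]
    # Id columns first, then the single value column, as in the question's df1, df2, ...
    return [df[list(id_cols) + [col]] for col in value_cols]

df = pd.DataFrame({'x': np.arange(10), 'y': np.arange(30, 40)})
for name in ['1', '2', '3', '4', '5']:
    df[name] = np.random.rand(10)

pieces = split_on_id_columns(df, ['x', 'y'])
print(len(pieces))                 # 5
print(pieces[0].columns.tolist())  # ['x', 'y', '1']
```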


It looks to me like you could set the index to ['x', 'y'] and then simply select your columns by name:

>>> df2 = df.set_index(['x','y'])
>>> df2
             1         2         3         4
x y                                         
0 30  0.161017  0.280965  0.058429  0.750003
1 31  0.643460  0.258441  0.951750  0.774355
2 32  0.948439  0.573363  0.126369  0.577629
3 33  0.896542  0.722825  0.927644  0.470369
4 34  0.298559  0.009676  0.841103  0.899220
5 35  0.855292  0.849880  0.529132  0.929805
6 36  0.428680  0.486381  0.271048  0.219826
7 37  0.752370  0.698653  0.980554  0.894405
8 38  0.027857  0.085865  0.086936  0.403528
9 39  0.522483  0.646266  0.825819  0.574025

>>> df2['1']
x  y 
0  30    0.161017
1  31    0.643460
2  32    0.948439
3  33    0.896542
4  34    0.298559
5  35    0.855292
6  36    0.428680
7  37    0.752370
8  38    0.027857
9  39    0.522483
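Note that `df2['1']` with single brackets returns a Series indexed by (x, y), not a DataFrame like the ones in the question. A minimal sketch of getting back the x, y, 1 layout, assuming the same kind of toy data: either select with a one-element list, or call `reset_index()` on the Series, which moves the index levels back into columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.arange(10), 'y': np.arange(30, 40)})
for name in ['1', '2', '3', '4']:
    df[name] = np.random.rand(10)

df2 = df.set_index(['x', 'y'])

series = df2['1']                   # single brackets: a Series named '1'
frame = df2[['1']].reset_index()    # double brackets: a DataFrame, then x and y back as columns
from_series = series.reset_index()  # the Series name '1' becomes the value column's name

print(type(series).__name__)        # Series
print(frame.columns.tolist())       # ['x', 'y', '1']
```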

If you just need to loop through the columns, you can do this:

>>> for i in range(1, 5):
...     print(df[['x', 'y', str(i)]])
... 
   x   y         1
0  0  30  0.161017
1  1  31  0.643460
2  2  32  0.948439
3  3  33  0.896542
4  4  34  0.298559
5  5  35  0.855292
6  6  36  0.428680
7  7  37  0.752370
8  8  38  0.027857
9  9  39  0.522483
   x   y         2
0  0  30  0.280965
1  1  31  0.258441
2  2  32  0.573363
3  3  33  0.722825
4  4  34  0.009676
5  5  35  0.849880
6  6  36  0.486381
7  7  37  0.698653
8  8  38  0.085865
9  9  39  0.646266
   x   y         3
0  0  30  0.058429
1  1  31  0.951750
2  2  32  0.126369
3  3  33  0.927644
4  4  34  0.841103
5  5  35  0.529132
6  6  36  0.271048
7  7  37  0.980554
8  8  38  0.086936
9  9  39  0.825819
   x   y         4
0  0  30  0.750003
1  1  31  0.774355
2  2  32  0.577629
3  3  33  0.470369
4  4  34  0.899220
5  5  35  0.929805
6  6  36  0.219826
7  7  37  0.894405
8  8  38  0.403528
9  9  39  0.574025
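The loop above hard-codes `range(1, 5)` and relies on the value columns being named '1', '2', and so on. A minimal sketch that instead walks over whatever columns remain after dropping x and y, yielding one piece at a time so that a caller who processes and discards each piece never holds all of them at once (`iter_pieces` is a hypothetical helper name):

```python
import numpy as np
import pandas as pd

def iter_pieces(df, id_cols=('x', 'y')):
    """Yield (column name, DataFrame of id_cols plus that column) one at a time."""
    id_cols = list(id_cols)
    # Index.drop removes the id labels, leaving the value columns in their original order
    for col in df.columns.drop(id_cols):
        yield col, df[id_cols + [col]]

df = pd.DataFrame({'x': np.arange(10), 'y': np.arange(30, 40)})
for name in ['1', '2', '3', '4']:
    df[name] = np.random.rand(10)

for col, piece in iter_pieces(df):
    print(piece)
```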
