0

I have the following data frames:

import pandas as pd
df = pd.DataFrame( { 
    "Name" : ["gene1","gene2","gene3","gene4"] , 
    "T1" : [0.33,1,3,4],
    "T2" : [1.23,2.1,3.5,5.0], } )

Which looks like this:

In [30]: df
Out[30]:
    Name    T1    T2
0  gene1  0.33  1.23
1  gene2  1.00  2.10
2  gene3  3.00  3.50
3  gene4  4.00  5.00

What I want to do is group based on T1 and T2 (basically all columns from 2nd columns onward). Note that the column names follow no pattern and can be more than two.

The end result I hope to get is this:

T1
    Name    T1   
0  gene1  0.33  
1  gene2  1.00  
2  gene3  3.00  
3  gene4  4.00 

T2
    Name    T2
0  gene1    1.23
1  gene2    2.10
2  gene3    3.50
3  gene4    5.00

How can I achieve that?

I tried this but doesn't give what I want:

tlist = list(df.columns[1:])
for dft in df.groupby(tlist,axis=1):
    print df
1
  • In your hoped result you are not grouping, you are just listing the columns. If that's really what you want, you just have to set "Name" as an index. df = df.set_index("Name") and then print the results for c in df.columns: print(df[c]) Commented Mar 30, 2015 at 3:35

1 Answer 1

2

You can get there using pd.melt():

melted = pd.melt(df, id_vars='Name', var_name='t_col')

for t_col, sub_df in melted.groupby('t_col'):
    print(sub_df)

    Name t_col  value
0  gene1    T1   0.33
1  gene2    T1   1.00
2  gene3    T1   3.00
3  gene4    T1   4.00
    Name t_col  value
4  gene1    T2   1.23
5  gene2    T2   2.10
6  gene3    T2   3.50
7  gene4    T2   5.00
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.