
I have two data frames, df1 and df2, created with the following code:

import pandas as pd
df1 = pd.DataFrame([["Probe1", "Gene1", 3,11], 
                    ["Probe1", "Gene2", 6,10],
                    ["Probe2","Gene2", 13,18]], 
        columns=['probe', 'gene', 'Sample1', "Sample2"]).set_index(['probe', 'gene'])
df1.columns.names = ['Sample']
# Note that number of samples can be more than two


df2 = df1.copy()
df2[df2>0] = 1.00

So it looks like this:

In [74]: df1
Out[74]:
Sample        Sample1  Sample2
probe  gene
Probe1 Gene1        3       11
       Gene2        6       10
Probe2 Gene2       13       18

In [75]: df2
Out[75]:
Sample        Sample1  Sample2
probe  gene
Probe1 Gene1        1        1
       Gene2        1        1
Probe2 Gene2        1        1

What I want to do is concatenate these two data frames side by side, so that the result can be written to a CSV file that looks like this:

PROBE  GENE      SMPL1    SMPL2 PROBE  GENE      SMPL1    SMPL2
Probe1 Gene1        3       11  Probe1 Gene1      1        1
Probe1 Gene2        6       10  Probe1 Gene2      1        1
Probe2 Gene2       13       18  Probe2 Gene2      1        1

I'm stuck at this point: pd.concat(ndf,axis=1)

What's the right way to do it?

3 Answers


Resetting the index on both frames before concatenating should give you what you want.

pd.concat([df1.reset_index(),df2.reset_index()],axis=1)

Output:

Sample   probe   gene  Sample1  Sample2   probe   gene  Sample1  Sample2
0       Probe1  Gene1        3       11  Probe1  Gene1        1        1
1       Probe1  Gene2        6       10  Probe1  Gene2        1        1
2       Probe2  Gene2       13       18  Probe2  Gene2        1        1
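
Since the question asks for a CSV file, here is a minimal sketch of the remaining step. It assumes the same df1 and df2 as in the question; the names `side_by_side`, `buffer` and `csv_text` are made up for illustration, and an in-memory buffer stands in for a real file path.

```python
import io

import pandas as pd

# Same data as in the question
df1 = pd.DataFrame(
    [["Probe1", "Gene1", 3, 11],
     ["Probe1", "Gene2", 6, 10],
     ["Probe2", "Gene2", 13, 18]],
    columns=["probe", "gene", "Sample1", "Sample2"],
).set_index(["probe", "gene"])
df1.columns.names = ["Sample"]

df2 = df1.copy()
df2[df2 > 0] = 1

# reset_index turns probe/gene into ordinary columns, so they are
# repeated once per frame in the concatenated result
side_by_side = pd.concat([df1.reset_index(), df2.reset_index()], axis=1)

# index=False drops the 0, 1, 2 row labels from the output.
# A StringIO buffer is used here; pass a file path to write to disk instead.
buffer = io.StringIO()
side_by_side.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that the resulting header contains duplicate column names (probe, gene, Sample1, Sample2 each appear twice), which matches the layout requested in the question but may be awkward to read back in later.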

2 Comments

No it doesn't! I cannot reproduce your result.
@pdubois Whoops! Edited.

Use join and then reset_index:

In [1422]: df1
Out[1422]: 
Sample        Sample1  Sample2
probe  gene                   
Probe1 Gene1        3       11
       Gene2        6       10
Probe2 Gene2       13       18

In [1423]: df2
Out[1423]: 
Sample        Sample1  Sample2
probe  gene                   
Probe1 Gene1        1        1
       Gene2        1        1
Probe2 Gene2        1        1

Output:

In [1424]: df1.join(df2, rsuffix='df2').reset_index()
Out[1424]: 
Sample   probe   gene  Sample1  Sample2  Sample1df2  Sample2df2
0       Probe1  Gene1        3       11           1           1
1       Probe1  Gene2        6       10           1           1
2       Probe2  Gene2       13       18           1           1
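
As a self-contained sketch of the same join, the snippet below rebuilds the frames from the question and uses a suffix with an underscore so the new column names are easier to read. The suffix `_flag` and the name `joined` are arbitrary choices for this example, not part of the answer above.

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame(
    [["Probe1", "Gene1", 3, 11],
     ["Probe1", "Gene2", 6, 10],
     ["Probe2", "Gene2", 13, 18]],
    columns=["probe", "gene", "Sample1", "Sample2"],
).set_index(["probe", "gene"])
df1.columns.names = ["Sample"]

df2 = df1.copy()
df2[df2 > 0] = 1

# join aligns rows on the (probe, gene) index; rsuffix is appended only
# to the right-hand columns whose names clash with the left-hand ones
joined = df1.join(df2, rsuffix="_flag").reset_index()
print(list(joined.columns))
```

Unlike the layout in the question, this keeps a single probe and gene column. Because join matches rows by index rather than by position, it also still lines up if the two frames happen to be in a different row order.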


Try this. I have generalized it to 4 samples:

import pandas as pd
df1 = pd.DataFrame([["Probe1", "Gene1", 3,11,30,100], 
                   ["Probe1", "Gene2", 6,10,100,23],
                   ["Probe2","Gene2", 13,18,20,77]], 
        columns=['probe', 'gene', 'Sample1', "Sample2","Sample3","Sample4"]).set_index(['probe', 'gene'])
df1.columns.names = ['Sample']


df2 = df1.copy()
df2[df2>0] = 1.00
ndf = [df1,df2]
fdf = pd.concat(ndf,axis=1)
fdf.reset_index(inplace=True)

# Insert position: just past the two index columns and df1's sample columns
ins1 = df1.shape[1]+2
ins2 = ins1 + 1
print(ins1, ins2)
fdf.insert(ins1,'probe2',fdf['probe'])
fdf.insert(ins2,'gene2',fdf['gene'])
fdf

gives

In [149]: fdf
Out[149]:
Sample   probe   gene  Sample1  Sample2  Sample3  Sample4  probe2  gene2  \
0       Probe1  Gene1        3       11       30      100  Probe1  Gene1
1       Probe1  Gene2        6       10      100       23  Probe1  Gene2
2       Probe2  Gene2       13       18       20       77  Probe2  Gene2

Sample  Sample1  Sample2  Sample3  Sample4
0             1        1        1        1
1             1        1        1        1
2             1        1        1        1
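
For comparison, a possible alternative that yields the same column layout without computing insert positions is to rename the key columns of the second frame before concatenating. This is only a sketch under the same 4-sample setup, not the method used in the answer above; `right` is a name introduced for illustration.

```python
import pandas as pd

# Same 4-sample data as in the answer above
df1 = pd.DataFrame(
    [["Probe1", "Gene1", 3, 11, 30, 100],
     ["Probe1", "Gene2", 6, 10, 100, 23],
     ["Probe2", "Gene2", 13, 18, 20, 77]],
    columns=["probe", "gene", "Sample1", "Sample2", "Sample3", "Sample4"],
).set_index(["probe", "gene"])
df1.columns.names = ["Sample"]

df2 = df1.copy()
df2[df2 > 0] = 1

# Give the second copy of the key columns distinct names up front
right = df2.reset_index().rename(columns={"probe": "probe2", "gene": "gene2"})

# Both frames now have a default 0..n-1 index, so rows pair up by position
fdf = pd.concat([df1.reset_index(), right], axis=1)
print(list(fdf.columns))
```

The sample columns still share names across the two halves, as in the output shown above.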
