Concatenate dataframes with multi-index in pandas dataframe

Question

I have two dataframes df1 and df2:

In [56]: df1.head()
Out[56]: 
                     col7                col8                col9          
                   alpha0        D0    alpha0        D0    alpha0        D0
F35_HC_531d.dat  1.103999  1.103999  1.364399  1.358938  3.171808  1.946894
F35_HC_532d.dat  0.000000  0.000000  1.636934  1.635594  4.359431  2.362530
F35_HC_533d.dat  0.826599  0.826599  1.463956  1.390134  3.860629  2.199387
F35_HC_534d.dat  1.055350  1.020555  3.112200  2.498257  3.394307  2.090668
F52_HC_472d.dat  3.808008  2.912733  3.594062  2.336720  3.027449  2.216112

In [62]: df2.head()
Out[62]: 
                   col7           col8              col9       
                 alpha1 alpha2  alpha1    alpha2  alpha1 alpha2
filename                                                       
F35_HC_532d.dat  1.0850  2.413  0.7914  6.072000  0.8418  5.328
M48_HC_551d.dat  0.7029  4.713  0.7309  2.922000  0.7823  3.546
M24_HC_458d.dat  0.7207  5.850  0.6772  5.699000  0.7135  5.620
M48_HC_552d.dat  0.7179  4.783  0.6481  4.131999  0.7010  3.408
M40_HC_506d.dat  0.7602  2.912  0.8420  5.690000  0.8354  1.910

I want to concatenate these two dataframes. Notice that the outer column names are the same for both, so in the new dataframe I want each outer column to contain only 4 sub-columns. I tried using concat as:

df = pd.concat([df1, df2], axis = 1, levels = 0)

But this produces a dataframe in which the outer column names col7 to col9 appear twice (so the dataframe has 6 outer columns). How can I put all the level-1 columns under the same outer column names?
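A minimal reproduction of the problem, assuming hypothetical miniature versions of df1 and df2 (the row labels `a.dat`, `b.dat`, `c.dat` and all values are made up). It shows that a plain column-wise concat keeps df1's columns first and df2's after them, so each outer label ends up in two separate blocks. The `levels=0` argument is left out here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1 and df2: same outer labels, different sub-columns.
outer = ["col7", "col8", "col9"]
df1 = pd.DataFrame(
    np.arange(12, dtype=float).reshape(2, 6),
    index=["a.dat", "b.dat"],
    columns=pd.MultiIndex.from_product([outer, ["alpha0", "D0"]]),
)
df2 = pd.DataFrame(
    np.arange(12, dtype=float).reshape(2, 6) / 10,
    index=["b.dat", "c.dat"],
    columns=pd.MultiIndex.from_product([outer, ["alpha1", "alpha2"]]),
)

# Column-wise concat appends df2's columns after df1's, so every outer
# label appears in two separate, non-adjacent blocks of columns.
df = pd.concat([df1, df2], axis=1)
print(df.columns.get_level_values(0).tolist())
# ['col7', 'col7', 'col8', 'col8', 'col9', 'col9', 'col7', 'col7', 'col8', 'col8', 'col9', 'col9']
print(df.columns.is_monotonic_increasing)  # False
```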

jezrael · Accepted Answer · 2017-04-15 04:48:38Z

You can add sort_index(axis=1) to sort the columns, so that all sub-columns with the same outer name end up next to each other:

df = pd.concat([df1, df2], axis = 1, levels=0).sort_index(axis=1)
print (df)
                     col7                               col8            \
                       D0    alpha0  alpha1 alpha2        D0    alpha0   
F35_HC_531d.dat  1.103999  1.103999     NaN    NaN  1.358938  1.364399   
F35_HC_532d.dat  0.000000  0.000000  1.0850  2.413  1.635594  1.636934   
F35_HC_533d.dat  0.826599  0.826599     NaN    NaN  1.390134  1.463956   
F35_HC_534d.dat  1.020555  1.055350     NaN    NaN  2.498257  3.112200   
F52_HC_472d.dat  2.912733  3.808008     NaN    NaN  2.336720  3.594062   
M24_HC_458d.dat       NaN       NaN  0.7207  5.850       NaN       NaN   
M40_HC_506d.dat       NaN       NaN  0.7602  2.912       NaN       NaN   
M48_HC_551d.dat       NaN       NaN  0.7029  4.713       NaN       NaN   
M48_HC_552d.dat       NaN       NaN  0.7179  4.783       NaN       NaN   

                                       col9                           
                 alpha1    alpha2        D0    alpha0  alpha1 alpha2  
F35_HC_531d.dat     NaN       NaN  1.946894  3.171808     NaN    NaN  
F35_HC_532d.dat  0.7914  6.072000  2.362530  4.359431  0.8418  5.328  
F35_HC_533d.dat     NaN       NaN  2.199387  3.860629     NaN    NaN  
F35_HC_534d.dat     NaN       NaN  2.090668  3.394307     NaN    NaN  
F52_HC_472d.dat     NaN       NaN  2.216112  3.027449     NaN    NaN  
M24_HC_458d.dat  0.6772  5.699000       NaN       NaN  0.7135  5.620  
M40_HC_506d.dat  0.8420  5.690000       NaN       NaN  0.8354  1.910  
M48_HC_551d.dat  0.7309  2.922000       NaN       NaN  0.7823  3.546  
M48_HC_552d.dat  0.6481  4.131999       NaN       NaN  0.7010  3.408
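A minimal sketch of the same idea on hypothetical miniature frames (made-up row labels and values standing in for df1 and df2). It leaves out `levels=0`: the pandas documentation describes `levels` as the levels to use when building a MultiIndex from `keys`, and no `keys` are passed here. Note that sort_index also sorts the sub-columns, and `D0` sorts before `alpha0` because uppercase letters come before lowercase ones, which matches the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1 and df2 (labels and values are made up).
outer = ["col7", "col8", "col9"]
df1 = pd.DataFrame(
    np.arange(12, dtype=float).reshape(2, 6),
    index=["a.dat", "b.dat"],
    columns=pd.MultiIndex.from_product([outer, ["alpha0", "D0"]]),
)
df2 = pd.DataFrame(
    np.arange(12, dtype=float).reshape(2, 6) / 10,
    index=["b.dat", "c.dat"],
    columns=pd.MultiIndex.from_product([outer, ["alpha1", "alpha2"]]),
)

# Concatenate side by side, then sort the column MultiIndex so that all
# sub-columns sharing an outer label become adjacent.
df = pd.concat([df1, df2], axis=1).sort_index(axis=1)
print(df.columns.tolist()[:4])
# [('col7', 'D0'), ('col7', 'alpha0'), ('col7', 'alpha1'), ('col7', 'alpha2')]

# Rows present in only one input get NaN in the other input's columns.
print(df.loc["a.dat", ("col7", "alpha1")])  # nan
print(df.loc["c.dat", ("col7", "alpha1")])  # 0.6
```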

piRSquared · Accepted Answer · 2017-04-15 05:52:44Z


You can also use join with the parameter how='outer' (which keeps the union of row labels) and then sort the columns:

df1.join(df2, how='outer').sort_index(axis=1)
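A minimal sketch of the join approach on hypothetical miniature frames (made-up labels and values), checking that it gives the same table as the concat approach once rows and columns are sorted. It passes `axis=1` to sort_index by keyword for clarity. Because the sub-column names of the two frames do not overlap, join needs no `lsuffix`/`rsuffix`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1 and df2 (labels and values are made up).
outer = ["col7", "col8", "col9"]
df1 = pd.DataFrame(
    np.arange(12, dtype=float).reshape(2, 6),
    index=["a.dat", "b.dat"],
    columns=pd.MultiIndex.from_product([outer, ["alpha0", "D0"]]),
)
df2 = pd.DataFrame(
    np.arange(12, dtype=float).reshape(2, 6) / 10,
    index=["b.dat", "c.dat"],
    columns=pd.MultiIndex.from_product([outer, ["alpha1", "alpha2"]]),
)

# join(how="outer") aligns the frames on their row labels and keeps the union of rows.
joined = df1.join(df2, how="outer").sort_index(axis=1)

# For comparison: the concat-based approach.
concatenated = pd.concat([df1, df2], axis=1).sort_index(axis=1)

# Sort the rows too, so the comparison does not depend on row order.
print(joined.sort_index().equals(concatenated.sort_index()))  # True
```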


4 Comments

Peaceful Over a year ago

Nice! What I don't understand is why sort_index gets rid of the repeated column names. Any comment?

piRSquared Over a year ago

@Peaceful they are still there. When consecutive columns share the same value in an earlier level of the index, pandas displays that label only once for the whole group, for aesthetic reasons.

Peaceful Over a year ago

But why does sort_index do that? Or is that even generally true? For example, if there were a function merge_repeated_columns, that would have been understandable. Am I missing something obvious?

piRSquared Over a year ago

@Peaceful if it wasn't sorted then you would not have all the columns of the same first level together. Then you'd have to show every column header because it would not make intuitive sense. Only after you sort can you make it look pretty
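The point made in these comments can be checked directly. A minimal sketch on hypothetical miniature frames (made-up labels and values): after sorting, every column still carries its outer label, and only the printout hides the repeats. The `display.multi_sparse` option controls that hiding; turning it off makes the printout repeat the outer label above every sub-column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1 and df2 (labels and values are made up).
outer = ["col7", "col8", "col9"]
df1 = pd.DataFrame(
    np.arange(12, dtype=float).reshape(2, 6),
    index=["a.dat", "b.dat"],
    columns=pd.MultiIndex.from_product([outer, ["alpha0", "D0"]]),
)
df2 = pd.DataFrame(
    np.arange(12, dtype=float).reshape(2, 6) / 10,
    index=["b.dat", "c.dat"],
    columns=pd.MultiIndex.from_product([outer, ["alpha1", "alpha2"]]),
)

df = pd.concat([df1, df2], axis=1).sort_index(axis=1)

# Sorting removes nothing: col7 is still the outer label of four columns.
print(df.columns.get_level_values(0).tolist().count("col7"))  # 4
# What sorting changes is the order: equal outer labels are now adjacent.
print(df.columns.is_monotonic_increasing)  # True

# With sparsification disabled, the outer label is printed for every column.
with pd.option_context("display.multi_sparse", False):
    print(df)
```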

Nicoleue · Accepted Answer · 2023-09-04 12:17:17Z


This is the same as @jezrael's answer, but pd.concat() has since been updated and you now need to leave out the levels=0 keyword:

df = pd.concat([df1, df2], axis = 1).sort_index(axis=1)
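A minimal sketch of the keyword-free call on hypothetical miniature frames (made-up labels and values). It also shows one optional variant that goes beyond the answer: passing `level=0` and `sort_remaining=False` to sort_index sorts by the outer level only, which groups the columns under each outer label while keeping the sub-columns in the order they had after concat (alpha0, D0, alpha1, alpha2) rather than sorting them alphabetically.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1 and df2 (labels and values are made up).
outer = ["col7", "col8", "col9"]
df1 = pd.DataFrame(
    np.arange(12, dtype=float).reshape(2, 6),
    index=["a.dat", "b.dat"],
    columns=pd.MultiIndex.from_product([outer, ["alpha0", "D0"]]),
)
df2 = pd.DataFrame(
    np.arange(12, dtype=float).reshape(2, 6) / 10,
    index=["b.dat", "c.dat"],
    columns=pd.MultiIndex.from_product([outer, ["alpha1", "alpha2"]]),
)

# The call from this answer, without levels=0: sorts both column levels.
df = pd.concat([df1, df2], axis=1).sort_index(axis=1)

# Variant: sort by the outer level only; sort_remaining=False leaves the
# sub-columns within each outer group in their post-concat order.
grouped = pd.concat([df1, df2], axis=1).sort_index(axis=1, level=0, sort_remaining=False)
print(grouped.columns.tolist()[:4])
```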

