Grouping and merging two dataframes in Python

Question

I have been banging my head against this one for some time now and can't figure it out...

I have two Pandas DataFrames, df1 and df2, which contain information that I want to summarize neatly into one.

So far, I have used an aggregate function to summarize each of these as follows:

aggregation = {'A' : {'a' : 'mean'}, 'B' : {'b' : 'mean'}, 'C' : {'c' : 'sum'}}

>> df1.groupby(by=['LEVEL_1']).agg(aggregation)

            A      B      C
            a      b      c
LEVEL_1     
lvl_a       1.0    2.0    3.0
lvl_b       4.0    5.0    6.0
lvl_c       7.0    8.0    9.0

Same for my other DataFrame

>> df2.groupby(by=['LEVEL_1']).agg(aggregation)

            A      B      C
            a      b      c
LEVEL_1     
lvl_a       10.0   11.0   12.0
lvl_b       13.0   14.0   15.0
lvl_c       16.0   17.0   18.0
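A note for readers on newer pandas: the nested-dict renaming form used in `aggregation` above was deprecated in pandas 0.20 and raises a `SpecificationError` from pandas 1.0 onward. A minimal sketch of an equivalent using named aggregation (available since pandas 0.25) follows. The raw values in `df1` are hypothetical, since the question does not show the un-aggregated data, and the result has flat columns `a`, `b`, `c` rather than the two-level header printed above.

```python
import pandas as pd

# Hypothetical raw data; the question does not show the un-aggregated frames.
df1 = pd.DataFrame({
    "LEVEL_1": ["lvl_a", "lvl_a", "lvl_b", "lvl_b"],
    "A": [0.0, 2.0, 3.0, 5.0],
    "B": [1.0, 3.0, 4.0, 6.0],
    "C": [1.0, 2.0, 2.0, 4.0],
})

# Named aggregation: output_name=(input_column, function).
agg1 = df1.groupby("LEVEL_1").agg(
    a=("A", "mean"),
    b=("B", "mean"),
    c=("C", "sum"),
)
print(agg1)
```

With flat columns like these, a later concat with `keys=` already yields two column levels, so no level needs to be dropped.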

Now, I would like to combine these two into one total DataFrame where each column (a, b, c) is split into the two "information universes" df1 and df2. It should also have an additional row, totals, holding the mean of all the rows per column, like so:

            a             b            c
            df1    df2    df1   df2    df1   df2
LEVEL_1     
lvl_a       1.0    10.0   2.0   11.0   3.0    12.0
lvl_b       4.0    13.0   5.0   14.0   6.0    15.0
lvl_c       7.0    16.0   8.0   17.0   9.0    18.0
totals      4.0    13.0   5.0   14.0   6.0    15.0

There is, most likely, a supereasy way to do this, but I have not figured it out...

Thanks in advance guys.

BENY · Accepted Answer · 2018-03-07 17:13:05Z

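I think you need concat + droplevel + swaplevel. Here df1 and df2 stand for the two aggregated frames (the sample output below uses the same values for both):

# keys adds an outer column level naming the source frame
s=pd.concat([df1,df2],axis=1,keys=['df1','df2'])
# drop the middle level (A, B, C), leaving (source, a/b/c)
s.columns=s.columns.droplevel(1)

# move a/b/c to the top level and sort so df1/df2 sit side by side
s=s.swaplevel(0,1,axis=1).sort_index(axis=1)
s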
Out[473]: 
         a         b         c     
       df1  df2  df1  df2  df1  df2
lvl_a  1.0  1.0  2.0  2.0  3.0  3.0
lvl_b  4.0  4.0  5.0  5.0  6.0  6.0
lvl_c  7.0  7.0  8.0  8.0  9.0  9.0
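For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the two aggregated frames from the numbers in the question and applies the same three steps. The helper `make_agg` and the names `agg1`/`agg2` are made up for illustration; they just reproduce the two-level `(A, a)`, `(B, b)`, `(C, c)` header shown in the question.

```python
import pandas as pd

def make_agg(values):
    # Build a frame shaped like the question's aggregated output.
    cols = pd.MultiIndex.from_tuples([("A", "a"), ("B", "b"), ("C", "c")])
    idx = pd.Index(["lvl_a", "lvl_b", "lvl_c"], name="LEVEL_1")
    return pd.DataFrame(values, index=idx, columns=cols)

agg1 = make_agg([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
agg2 = make_agg([[10.0, 11.0, 12.0], [13.0, 14.0, 15.0], [16.0, 17.0, 18.0]])

# keys adds an outer column level: (source, 'A', 'a')
s = pd.concat([agg1, agg2], axis=1, keys=["df1", "df2"])
# drop the middle level ('A', 'B', 'C'), leaving (source, 'a')
s.columns = s.columns.droplevel(1)
# put a/b/c on top and sort so df1/df2 sit next to each other
s = s.swaplevel(0, 1, axis=1).sort_index(axis=1)
print(s)
```

The `sort_index(axis=1)` call matters: without it the columns stay in df1-then-df2 order after the swap and are not grouped under a, b, c.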

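Update

To append a totals row, turn the column-wise aggregate into a one-row frame and concat it. The line below uses sum(); for the per-column mean the question asks for, use s.mean() in place of s.sum().

pd.concat([s,s.sum().to_frame('total').T])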
Out[479]: 
          a           b           c      
        df1   df2   df1   df2   df1   df2
lvl_a   1.0   1.0   2.0   2.0   3.0   3.0
lvl_b   4.0   4.0   5.0   5.0   6.0   6.0
lvl_c   7.0   7.0   8.0   8.0   9.0   9.0
total  12.0  12.0  15.0  15.0  18.0  18.0
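Since the question asks for the mean rather than the sum, here is a small sketch of the same pattern with `mean()`, using the numbers from the question's desired output. The frame `s` is built by hand here so the snippet runs on its own; the row label `totals` follows the question.

```python
import pandas as pd

# Combined frame with the values from the question's desired layout.
cols = pd.MultiIndex.from_product([["a", "b", "c"], ["df1", "df2"]])
s = pd.DataFrame(
    [[1.0, 10.0, 2.0, 11.0, 3.0, 12.0],
     [4.0, 13.0, 5.0, 14.0, 6.0, 15.0],
     [7.0, 16.0, 8.0, 17.0, 9.0, 18.0]],
    index=["lvl_a", "lvl_b", "lvl_c"],
    columns=cols,
)

# Column-wise mean as a one-row frame labelled 'totals', appended below s.
totals = s.mean().to_frame("totals").T
result = pd.concat([s, totals])
print(result)
```

The `totals` row comes out as 4.0, 13.0, 5.0, 14.0, 6.0, 15.0, which matches the last row of the layout in the question.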

edited Mar 7, 2018 at 17:13

answered Mar 7, 2018 at 16:40

BENY

5 Comments

gussilago Over a year ago

Haha, that was the fastest answer so far. Thank you! Never seen the swaplevel-functionality before, but your answer works perfectly!

BENY Over a year ago

@gussilago yw~ :-) happy coding

gussilago Over a year ago

While I'm at it: is it possible to also add a "totals" row (mean of all rows, per column) to the final dataframe? If you have an answer, I can always add it to the original question...

gussilago Over a year ago

Much appreciated @Wen

BENY Over a year ago

@gussilago yw :-) happy coding
