I have been banging my head against this one for some time now and can't figure it out...

I have two pandas DataFrames, df1 and df2, which contain information that I want to summarize neatly into one.

So far, I have used an aggregate function to summarize each of these as follows:

aggregation = {'A' : {'a' : 'mean'}, 'B' : {'b' : 'mean'}, 'C' : {'c' : 'sum'}}

>> df1.groupby(by=['LEVEL_1']).agg(aggregation)

            A      B      C
            a      b      c
LEVEL_1     
lvl_a       1.0    2.0    3.0
lvl_b       4.0    5.0    6.0
lvl_c       7.0    8.0    9.0

The same goes for my other DataFrame:

>> df2.groupby(by=['LEVEL_1']).agg(aggregation)

            A      B      C
            a      b      c
LEVEL_1     
lvl_a       10.0   11.0   12.0
lvl_b       13.0   14.0   15.0
lvl_c       16.0   17.0   18.0

Now I would like to combine these two into one total DataFrame whose columns are grouped by the two "information universes" (df1 and df2), with an additional row, totals, holding the mean of all rows per column, like so:

            a             b            c
            df1    df2    df1   df2    df1   df2
LEVEL_1     
lvl_a       1.0    10.0   2.0   11.0   3.0    12.0
lvl_b       4.0    13.0   5.0   14.0   6.0    15.0
lvl_c       7.0    16.0   8.0   17.0   9.0    18.0
totals      4.0    13.0   5.0   14.0   6.0    15.0

There is most likely a very easy way to do this, but I have not figured it out...

Thanks in advance guys.

1 Answer

I think you need concat, droplevel and swaplevel, applied to the two aggregated frames (called df1 and df2 here):

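import pandas as pd

# Concatenate side by side; keys adds an outer column level naming the source
s = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])
# Drop the middle level (A/B/C), leaving (source, a/b/c)
s.columns = s.columns.droplevel(1)
# Move a/b/c to the top level, then sort the columns
s = s.swaplevel(0, 1, axis=1).sort_index(axis=1)
s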
Out[473]: 
         a         b         c     
       df1  df2  df1  df2  df1  df2
lvl_a  1.0  1.0  2.0  2.0  3.0  3.0
lvl_b  4.0  4.0  5.0  5.0  6.0  6.0
lvl_c  7.0  7.0  8.0  8.0  9.0  9.0
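
As a self-contained sketch of the same three steps, the snippet below builds the two aggregated frames by hand with the values shown in the question. The names agg1, agg2 and combined are illustrative and not from the original post, and the frames are assumed to have two-level columns (A/a, B/b, C/c), as in the question's groupby output.

```python
import pandas as pd

# Hypothetical stand-ins for the two aggregated frames from the question:
# two-level columns (A/a, B/b, C/c) and LEVEL_1 as the index.
cols = pd.MultiIndex.from_tuples([("A", "a"), ("B", "b"), ("C", "c")])
idx = pd.Index(["lvl_a", "lvl_b", "lvl_c"], name="LEVEL_1")
agg1 = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    index=idx, columns=cols,
)
agg2 = pd.DataFrame(
    [[10.0, 11.0, 12.0], [13.0, 14.0, 15.0], [16.0, 17.0, 18.0]],
    index=idx, columns=cols,
)

# Step 1: stack side by side; keys adds an outer level naming the source.
combined = pd.concat([agg1, agg2], axis=1, keys=["df1", "df2"])
# Step 2: columns are now (source, A/B/C, a/b/c); drop the middle level.
combined.columns = combined.columns.droplevel(1)
# Step 3: put a/b/c on top and the source underneath, then sort the columns.
combined = combined.swaplevel(0, 1, axis=1).sort_index(axis=1)

print(combined)
```

With these inputs the columns come out as a/df1, a/df2, b/df1, b/df2, c/df1, c/df2, which is the layout requested in the question.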

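Update: adding a total row. Note that the snippet below sums each column; for the per-column mean that the question asks for, use s.mean() in place of s.sum().

pd.concat([s, s.sum().to_frame('total').T])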
Out[479]: 
          a           b           c      
        df1   df2   df1   df2   df1   df2
lvl_a   1.0   1.0   2.0   2.0   3.0   3.0
lvl_b   4.0   4.0   5.0   5.0   6.0   6.0
lvl_c   7.0   7.0   8.0   8.0   9.0   9.0
total  12.0  12.0  15.0  15.0  18.0  18.0
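
For the mean-based totals row that the question describes, a minimal sketch might look like the following. It assumes a combined frame in the layout requested in the question, rebuilt here by hand from the question's values; the names totals and result are illustrative.

```python
import pandas as pd

# Hypothetical combined frame: a/b/c on the top column level,
# df1/df2 underneath, using the values from the question.
cols = pd.MultiIndex.from_product([["a", "b", "c"], ["df1", "df2"]])
idx = pd.Index(["lvl_a", "lvl_b", "lvl_c"], name="LEVEL_1")
s = pd.DataFrame(
    [
        [1.0, 10.0, 2.0, 11.0, 3.0, 12.0],
        [4.0, 13.0, 5.0, 14.0, 6.0, 15.0],
        [7.0, 16.0, 8.0, 17.0, 9.0, 18.0],
    ],
    index=idx, columns=cols,
)

# Per-column mean as a one-row frame labelled "totals".
totals = s.mean().to_frame("totals").T
# Append the row; rename_axis restores the index name, which concat
# does not keep because the totals row's index is unnamed.
result = pd.concat([s, totals]).rename_axis(s.index.name)

print(result)
```

The totals row here is 4.0, 13.0, 5.0, 14.0, 6.0, 15.0, matching the expected output in the question. Swapping s.mean() for s.sum() gives a summed row instead.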

5 Comments

Haha, that was the fastest answer so far. Thank you! I had never seen the swaplevel functionality before, but your answer works perfectly!
@gussilago yw~ :-) happy coding
While I'm at it: is it possible to also add a "totals" row (mean of all rows, per column) to the final DataFrame? If you have an answer, I can always add it to the original question...
Much appreciated @Wen
@gussilago yw :-) happy coding
