Conditional Summing on python dataframe

Question

I'm just getting into Pandas and trying to generate a spreadsheet for a car lot. I'm loving Pandas, but it's slow going. I'm trying to generate some new columns that sum ...

import pandas as pd

data = pd.DataFrame({"Car":["Hyundai","Hyundai","Honda", "Honda"], "Type":["Accent", "Accent", "Civic", "Civic"], "Trans":["Auto", "Manual", "Auto", "Manual"], "TOTAL":[2,4,5,3]})

print(data)

print(data.groupby(['Car', 'Type', 'Trans'])['TOTAL'].sum())

I'm getting the totally predictable ....

       Car  TOTAL   Trans    Type
0  Hyundai      2    Auto  Accent
1  Hyundai      4  Manual  Accent
2    Honda      5    Auto   Civic
3    Honda      3  Manual   Civic

Car      Type    Trans 
Honda    Civic   Auto      5
                 Manual    3
Hyundai  Accent  Auto      2
                 Manual    4

Ideally what I'd love to pull off is.....

Car       Type    Auto    Manual  Total
Honda     Civic     5        3      8
Hyundai   Accent    2        4      6

My knowledge of Pandas isn't that great (yet). I'm guessing this calls for apply() or agg(), but so far I keep banging my head against syntax errors. I'd appreciate any pointers in the right direction. JW

Stefan · Accepted Answer · 2015-12-28 18:01:11Z


Using built-in pandas methods, you could set the 'Car', 'Type' and 'Trans' columns as the index and call unstack() to get the total for each subgroup, then sum across the columns:

data = pd.DataFrame({"Car":["Hyundai","Hyundai","Honda", "Honda"], "Type":["Accent", "Accent", "Civic", "Civic"], "Trans":["Auto", "Manual", "Auto", "Manual"], "TOTAL":[2,4,5,3]}).set_index(['Car', 'Type', 'Trans'])

total_by_trans = data.unstack().loc[:, 'TOTAL']         # to get rid of the column MultiIndex created by unstack()
total_by_trans['Total'] = total_by_trans.sum(axis=1)    
total_by_trans.columns.name = None                      # just cleaning up

                Auto  Manual  Total
Car     Type                       
Honda   Civic      5       3      8
Hyundai Accent     2       4      6
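
For anyone who wants to paste and run the idea end to end, here is a minimal self-contained sketch of the same set_index() / unstack() approach. It selects the 'TOTAL' column before unstacking, so no column MultiIndex is created in the first place. The names `wide` and `flat`, and the final reset_index() step, are my own additions for illustration and not part of the answer above.

```python
import pandas as pd

data = pd.DataFrame({
    "Car": ["Hyundai", "Hyundai", "Honda", "Honda"],
    "Type": ["Accent", "Accent", "Civic", "Civic"],
    "Trans": ["Auto", "Manual", "Auto", "Manual"],
    "TOTAL": [2, 4, 5, 3],
})

# Move the grouping keys into the index, keep only the TOTAL Series,
# then pivot the 'Trans' index level into columns.
wide = data.set_index(["Car", "Type", "Trans"])["TOTAL"].unstack("Trans")

# Row-wise sum across the Auto and Manual columns.
wide["Total"] = wide.sum(axis=1)
wide.columns.name = None  # drop the leftover 'Trans' label on the columns

# Assumed extra step: turn Car/Type back into ordinary columns.
flat = wide.reset_index()
print(flat)
```

Note that unstack() needs each (Car, Type, Trans) combination to appear only once. If the real data has repeated combinations, aggregating first with groupby(...).sum() is probably the safer route.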


1 Comment

Jane Wilkie Over a year ago

Oh! Can't wait to try this out!

David Maust · Accepted Answer · 2015-12-28 19:35:59Z


You can add two helper columns to the DataFrame ahead of time, holding the automatic and manual counts, and then sum all three columns per group.

data['total_manual'] = data['TOTAL'] * (data['Trans'] == 'Manual').astype(int)
data['total_auto'] = data['TOTAL'] * (data['Trans'] == 'Auto').astype(int)
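# select several columns with a list (double brackets); tuple-style selection on a groupby no longer works
print(data.groupby(['Car', 'Type'])[['total_auto', 'total_manual', 'TOTAL']].sum())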
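
A variation on the same idea, in case the multiply-by-a-mask trick reads as too clever: Series.where() keeps a value where the condition holds and substitutes 0 elsewhere. This is only a sketch on the sample data from the question; the `summary` name is mine.

```python
import pandas as pd

data = pd.DataFrame({
    "Car": ["Hyundai", "Hyundai", "Honda", "Honda"],
    "Type": ["Accent", "Accent", "Civic", "Civic"],
    "Trans": ["Auto", "Manual", "Auto", "Manual"],
    "TOTAL": [2, 4, 5, 3],
})

# Keep TOTAL only on rows with the matching transmission, 0 everywhere else.
data["total_auto"] = data["TOTAL"].where(data["Trans"] == "Auto", 0)
data["total_manual"] = data["TOTAL"].where(data["Trans"] == "Manual", 0)

# One row per (Car, Type); each helper column sums to its conditional total.
summary = data.groupby(["Car", "Type"])[["total_auto", "total_manual", "TOTAL"]].sum()
print(summary)
```

The catch with this style is that each new category (say, a third transmission type) needs its own helper column, so it fits best when the set of categories is small and fixed.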

A similar approach is to use a pivot table with margins.

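# margins=True adds an 'All' column (row totals) and an 'All' row (column totals)
pvt = pd.pivot_table(data, index=['Car', 'Type'], columns='Trans', values='TOTAL', margins=True, aggfunc='sum')
pvt = pvt.drop(('All', ''), axis=0)    # drop the grand-total row, keep the per-row 'All' column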
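
Here is a self-contained sketch of the pivot-table route that lands on the layout asked for in the question. It assumes a pandas version that supports the margins_name argument, which I use to label the margin "Total" instead of the default "All". Filtering on the index level is just one way to drop the grand-total row; it is my choice for this sketch.

```python
import pandas as pd

data = pd.DataFrame({
    "Car": ["Hyundai", "Hyundai", "Honda", "Honda"],
    "Type": ["Accent", "Accent", "Civic", "Civic"],
    "Trans": ["Auto", "Manual", "Auto", "Manual"],
    "TOTAL": [2, 4, 5, 3],
})

# Pivot Trans into columns and let pandas append the margin totals.
pvt = pd.pivot_table(
    data,
    index=["Car", "Type"],
    columns="Trans",
    values="TOTAL",
    aggfunc="sum",
    margins=True,
    margins_name="Total",  # label for the margin row and column
)

# Drop the grand-total row by filtering on the 'Car' index level.
pvt = pvt[pvt.index.get_level_values("Car") != "Total"]
pvt.columns.name = None
print(pvt)
```

Unlike unstack(), pivot_table() aggregates repeated (Car, Type, Trans) combinations on its own, which may matter on a real inventory sheet.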


1 Comment

Jane Wilkie Over a year ago

Proof positive David that sometimes I just need to adjust the way I think. Marked as the answer!
