1

I'm just getting into Pandas and trying to generate a spreadsheet for a car lot. I'm loving Pandas but it's slow going and I'm trying to generate some new columns that sum ...

import pandas as pd

data = pd.DataFrame({"Car":["Hyundai","Hyundai","Honda", "Honda"], "Type":["Accent", "Accent", "Civic", "Civic"], "Trans":["Auto", "Manual", "Auto", "Manual"], "TOTAL":[2,4,5,3]})

print data

print data.groupby(['Car', 'Type', 'Trans'])['TOTAL'].sum()

I'm getting the totally predictable ....

       Car  TOTAL   Trans    Type
0  Hyundai      2    Auto  Accent
1  Hyundai      4  Manual  Accent
2    Honda      5    Auto   Civic
3    Honda      3  Manual   Civic

Car      Type    Trans 
Honda    Civic   Auto      5
                 Manual    3
Hyundai  Accent  Auto      2
                 Manual    4

Ideally what I'd love to pull off is.....

Car       Type    Auto    Manual  Total
Honda     Civic     5        3      8
Hyundai   Accent    2        4      6

My knowledge isn't that great of Pandas (yet), but I'm guessing it's an "apply" or an agg() function but so far, syntactically, I'm banging my head from the syntax errors, but I appreciate any pointers in the right direction. .. JW

2 Answers 2

3

To use the built-in pandas methods, you could: set your 'Car', 'Type', 'Trans' columns as index and unstack() to get the Total for each subgroup, then just sum over the columns:

data = pd.DataFrame({"Car":["Hyundai","Hyundai","Honda", "Honda"], "Type":["Accent", "Accent", "Civic", "Civic"], "Trans":["Auto", "Manual", "Auto", "Manual"], "TOTAL":[2,4,5,3]}).set_index(['Car', 'Type', 'Trans'])

total_by_trans = data.unstack().loc[:, 'TOTAL']         # to get rid of the column MultiIndex created by unstack()
total_by_trans['Total'] = total_by_trans.sum(axis=1)    
total_by_trans.columns.name = None                      # just cleaning up

                Auto  Manual  Total
Car     Type                       
Honda   Civic      5       3      8
Hyundai Accent     2       4      6
Sign up to request clarification or add additional context in comments.

1 Comment

Oh! Can't wait to try this out!
1

You can prepare two new series ahead of time in the dataframe with auto and manual counts.

data['total_manual'] = data['TOTAL'] * (data['Trans'] == 'Manual').astype(int)
data['total_auto'] = data['TOTAL'] * (data['Trans'] == 'Auto').astype(int)
print data.groupby(['Car', 'Type'])['total_auto', 'total_manual', 'TOTAL'].sum()

Also a similar approach is to use a pivot table with margins.

pvt = pd.pivot_table(data, index=['Car', 'Type'], columns='Trans', values='TOTAL', margins='columns', aggfunc=np.sum)
pvt = pvt.drop(('All',''), axis=0)

1 Comment

Proof positive David that sometimes I just need to adjust the way I think. Marked as the answer!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.