I am trying to aggregate a dataframe by several categorical columns and sum up several metric columns at the same time. I want to do this similarly to how I would in SQL, but I can't find a simple method. I am also not sure whether I have hit the limits of pandas groupby, because the code below raises a KeyError on the second metric column. The code runs if I aggregate only one column. How do I aggregate multiple columns?

df_agg = pd.DataFrame(data = df.groupby(['House', 'cat1', 'cat2', 'cat3'])
['points'].mean()
['counts'].count()
['value'].sum()
['metric'].sum()
['metric2'].sum()
['metric3'].sum())  
  • If you get a KeyError, it means some column is missing or there is whitespace in a column name. The best way to check is print(df.columns.tolist()). Commented May 9, 2018 at 13:10
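To make the check from the comment concrete, here is a minimal sketch. The frame and the stray trailing space in 'metric2 ' are made up for illustration; they are not taken from the asker's data.

```python
import pandas as pd

# Hypothetical frame with a trailing space in one column name.
df = pd.DataFrame({'points': [1, 2], 'metric2 ': [3, 4]})

# Printing the names as a list puts quotes around them,
# so stray whitespace becomes visible.
print(df.columns.tolist())

# Strip surrounding whitespace from every column name.
df.columns = df.columns.str.strip()

# List any expected columns that are still missing after cleaning.
missing = [c for c in ['points', 'metric2'] if c not in df.columns]
print(missing)
```

If `missing` is empty after stripping, the column names were probably not the cause of the KeyError.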

1 Answer

Use agg with a dictionary that maps each column to its aggregate function; the DataFrame constructor is not necessary:

# column names follow the question: metric, metric2, metric3
d = {'points':'mean', 'counts':'count', 'value':'sum', 'metric':'sum', 'metric2':'sum', 'metric3':'sum'}
df_agg = df.groupby(['House', 'cat1', 'cat2', 'cat3']).agg(d).reset_index()
print(df_agg)
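For a self-contained illustration, here is a minimal sketch on a small made-up frame. The data values are invented; only the column names come from the question. It also shows the likely reason the chained version fails: `['points'].mean()` already returns a Series indexed by the group keys, so the following `['counts']` is looked up as a group label rather than as a column.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    'House': ['A', 'A', 'B', 'B'],
    'cat1': ['x', 'x', 'y', 'y'],
    'cat2': ['p', 'p', 'q', 'q'],
    'cat3': ['m', 'm', 'n', 'n'],
    'points': [1.0, 3.0, 5.0, 7.0],
    'counts': [10, 20, 30, 40],
    'value': [1, 2, 3, 4],
    'metric': [1, 1, 1, 1],
    'metric2': [2, 2, 2, 2],
    'metric3': [3, 3, 3, 3],
})

keys = ['House', 'cat1', 'cat2', 'cat3']

# One aggregate function per metric column.
d = {'points': 'mean', 'counts': 'count', 'value': 'sum',
     'metric': 'sum', 'metric2': 'sum', 'metric3': 'sum'}
df_agg = df.groupby(keys).agg(d).reset_index()
print(df_agg)

# Chaining as in the question: mean() returns a Series keyed by group,
# and that Series has no 'counts' label, so the lookup raises KeyError.
try:
    df.groupby(keys)['points'].mean()['counts']
    chained_ok = True
except KeyError:
    chained_ok = False
print(chained_ok)
```

The result has one row per combination of the grouping columns and one column per aggregated metric, which is roughly what a SQL GROUP BY with several aggregate expressions would return.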