python group by, passing in columns to aggregate function params

Question

I'm trying to understand how to aggregate in pandas when the aggregation function takes several columns as arguments. I'm used to dplyr in R, where this is very simple.

In my example, 'data' has many columns, including 'TPR', 'FPR', and 'model'. Several datasets are concatenated together, and I need to run my function once per 'model' group.

grouped_data = data.groupby(['model']) 
grouped_data.aggregate( sklearn.metrics.auc(x='FPR',y='TPR') )

However, this results in an error.

fuglede · Accepted Answer · 2018-06-16 20:49:18Z

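In your attempt, sklearn.metrics.auc(x='FPR', y='TPR') is evaluated immediately with the two strings as arguments, before aggregate ever sees the groups, which is why it fails. As you only want to apply a single function to each group as a whole, you can use apply instead of aggregate. The argument has to be a Python callable, which is called once per group with that group's rows as a DataFrame, so in your case that would look like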

data.groupby('model').apply(lambda group: sklearn.metrics.auc(group.FPR, group.TPR))

For example:

import numpy as np
import pandas as pd
import sklearn.metrics

y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, _ = sklearn.metrics.roc_curve(y, pred, pos_label=2)
df_a = pd.DataFrame({'model': 'a', 'FPR': fpr, 'TPR': tpr})
df_b = pd.DataFrame({'model': 'b', 'FPR': fpr, 'TPR': tpr})
# DataFrame.append was removed in pandas 2.0; pd.concat works in old and new versions
data = pd.concat([df_a, df_b])
data.groupby('model').apply(lambda group: sklearn.metrics.auc(group.FPR, group.TPR))

Output:

model
a    0.75
b    0.75
dtype: float64
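
If you prefer a named function over a lambda, or want the result as a regular DataFrame, the same pattern can be sketched as follows. This is only an illustration under some assumptions: trapezoid_auc is a hypothetical helper that applies the trapezoidal rule by hand (the rule sklearn.metrics.auc is documented to use) so the snippet runs without scikit-learn, and the ROC points are made up. In real code you would call sklearn.metrics.auc(group.FPR, group.TPR) inside the function instead.

```python
import numpy as np
import pandas as pd


def trapezoid_auc(group):
    """Area under the (FPR, TPR) curve of one group, via the trapezoidal rule.

    Hypothetical stand-in for sklearn.metrics.auc(group.FPR, group.TPR).
    """
    x = group["FPR"].to_numpy()
    y = group["TPR"].to_numpy()
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))


# Made-up ROC points for two models, already sorted by FPR.
data = pd.DataFrame({
    "model": ["a"] * 3 + ["b"] * 3,
    "FPR": [0.0, 0.5, 1.0, 0.0, 0.0, 1.0],
    "TPR": [0.0, 0.5, 1.0, 0.0, 1.0, 1.0],
})

# Select only the needed columns so each group passed to the function
# is a DataFrame holding FPR and TPR.
auc_by_model = data.groupby("model")[["FPR", "TPR"]].apply(trapezoid_auc)

# Turn the resulting Series into a two-column DataFrame.
auc_table = auc_by_model.rename("auc").reset_index()
print(auc_table)
```

Selecting the columns before apply keeps the grouping column out of the frame handed to the function, and reset_index gives you one row per model, which is closer to what dplyr's summarise returns.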

edited Jun 16, 2018 at 20:49

answered Jun 16, 2018 at 20:34

