I'm trying to understand how to compute aggregates in pandas when the aggregating function needs several columns at once. I'm used to dplyr in R, where this is very simple.

In my example, 'data' has many columns, including 'TPR', 'FPR', and 'model'. It consists of many different datasets concatenated together, and I need to run my function once per 'model' group.

grouped_data = data.groupby(['model']) 
grouped_data.aggregate( sklearn.metrics.auc(x='FPR',y='TPR') )

However, this results in an error.

1 Answer

As you only want to apply a single function, you can use apply instead of aggregate. Its argument has to be a Python callable, which is called once per group with that group's rows as a DataFrame, so in your case that would look like:

data.groupby('model').apply(lambda group: sklearn.metrics.auc(group.FPR, group.TPR))
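If the lambda gets hard to read, the same idea can be written with a named function. The following is only a sketch under some assumptions: `group_auc` is a hypothetical helper name, the toy ROC points are made up for illustration, and the explicit `[["FPR", "TPR"]]` selection is my own choice so that the grouping column is not passed into the function (recent pandas versions warn about that). It also shows why the call in the question fails: `sklearn.metrics.auc(x='FPR', y='TPR')` is evaluated immediately with two strings, before pandas ever sees a group, whereas here pandas receives the function itself and calls it per group.

```python
import pandas as pd
from sklearn.metrics import auc


def group_auc(group: pd.DataFrame) -> float:
    """Area under the ROC curve for one model's rows (hypothetical helper)."""
    # auc expects x to be monotonic, so sort the points by FPR (then TPR) first.
    ordered = group.sort_values(["FPR", "TPR"])
    return auc(ordered["FPR"], ordered["TPR"])


# Toy ROC points for two models (made-up values, not from the question).
data = pd.DataFrame({
    "model": ["a", "a", "a", "b", "b", "b"],
    "FPR":   [0.0, 0.5, 1.0, 0.0, 0.0, 1.0],
    "TPR":   [0.0, 0.5, 1.0, 0.0, 1.0, 1.0],
})

# Pass the function object itself; pandas calls it once per group.
auc_by_model = data.groupby("model")[["FPR", "TPR"]].apply(group_auc)
print(auc_by_model)
```

The result is a Series indexed by 'model', with one AUC value per group.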

For example:

import numpy as np
import pandas as pd
import sklearn.metrics

y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, _ = sklearn.metrics.roc_curve(y, pred, pos_label=2)
df_a = pd.DataFrame({'model': 'a', 'FPR': fpr, 'TPR': tpr})
df_b = pd.DataFrame({'model': 'b', 'FPR': fpr, 'TPR': tpr})
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
data = pd.concat([df_a, df_b])
data.groupby('model').apply(lambda group: sklearn.metrics.auc(group.FPR, group.TPR))

Output:

model
a    0.75
b    0.75
dtype: float64
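If you want several summary values per group, closer to what dplyr's summarise gives you, one option is to have the callable return a pandas Series; apply then assembles the per-group Series into a DataFrame. This is a minimal sketch, not the only way to do it: `summarize` and the column names `auc` and `n_points` are hypothetical, and the ROC points are made-up values that are already sorted by FPR.

```python
import pandas as pd
from sklearn.metrics import auc


def summarize(group: pd.DataFrame) -> pd.Series:
    """Return several per-group summaries at once (hypothetical helper)."""
    return pd.Series({
        "auc": auc(group["FPR"], group["TPR"]),  # area under this group's ROC curve
        "n_points": len(group),                  # number of ROC points in the group
    })


# Toy ROC points for two models (made-up values, already sorted by FPR).
data = pd.DataFrame({
    "model": ["a", "a", "a", "b", "b", "b"],
    "FPR":   [0.0, 0.5, 1.0, 0.0, 0.0, 1.0],
    "TPR":   [0.0, 0.5, 1.0, 0.0, 1.0, 1.0],
})

# One row per model, one column per key of the returned Series.
summary = data.groupby("model")[["FPR", "TPR"]].apply(summarize)
print(summary)
```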