
I am working with a dataset that looks like the one below (values changed; the real one is a lot larger):

fruit_type, temp, count
apple, 12, 4
apple, 14, 6
pear, 12, 6
pear, 16, 2
grape, 12, 5
peach, 9, 33
peach, 6, 3

I am trying to use a pandas groupby agg function to find, for each temp, what percent of its fruit type's total count each row's count represents. I would also like a column with the total count. Below is the code I have been trying:

data3 = data2.groupby('fruit_type')['count'].agg({
    'prob' : lambda count: ((count) / count.sum()),
    'total' : lambda count: count.size
    })

The temp values are discrete. I would like the result to stay row by row, with each row's count divided by the total count for its fruit type. Please let me know what is wrong with my code.

1 Answer


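The problem is the first lambda, lambda count: count / count.sum(). It returns a Series with the same shape as the group instead of reducing the group to a scalar, which is what agg expects. Separately, passing a dict of {output_name: function} to a SeriesGroupBy's agg was deprecated in pandas 0.20 and removed in 1.0; recent versions raise SpecificationError ("nested renamer is not supported").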
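For contrast, agg works fine when every function reduces a group to a single value. Below is a minimal sketch using named aggregation (keyword arguments to SeriesGroupBy.agg, available since pandas 0.25), rebuilding the sample data from the question. The output column names total and n_rows are illustrative choices, not part of the original code.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'fruit_type': ['apple', 'apple', 'pear', 'pear', 'grape', 'peach', 'peach'],
    'temp': [12, 14, 12, 16, 12, 9, 6],
    'count': [4, 6, 6, 2, 5, 33, 3],
})

# Both functions reduce a group to one scalar, so agg returns one row per fruit_type
# (illustrative output names: total, n_rows)
summary = df.groupby('fruit_type')['count'].agg(total='sum', n_rows='size')
print(summary)
```

Note that size counts the rows in each group while sum adds up their count values, so count.size in the question's 'total' entry gives 2 for apple rather than 10.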

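You probably want transform instead of agg: transform returns a result aligned with the original rows, so it can be assigned straight back as a new column.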

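import pandas as pd

# suppose this is your df
df = pd.DataFrame({
    'fruit_type': ['apple', 'apple', 'pear', 'pear', 'grape', 'peach', 'peach'],
    'temp': [12, 14, 12, 16, 12, 9, 6],
    'count': [4, 6, 6, 2, 5, 33, 3],
})
df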

Out[83]: 
  fruit_type   temp   count
0      apple     12       4
1      apple     14       6
2       pear     12       6
3       pear     16       2
4      grape     12       5
5      peach      9      33
6      peach      6       3


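# prob part: each row's share of its fruit_type's total count
df['prob'] = df.groupby('fruit_type')['count'].transform(lambda count: count / count.sum())

# total part: each fruit_type's total count, repeated on every row of that group
df['total_count'] = df.groupby('fruit_type')['count'].transform(lambda count: count.sum())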

df

Out[87]: 
  fruit_type  temp  count    prob  total_count
0      apple    12      4  0.4000           10
1      apple    14      6  0.6000           10
2       pear    12      6  0.7500            8
3       pear    16      2  0.2500            8
4      grape    12      5  1.0000            5
5      peach     9     33  0.9167           36
6      peach     6      3  0.0833           36
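An alternative sketch, assuming the same sample data: compute each group's total once with transform('sum') and then divide column-wise. This avoids calling a Python lambda per group, which is usually faster on large frames, though that is worth measuring on the real data.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'fruit_type': ['apple', 'apple', 'pear', 'pear', 'grape', 'peach', 'peach'],
    'temp': [12, 14, 12, 16, 12, 9, 6],
    'count': [4, 6, 6, 2, 5, 33, 3],
})

# transform('sum') broadcasts each fruit_type's total back onto its rows
df['total_count'] = df.groupby('fruit_type')['count'].transform('sum')

# Plain column arithmetic then gives each row's share of its group total
df['prob'] = df['count'] / df['total_count']
print(df.round(4))
```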

3 Comments

How can I make it aggregate the larger scalar?
@user3609179 I've added some code to illustrate how to use transform. Also, is there a particular reason why you want size in 'total' : lambda count: count.size instead of .sum()?
I want the aggregate total by fruit. Does that work in the current snippet you posted?
