Replace values greater than limit using lambda in multiple observational feature in pandas dataframe

Question

Similar to this question, I have a feature 'preWeight' which has multiple observations for each MotherID. I want to transform this dataframe into a new dataframe where

I assign preWeight a value of "Yes" if preWeight>=4000 for a particular MotherID regardless of the remaining observations
Otherwise if preWeight is <4000 for a particular MotherID, I will assign preWeight a value of "No"

So I want to transform this dataframe:

    ChildID   MotherID   preWeight
0     20      455        3500
1     20      455        4040
2     13      102        2500
3     13      102        NaN
4     702     946        5000
5     82      571        2000
6     82      571        3500
7     82      571        3800

Into this:

    ChildID   MotherID   preWeight
0   20        455        Yes
1   13        102        No
2   702       946        Yes
3   82        571        No

I have tried this:

df.groupby('MotherID')['preWeight'].apply(
    lambda x: 'Yes' if x>4000 in x.values else 'No').reset_index()

But I am getting the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Thanks in advance.

What value is preWeight supposed to have if preWeight is once below 4000 and once above 4000 for the same ChildID and MotherID?
– drops, Commented Jul 20, 2020 at 14:41

MrNobody33 · Accepted Answer · 2020-07-20 14:51:56Z

Try this with pandas.Series.any (inside agg, the lambda receives each group's preWeight column as a Series):

df.groupby(['ChildID','MotherID']).agg(lambda x: 'Yes' if (x>4000).any() else 'No').reset_index()

Output:

   ChildID  MotherID preWeight
0       13       102        No
1       20       455       Yes
2       82       571        No
3      702       946       Yes
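
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's sample frame and applies the same aggregation. The DataFrame construction and the names `df` and `result` are assumptions based on the table in the question; the comparison is kept at `> 4000` as in the line above.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "ChildID": [20, 20, 13, 13, 702, 82, 82, 82],
    "MotherID": [455, 455, 102, 102, 946, 571, 571, 571],
    "preWeight": [3500, 4040, 2500, np.nan, 5000, 2000, 3500, 3800],
})

# One row per (ChildID, MotherID): "Yes" if any observation exceeds 4000.
# NaN compares as False, so missing weights never trigger a "Yes".
# Use >= instead of > to also count a weight of exactly 4000,
# which is what the question's wording asks for.
result = (
    df.groupby(["ChildID", "MotherID"])["preWeight"]
      .agg(lambda x: "Yes" if (x > 4000).any() else "No")
      .reset_index()
)
print(result)
```

Note that groupby sorts by the group keys by default, which is why the rows come out in a different order than in the question's desired output; passing `sort=False` to `groupby` should keep the order of first appearance.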

4 Comments

sums22 Over a year ago

I think your answer is missing preWeight, so it should be: df.groupby(['ChildID','MotherID'])['preWeight'].agg(lambda x: 'Yes' if (x>4000).any() else 'No').reset_index()

sums22 Over a year ago

Also, why did you use the agg function here rather than apply? What is the difference?

MrNobody33 Over a year ago

It doesn't matter in this case. The dataframe has three columns, and grouping by two of them turns those into the index, so only one column (preWeight) is left to aggregate. Specifying the column explicitly therefore makes no difference here. @sums22
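
A quick way to check this, assuming the frame has exactly the three columns shown in the question (the construction below is a reconstruction, and the names `implicit` and `explicit` are only illustrative):

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption).
df = pd.DataFrame({
    "ChildID": [20, 20, 13, 13, 702, 82, 82, 82],
    "MotherID": [455, 455, 102, 102, 946, 571, 571, 571],
    "preWeight": [3500, 4040, 2500, np.nan, 5000, 2000, 3500, 3800],
})

def label(x):
    # x is the preWeight values of one group, passed as a Series.
    return "Yes" if (x > 4000).any() else "No"

keys = ["ChildID", "MotherID"]

# Without selecting a column: the lambda runs on every non-key column,
# which here is only preWeight.
implicit = df.groupby(keys).agg(label).reset_index()

# With the column selected explicitly, as suggested in the comment above.
explicit = df.groupby(keys)["preWeight"].agg(label).reset_index()

print(implicit["preWeight"].tolist() == explicit["preWeight"].tolist())
```

If the frame had further columns, the version without the selection would apply the function to each of them as well, so selecting `preWeight` explicitly is the safer habit.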

MrNobody33 Over a year ago

Here is the difference between agg and apply. In this case, though, there wasn't a specific reason for choosing one over the other. Also, if it was helpful, consider accepting the answer, thanks :). @sums22
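
As a rough illustration of why the choice does not matter here (this is a sketch on a small made-up frame, not a summary of the linked explanation): when the function reduces each group to a single value, agg and apply on a selected column should give the same labels. The names `toy`, `label`, `via_agg` and `via_apply` are hypothetical.

```python
import pandas as pd

# Small illustrative frame (hypothetical values in the style of the question).
toy = pd.DataFrame({
    "MotherID": [455, 455, 102, 571],
    "preWeight": [3500, 4040, 2500, 3800],
})

def label(x):
    # Reduces one group's preWeight Series to a single string.
    return "Yes" if (x > 4000).any() else "No"

grouped = toy.groupby("MotherID")["preWeight"]

via_agg = grouped.agg(label)      # agg expects one value per group
via_apply = grouped.apply(label)  # apply also accepts Series/DataFrame results

print(via_agg.tolist() == via_apply.tolist())
```

The practical difference shows up with other kinds of functions: agg is meant for reductions, while apply is more general and also accepts functions that return a whole Series or DataFrame per group.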
