Similar to this question, I have a feature 'preWeight' that has multiple observations for each MotherID. I want to transform this dataframe into a new dataframe where:

  • I assign preWeight a value of "Yes" if any observation for a particular MotherID has preWeight>=4000, regardless of the remaining observations
  • Otherwise, if every preWeight observation is <4000 for a particular MotherID, I assign preWeight a value of "No"

So I want to transform this dataframe:

    ChildID   MotherID   preWeight
0     20      455        3500
1     20      455        4040
2     13      102        2500
3     13      102        NaN
4     702     946        5000
5     82      571        2000
6     82      571        3500
7     82      571        3800

Into this:

    ChildID   MotherID   preWeight
0   20        455        Yes
1   13        102        No
2   702       946        Yes
3   82        571        No

I have tried this:

df.groupby('MotherID')['preWeight'].apply(
    lambda x: 'Yes' if x>4000 in x.values else 'No').reset_index()

But I am getting the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Thanks in advance.

1
  • What value is preWeight supposed to have if preWeight is once below 4000 and once above 4000 for the same ChildID and MotherID? Commented Jul 20, 2020 at 14:41

1 Answer

2

Try this with pandas.DataFrame.any:

df.groupby(['ChildID','MotherID']).agg(lambda x: 'Yes' if (x>4000).any() else 'No').reset_index()

Output:

   ChildID  MotherID preWeight
0       13       102        No
1       20       455       Yes
2       82       571        No
3      702       946       Yes
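
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the example dataframe from the question and applies the same groupby idea. Two details are my own assumptions and not part of the answer above: it uses >= 4000 to match the threshold stated in the question (the one-liner above uses > 4000, which gives the same result on this data), and it passes sort=False so the rows keep the order shown in the question's desired output.

```python
import numpy as np
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "ChildID":   [20, 20, 13, 13, 702, 82, 82, 82],
    "MotherID":  [455, 455, 102, 102, 946, 571, 571, 571],
    "preWeight": [3500, 4040, 2500, np.nan, 5000, 2000, 3500, 3800],
})

# (s >= 4000) is a boolean Series for each group; .any() reduces it to a
# single True/False, which avoids the "truth value of a Series is ambiguous"
# error. NaN compares as False, so missing values never count as "Yes".
result = (
    df.groupby(["ChildID", "MotherID"], sort=False)["preWeight"]
      .agg(lambda s: "Yes" if (s >= 4000).any() else "No")
      .reset_index()
)

print(result)
```

The error in the question comes from putting a whole Series (x>4000) where Python expects a single boolean; reducing it with .any() first is what makes the conditional expression valid.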
4 Comments

I think your answer is missing preWeight, so it should be: df.groupby(['ChildID','MotherID'])['preWeight'].agg(lambda x: 'Yes' if (x>4000).any() else 'No').reset_index()
Also, why did you use the agg function here rather than apply? What is the difference?
It doesn't matter here. There were three columns, and after grouping by two of them the group keys become the index and only one column is left, so specifying the column to be modified makes no difference. @sums22
Here is the difference between agg and apply. In this case, though, there wasn't a specific reason. Also, if it was helpful, consider accepting the answer, thanks :). @sums22
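
As a rough illustration of the "no specific reason" point, the sketch below runs the same reduction through both agg and apply on the selected preWeight column. It assumes the example data from the question and a hypothetical helper named flag. For a function that returns one value per group, both calls should give the same labels.

```python
import numpy as np
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "ChildID":   [20, 20, 13, 13, 702, 82, 82, 82],
    "MotherID":  [455, 455, 102, 102, 946, 571, 571, 571],
    "preWeight": [3500, 4040, 2500, np.nan, 5000, 2000, 3500, 3800],
})

def flag(s):
    # Hypothetical helper: reduce one group's preWeight values to a label
    return "Yes" if (s >= 4000).any() else "No"

grouped = df.groupby("MotherID")["preWeight"]

via_agg = grouped.agg(flag)      # one label per MotherID
via_apply = grouped.apply(flag)  # same reduction, via apply

print(via_agg.tolist() == via_apply.tolist())
```

The two differ mainly when the function does not reduce each group to a single value, or when it is applied to several columns at once; neither is the case here.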
