
I feel like I am missing something really simple here. Can someone tell me what is wrong with this code?

I am trying to group by Sex where the Age > 30 and the Survived value = 1.

'Sex' is a boolean value (1 or 0), if that makes a difference.

data_r.groupby('Sex')([data_r.Age >30],[data_r.Survived == 1]).count()

This is throwing: "'DataFrameGroupBy' object is not callable"

Any ideas? Thanks.

2 Answers


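You need to filter first and then group by. `data_r.groupby('Sex')` returns a DataFrameGroupBy object, not a function, so calling it with `(...)` raises the "not callable" error.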

data_r[(data_r.Age>30) & (data_r.Survived==1)].groupby('Sex').count()
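A minimal sketch of the filter-then-group approach, assuming a small hypothetical DataFrame standing in for `data_r` (the real Titanic data is not reproduced here). It also shows why the original call fails: the GroupBy object itself is not callable.

```python
import pandas as pd

# Hypothetical toy data standing in for data_r
data_r = pd.DataFrame({
    "Sex":      [0, 1, 0, 1, 1, 0],
    "Age":      [35, 42, 22, 31, 28, 50],
    "Survived": [1, 1, 1, 0, 1, 1],
})

# groupby() returns a DataFrameGroupBy object, which cannot be called like a function
print(callable(data_r.groupby("Sex")))  # False

# Build one boolean mask; each comparison needs its own parentheses
# because & binds more tightly than > and ==
mask = (data_r.Age > 30) & (data_r.Survived == 1)

# Filter the rows first, then group the remaining rows by Sex
counts = data_r[mask].groupby("Sex").count()
print(counts)
```

With this toy data, two rows with `Sex == 0` and one row with `Sex == 1` pass the filter.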

You can do your filtering before grouping.

data_r.query('Age > 30 and Survived == 1').groupby('Sex').count()
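As a sketch, assuming the same kind of hypothetical toy data, the `query` string and an explicit boolean mask select the same rows, so the grouped results should be identical; `query` is mostly a readability choice here.

```python
import pandas as pd

# Hypothetical toy data standing in for data_r
data_r = pd.DataFrame({
    "Sex":      [0, 1, 0, 1, 1, 0],
    "Age":      [35, 42, 22, 31, 28, 50],
    "Survived": [1, 1, 1, 0, 1, 1],
})

# query() accepts "and" in its expression string
via_query = data_r.query("Age > 30 and Survived == 1").groupby("Sex").count()

# Equivalent boolean-mask version
via_mask = data_r[(data_r.Age > 30) & (data_r.Survived == 1)].groupby("Sex").count()

print(via_query.equals(via_mask))  # True
```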

Output:

        PassengerId  Survived  Pclass  Name  Age  SibSp  Parch  Ticket  Fare  \
Sex                                                                            
female           83        83      83    83   83     83     83      83    83   
male             41        41      41    41   41     41     41      41    41   

        Cabin  Embarked  
Sex                      
female     47        81  
male       25        41  

IMHO, I'd use size() instead; it is safer, because count() does not include null (NaN) values. Notice the different values in the Cabin and Embarked columns: they are due to NaN values.

data_r.query('Age > 30 and Survived == 1').groupby('Sex').size()

Output:

Sex
female    83
male      41
dtype: int64
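A minimal sketch of the count() vs size() difference, using hypothetical toy data with one missing `Cabin` value (mimicking the NaN gaps in the Titanic columns above):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one female passenger has no Cabin recorded
df = pd.DataFrame({
    "Sex":   ["female", "female", "male"],
    "Cabin": ["C85", np.nan, "E46"],
})

# count(): number of non-null values per column in each group
counts = df.groupby("Sex").count()
print(counts)  # Cabin: female -> 1, male -> 1 (the NaN is skipped)

# size(): number of rows in each group, regardless of NaN
sizes = df.groupby("Sex").size()
print(sizes)   # female -> 2, male -> 1
```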

1 Comment

Ah, thank you Scott! size() does help with the output. Before I added it, I was getting the same grid of redundant numbers that you show above. Perfect.
