
Situation: I have a DataFrame with NaN values. I'm going to make a prognosis of something for next year, so I guess I don't need very old data. I want to check the 'structure' of the NaNs to see whether there are lots of them in the old data and not so many in the new data.

import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'],
                  data=[[1, float('nan'), 3], [2, 26, 7], [5, float('nan'), 6],
                        [1, float('nan'), 42], [1, float('nan'), 13]])

   A     B   C
0  1   NaN   3
1  2  26.0   7
2  5   NaN   6
3  1   NaN  42
4  1   NaN  13

The question is: how can I easily count the NaN values in one column (B) grouped by the values of another column (A)? I know I can loop over every value of A and count the NaNs myself, but I'd like to know whether there is an easier and more elegant way. GroupBy has no isna() method.

In the end I want to see a table like:

   A  B_nan_count
0  1            3
1  2            0
2  5            1

1 Answer

First test B with Series.isna to get a boolean mask, group that mask by A, and aggregate with sum (each True counts as 1):

df1 = df.B.isna().groupby(df.A).sum().reset_index(name='B_nan_count')

Your solution filters rows first, so df[df.B.isna()] returns a DataFrame containing only the NaN rows; count them per group with GroupBy.size. Note that this drops groups with no NaNs (here the A=2 group is missing from the result):

df1 = df[df.B.isna()].groupby('A').size().reset_index(name='B_nan_count')
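A minimal, self-contained run of both snippets (assuming only pandas is installed) shows the practical difference: the mask-and-sum version keeps every group, including A=2 with zero NaNs, while the filter-and-size version drops it:

```python
import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'],
                  data=[[1, float('nan'), 3], [2, 26, 7], [5, float('nan'), 6],
                        [1, float('nan'), 42], [1, float('nan'), 13]])

# Mask + sum: every value of A appears in the result.
# Counts: A=1 -> 3, A=2 -> 0, A=5 -> 1
counts_all = df.B.isna().groupby(df.A).sum().reset_index(name='B_nan_count')
print(counts_all)

# Filter + size: rows where B is not NaN are removed before grouping,
# so the A=2 group disappears entirely.
# Counts: A=1 -> 3, A=5 -> 1
counts_nonzero = df[df.B.isna()].groupby('A').size().reset_index(name='B_nan_count')
print(counts_nonzero)
```

If the zero-count groups matter for the analysis (e.g. to confirm that newer years have no missing water_consumption values), the first form is the one to use.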