
I have the following pandas DataFrame:

import pandas as pd

a = [['01', '12345', 'null'], ['02', '78910', '9870'], ['01', '23456', 'null'], ['01', '98765', '8760']]

df_a = pd.DataFrame(a, columns=['id', 'order', 'location'])

I need a count of how many 'null' values (here 'null' is a string, not a real NaN) occur for each id. The result should look like this:

id   null_count
01    2

I can get basic counts using a groupby:

new_df = df_a.groupby(['id', 'location'])['id'].count()

But the results include more than just the 'null' values:

id  location
01  8760        1
    null        2
02  9870        1

3 Answers


Because the NULLs in your source DataFrame are the string 'null', use:

df_a.groupby('id')['location'].apply(lambda x: (x=='null').sum())\
    .reset_index(name='null_count')

Output:

   id  null_count
0  01          2
1  02          0
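
As an aside, the same result without the per-group apply, as a minimal sketch assuming the same df_a:

df_a['location'].eq('null').groupby(df_a['id']).sum()\
    .reset_index(name='null_count')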

OR

df_a.query('location == "null"').groupby('id')['location'].size()\
    .reset_index(name='null_count')

Output:

   id  null_count
0  01           2
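
Note that query filters first, so any id with no 'null' rows at all (id 02 here) disappears from the result. A minimal sketch, assuming the same df_a, that restores those ids with a zero count:

df_a.query('location == "null"').groupby('id')['location'].size()\
    .reindex(df_a['id'].unique(), fill_value=0)\
    .reset_index(name='null_count')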

Building on your own code, add .loc (note that this is a MultiIndex slice):

df_a.groupby(['id', 'location'])['id'].count().loc[:, 'null']

Output:

id
01    2
Name: id, dtype: int64
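
As with the query variant above, this slice silently drops any id that has no 'null' rows (here id 02). A minimal sketch, assuming the same df_a, that keeps zero counts by unstacking instead:

# id 01 -> 2, id 02 -> 0
df_a.groupby(['id', 'location'])['id'].count().unstack(fill_value=0)['null']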

Another option: set id as the index, compare location against 'null', and sum per index level:

df_a.set_index('id')['location'].eq('null').sum(level=0)

Output:

id
01    2.0
02    0.0
Name: location, dtype: float64
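
Note that Series.sum(level=...) was deprecated in pandas 1.3 and removed in pandas 2.0; on current versions, group by the index level instead (which also returns integer counts rather than floats):

df_a.set_index('id')['location'].eq('null').groupby(level=0).sum()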
