My dataset has a column called age and I'm trying to count the null values.

I know this can easily be done with something like len(df) - df['age'].count(). However, I'm playing around with functions and would like to use a function to calculate the null count.

Here is what I have:

def age_is_null(df):
    age_col = df['age']
    null = df[age_col].isnull()
    age_null = df[null]
    return len(age_null)

count = df.apply(age_is_null)
print (count)

When I do that, I get an error: KeyError: 'age'.

Can someone tell me why I'm getting that error and what I should change in the code to make it work?

    I think you misunderstand how apply works. Check in the docs what kind of functions you can use with it. Commented Oct 7, 2020 at 7:14

3 Answers

Use DataFrame.pipe, or pass the DataFrame to the function directly:

# the function can be simplified
def age_is_null(df):
    return df['age'].isnull().sum()


count = df.pipe(age_is_null)
print (count)

count = age_is_null(df)
print (count)

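The error occurs because DataFrame.apply calls the function once per column, passing each column as a Series rather than the whole DataFrame, so selecting the column age inside the function fails. You can see what the function receives like this: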

def func(x):
   print (x)

df.apply(func)
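
To make this concrete, here is a small sketch of what apply hands to the function. The sample data and the names inspect and seen are made up for illustration; only the column age comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'age' column name comes from the question.
df = pd.DataFrame({"age": [22.0, np.nan, 35.0], "fare": [7.25, 71.28, np.nan]})

seen = []

def inspect(col):
    # With the default axis=0, `col` is a single column as a Series.
    # Its index holds the row labels (0, 1, 2), not the column names.
    seen.append((col.name, type(col).__name__))
    return col.isnull().sum()

null_counts = df.apply(inspect)
print(seen)
print(null_counts)

# Looking up 'age' on a single column reproduces the error from the question,
# because 'age' is not one of the row labels.
try:
    df["age"]["age"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```

So apply is a reasonable tool when you want the same per-column computation for every column, but not when the function expects the whole DataFrame.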

EDIT: To select the column, use the column name:

def age_is_null(df):
    age_col = 'age'  # <- here
    null = df[age_col].isnull()
    age_null = df[null]
    return len(age_null)

Or pass selected column for mask:

def age_is_null(df):
    age_col = df['age']
    null = age_col.isnull()  # <- here
    age_null = df[null]
    return len(age_null)
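
As a rough illustration of why the original function fails even when the DataFrame is passed directly, the sketch below uses a small made-up frame. Indexing df with a Series of ages asks pandas for columns named 22.0, 35.0 and so on, which do not exist; indexing with a boolean mask filters rows instead.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real dataset.
df = pd.DataFrame({"age": [22.0, np.nan, 35.0, np.nan]})

age_col = df["age"]  # a Series of values, not a column name

try:
    df[age_col]  # treats the age values as column labels
    double_selection_failed = False
except KeyError:
    double_selection_failed = True

mask = age_col.isnull()     # boolean Series, True where age is missing
null_count = len(df[mask])  # boolean indexing keeps the rows where mask is True

print(double_selection_failed, null_count)
```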

3 Comments

Hey, thanks for your help. Your code works great. However, I'm mostly trying to understand why my original code didn't work. I left my function as it is, changed the call to count = df.pipe(age_is_null), and then ran print (count). It still gives me an error. The difference between the output of your function and mine is that yours has the type numpy.int32 and mine is int. Is that the main reason it didn't work for me?
I did try passing the DataFrame directly, i.e. count = age_is_null(df), before I posted this question. That gave the error below, and I get the same error after switching from apply to pipe: "None of [Float64Index([22.0, 38.0, 26.0, 35.0, 35.0, nan, 54.0, 2.0, 27.0, 14.0,\n ...\n 33.0, 22.0, 28.0, 25.0, 39.0, 27.0, 19.0, nan, 26.0, 32.0],\n dtype='float64', length=891)] are in the [columns]"
@MkD - the column was being selected twice; I edited the answer with possible solutions.

Instead of writing a function, you can try this:

df[df["age"].isnull() == True].shape

1 Comment

You don't really need the ` == True` part; you can just use df[df["age"].isnull()].

You need to pass the DataFrame df when calling the function age_is_null. That's why the age column is not recognised.

count = df.apply(age_is_null(df))

1 Comment

I didn't check the whole code, but when I tried to use age_is_null() it gave the error: TypeError: age_is_null() missing 1 required positional argument: 'df'
