Pandas apply function, receiving KeyError 'Column Name'

Question

My dataset has a column called age and I'm trying to count the null values.

I know this can easily be done with something like len(df) - df['age'].count(). However, I'm experimenting with functions and would like to use apply with a function to calculate the null count.

Here is what I have:

def age_is_null(df):
    age_col = df['age']
    null = df[age_col].isnull()
    age_null = df[null]
    return len(age_null)

count = df.apply(age_is_null)
print (count)

When I do that, I get an error: KeyError: 'age'.

Can someone tell me why I'm getting that error and what I should change in the code to make it work?

I think you misunderstand how apply works. Check the docs to see what kind of functions you can use with it.
– Timus, Commented Oct 7, 2020 at 7:14

jezrael · Accepted Answer · 2020-10-08 05:49:35Z

2

You need DataFrame.pipe here, or pass the DataFrame to the function directly:

# the function can be simplified
def age_is_null(df):
    return df['age'].isnull().sum()


count = df.pipe(age_is_null)
print (count)

count = age_is_null(df)
print (count)
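
A minimal sketch of why the two calls above are equivalent, using made-up sample data (only the column name age comes from the question). The helper count_nulls and its column parameter are hypothetical and are only there to show that pipe forwards extra arguments to the function:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real dataset.
df = pd.DataFrame({"age": [22.0, np.nan, 35.0, np.nan]})

def age_is_null(frame):
    # Receives the whole DataFrame, so selecting 'age' works.
    return frame["age"].isnull().sum()

def count_nulls(frame, column):
    # Hypothetical generalisation: the column name is passed in.
    return frame[column].isnull().sum()

direct = age_is_null(df)               # plain function call
piped = df.pipe(age_is_null)           # same call, written as a method
by_name = df.pipe(count_nulls, "age")  # extra arguments are forwarded

print(direct, piped, by_name)
```

In both styles the function receives the complete DataFrame once, which is what makes the column lookup inside it valid.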

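The error occurs because DataFrame.apply iterates over the columns and passes each one to the function as a Series, so selecting the column age inside the function fails. You can see what the function receives by printing it: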

def func(x):
   print (x)

df.apply(func)
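
To make this concrete, here is a small sketch on made-up sample data (only the column name age comes from the question; the fare column and the helper record are hypothetical). It records what apply hands to the function and then reproduces the KeyError:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'fare' is only there to have a second column.
df = pd.DataFrame({"age": [22.0, np.nan, 35.0],
                   "fare": [7.25, 71.28, np.nan]})

seen = []

def record(col):
    # With the default axis=0, `col` is one column as a Series.
    # Its index holds the row labels (0, 1, 2), not the column names.
    seen.append((col.name, type(col).__name__))
    return col.isnull().sum()

null_counts = df.apply(record)  # one null count per column
print(seen)
print(null_counts)

def age_is_null(obj):
    # Fine for a DataFrame, but on a Series 'age' is looked up
    # among the row labels, where it does not exist.
    return obj["age"].isnull().sum()

raised = False
try:
    df.apply(age_is_null)
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

So apply is a reasonable choice when you want the null count of every column at once, but not when the function expects the whole DataFrame.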

EDIT: To select the column, use the column name:

def age_is_null(df):
    age_col = 'age'  # <- here
    null = df[age_col].isnull()
    age_null = df[null]
    return len(age_null)

Or build the mask from the selected column:

def age_is_null(df):
    age_col = df['age']
    null = age_col.isnull()  # <- here
    age_null = df[null]
    return len(age_null)
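
The double selection can be reproduced in isolation. This is only a sketch on made-up data (the values are hypothetical; the column name age is from the question). It shows that indexing the DataFrame with the column's values makes pandas look for columns named after those values, while indexing with a boolean mask filters rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"age": [22.0, np.nan, 35.0, np.nan]})

age_col = df["age"]  # a Series of ages, not a column name

failed = False
try:
    # The values 22.0, nan, 35.0, ... are treated as column labels,
    # and no column has any of those names.
    df[age_col]
except KeyError as exc:
    failed = True
    print("KeyError:", exc)

# A boolean Series, by contrast, is treated as a row filter.
mask = age_col.isnull()
null_count = len(df[mask])
print(null_count)
```

This is the same "None of [...] are in the [columns]" message quoted in the comments below, which is why switching from apply to pipe alone did not help with the original function.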

edited Oct 8, 2020 at 5:49

answered Oct 7, 2020 at 7:09

jezrael


3 Comments

Mk D Over a year ago

Hey, thanks for your help. When I used your code, it worked great. However, I'm mostly trying to understand why my original code didn't work. I left my function as it is, changed the call to count = df.pipe(age_is_null), and then ran print (count). It still gives me an error. The difference between the output of your function and mine is that yours has type numpy.int32 and mine is int. Is that the main reason why it didn't work for me?

Mk D Over a year ago

I did try passing the dataframe, that is, I tried count = age_is_null(df) before I posted this question. This is the error message I got, and it is the same one I get when I switch to pipe instead of apply: "None of [Float64Index([22.0, 38.0, 26.0, 35.0, 35.0, nan, 54.0, 2.0, 27.0, 14.0,\n ...\n 33.0, 22.0, 28.0, 25.0, 39.0, 27.0, 19.0, nan, 26.0, 32.0],\n dtype='float64', length=891)] are in the [columns]"

jezrael Over a year ago

@MkD - the column was being selected twice; I edited the answer with possible solutions.

Nirali Khoda · Accepted Answer · 2020-10-07 07:14:44Z

0

Instead of writing a function, you can try this:

df[df["age"].isnull() == True].shape

answered Oct 7, 2020 at 7:14

Nirali Khoda


1 Comment

Shaido Over a year ago

You don't really need the ` == True` part, you can just use df[df["age"].isnull()].
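
As a small sketch on made-up data (the name column and all values are hypothetical): .shape on the filtered frame gives a (rows, columns) tuple, so the null count is its first element. The other two expressions from this thread should give the same number:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a second column to make the shape visible.
df = pd.DataFrame({"age": [22.0, np.nan, 35.0, np.nan],
                   "name": ["a", "b", "c", "d"]})

null_rows = df[df["age"].isnull()]  # rows where age is missing

shape = null_rows.shape                       # (rows, columns) tuple
count_from_shape = null_rows.shape[0]         # number of rows only
count_from_sum = df["age"].isnull().sum()     # True counts as 1
count_from_len = len(df) - df["age"].count()  # count() skips nulls

print(shape, count_from_shape, count_from_sum, count_from_len)
```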

Shubhangi Chaturvedi · Accepted Answer · 2020-10-07 08:38:35Z

0

You need to pass the dataframe df when calling the function age_is_null. That's why the age column is not recognised.

count = age_is_null(df)

answered Oct 7, 2020 at 8:38

Shubhangi Chaturvedi


1 Comment

Shubhangi Chaturvedi Over a year ago

I didn't check the whole code, but when I tried to use age_is_null() it gave the error: TypeError: age_is_null() missing 1 required positional argument: 'df'
