
Let's say I have this toy dataset

import pandas as pd

df = pd.DataFrame({
    'animal': ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'],
    'num': range(6)
})

and I create two simple custom functions (one for the string column animal, one for the numeric column num) that I will later use in an apply function. Such functions are

def fn_num(x):
    if x['num'] >= 5:
        return 1
    elif x['num'] <= 1:
        return 0
    else:
        return -1

def fn_animal(x):
    if x['animal'].isin(['cow', 'hippo']):
        return 1
    elif x['animal'].str.contains('ee'):
        return 0
    else:
        return -1

where the argument x should be a pandas DataFrame such as the object df.

I later use them in an apply function (I know that this is not the most optimized code in terms of efficiency, but I prefer to leave it in this way for the sake of clarity)

df.apply(fn_num, axis=1)

0    0
1    0
2   -1
3   -1
4   -1
5    1
dtype: int64



df.apply(fn_animal, axis=1)

AttributeError: ("'str' object has no attribute 'isin'", 'occurred at index 0')

The function fn_num applied to the numeric column works fine, whereas the function fn_animal applied to the string column gives back an error. However, if I write the code outside the custom function, I get no errors with the attribute isin:

df['animal'].isin(['cow', 'hippo'])

0    False
1     True
2    False
3    False
4    False
5     True
Name: animal, dtype: bool



df['animal'].str.contains('ee')

0    False
1    False
2    False
3     True
4    False
5    False
Name: animal, dtype: bool

My desired output would be:

df.apply(fn_animal, axis=1)

0   -1
1    1
2   -1
3    0
4   -1
5    1
dtype: int64

I spent quite some time on this issue and I'm sure I'm missing something very silly, but I couldn't figure it out. What can I do to make the function fn_animal work inside the apply?

  • At a quick glance: since accessing by row using x['animal'] gives you a native str object, you probably want to change it to: if x['animal'] in ['cow', 'hippo'] ? Commented Apr 9, 2020 at 9:47
  • On a side note, and not an answer to how to make it work with .apply (but probably a preferred way of doing it): use df['animal'].replace([r'^(hippo|cow)$', 'ee', '.*'], [1, 0, -1], regex=True) Commented Apr 9, 2020 at 9:53
  • @JonClements Thank you for the side note, it will be important on some other occasions for sure :) Commented Apr 9, 2020 at 10:00

3 Answers

The error says it all: you are calling pandas methods (isin and the .str accessor) on a plain str object rather than on a pandas Series. Inside the function, just use the standard in operator to check for a string or a substring.

Updated code:

import pandas as pd

df = pd.DataFrame({
    'animal': ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'],
    'num': range(6)
})

def fn_num(x):
    if x['num'] >= 5:
        return 1
    elif x['num'] <= 1:
        return 0
    else:
        return -1

def fn_animal(x):
    if x['animal'] in ['cow', 'hippo']:
        return 1
    elif 'ee' in x['animal']:
        return 0
    else:
        return -1

print(df.apply(fn_num, axis=1))




print(df.apply(fn_animal, axis=1))

Out:

0    0
1    0
2   -1
3   -1
4   -1
5    1
dtype: int64
0   -1
1    1
2   -1
3    0
4   -1
5    1
dtype: int64
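
As an alternative to the row-wise functions above, here is a minimal sketch of a vectorized version. It relies on the fact, shown in the question, that isin and .str.contains do work on the whole column. Using np.select is an assumption of this sketch and not part of the original answer; the names conditions and result are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'animal': ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'],
    'num': range(6)
})

# Column-level conditions: here isin and .str.contains are valid,
# because df['animal'] is a Series and not a plain str.
conditions = [
    df['animal'].isin(['cow', 'hippo']).to_numpy(),
    df['animal'].str.contains('ee').to_numpy(),
]

# np.select picks the choice of the first condition that is True,
# which mirrors the if / elif / else order of fn_animal.
result = pd.Series(np.select(conditions, [1, 0], default=-1), index=df.index)
print(result.tolist())
```

The conditions are evaluated once for the whole column instead of once per row, so no apply call is needed.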


2 Comments

Thank you for the quick response and the explanation on the error, I'm not very familiar with the error messages!
I think Serge's answer explains the error clearly, as I also mentioned: you are calling a method that needs a pandas Series, not a str.

The problem is that in the apply function, x is a Series and no longer a DataFrame. Because of that, x[y] is a scalar value: either a number (which is why fn_num works fine, since x['num'] is a number) or a plain string.

So in fn_animal, x['animal'] is a plain string and it has no isin method: the error is normal.
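
A quick way to check this is to record the types that apply actually hands to the function. This is only an illustrative sketch; inspect_row and seen are hypothetical names that do not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'animal': ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'],
    'num': range(6)
})

seen = []

def inspect_row(x):
    # Record the type of the row object and of the value stored under 'animal'.
    seen.append((type(x).__name__, type(x['animal']).__name__))
    return 0

df.apply(inspect_row, axis=1)

row_type, value_type = seen[0]
print(row_type, value_type)
```

With axis=1 the row arrives as a Series and the value under 'animal' as a str, which has no isin method and no .str accessor.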

1 Comment

I understood what is the error now, thank you for the clear explanation!

The objects passed to the function are Series objects, and the axis parameter decides whether each one is a column or a row. With axis=1 each one is a row, so x['animal'] is a str.

Code modification to fn_animal():

def fn_animal(x):
    if x['animal'] in ['cow', 'hippo']:
        return 1
    elif 'ee' in x['animal']:
        return 0
    else:
        return -1

Quoting the documentation

DataFrame.apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds)

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.
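
To make the quoted behaviour concrete, here is a small sketch that counts how many labels each passed Series carries under both settings of axis. The names per_column and per_row are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'animal': ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'],
    'num': range(6)
})

# axis=0: the function receives one column at a time,
# so each Series has one entry per row of the DataFrame (6 here).
per_column = df.apply(lambda s: len(s.index), axis=0)

# axis=1: the function receives one row at a time,
# so each Series has one entry per column of the DataFrame (2 here).
per_row = df.apply(lambda s: len(s.index), axis=1)

print(per_column.tolist())
print(per_row.tolist())
```

In the axis=1 case, indexing the row with a column name such as 'animal' therefore returns a single cell value and not a column.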

1 Comment

Thank you for the quick response!
